Please use this identifier to cite or link to this item: https://hdl.handle.net/10356/54851
Title: Statistical and data mining approach for the prediction of solar radiation
Authors: Wu, Ji
Keywords: DRNTU::Engineering::Computer science and engineering::Mathematics of computing
Issue Date: 2013
Source: Wu, J. (2013). Statistical and data mining approach for the prediction of solar radiation. Doctoral thesis, Nanyang Technological University, Singapore.
Abstract: This thesis was initially proposed as part of the Singapore National Research Foundation-Competitive Research Program (NRF-CRP) project entitled “Combined Cycle Solar Energy Self-sustaining Membrane Distillation (MD) and Membrane Distillation Bioreactor (MDBR) Water Production and Recycling System” in 2009. Because the proposed bioreactor is very sensitive to temperature changes, it is crucial to keep its temperature stable. Since most of the energy is supplied by a set of solar panels, the ability to predict solar radiation is essential for maintaining a stable temperature. There is therefore a need for a system that predicts solar radiation accurately and consistently.

We first present our work on some fundamental issues in solar radiation time series prediction. We examine how statistical models and data mining approaches can be used to predict solar time series. Our goal is an approach that models the complex nonlinear relationships in the time series data, effectively removes noise and outliers, and at the same time provides accurate and consistent predictions.

The autoregressive integrated moving average (ARIMA) model is a widely studied time series prediction approach with a strong statistical foundation. It is popular because, with a properly identified order, it can model many kinds of time series, and there is a widely accepted methodology for developing a model for a specific series. Its disadvantage, however, is its inability to model nonlinear time series. The time delay neural network (TDNN), which is based on the artificial neural network (ANN), is also studied in this thesis. A TDNN can capture nonlinear relationships in the data, but, like other data-driven algorithms, it is prone to over-fitting, and when the training data contain much noise or many outliers it can produce gross errors.

Since ARIMA and TDNN have complementary strengths, we propose a hybrid model that combines them: ARIMA models the linear component of the time series, and TDNN models the nonlinear component. We also use a novel detrending method, rather than the traditional differencing method, to generate a stationary series for ARIMA; because our experiments use solar radiation time series, several meteorological models serve as detrending models. Experimental results show that the proposed hybrid model outperforms both ARIMA and TDNN.

To further improve prediction performance, we propose a novel multi-model framework (MMF). In this framework, we assume that several distinct patterns occur repeatedly in the time series. The aim is to develop a prediction model for each pattern and to use the appropriate model to predict future values during the prediction phase. The time series is therefore segmented, and the resulting subsequences are grouped into clusters. Initially, we adopt a fixed-length segmentation scheme and find the optimal subsequence length through cross-validation. Since TDNN has proved able to model the nonlinear behaviour of solar radiation, it is adopted as the prediction model. To predict a future value, the pattern to which the current time series belongs is first identified, and the test data are then fed to the chosen model to produce the prediction.
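
A minimal sketch of the multi-model idea described above, under stated assumptions: fixed-length windows, one-step-ahead prediction, and scikit-learn's KMeans and MLPRegressor (a feed-forward network on lagged inputs) standing in for the clustering step and the TDNN models of the thesis. The function names and parameter values are illustrative, not taken from the thesis.

# Sketch of the multi-model framework (MMF) idea: segment the series
# into fixed-length windows, cluster the windows, train one regressor
# per cluster, and route each new window to the model of its nearest
# cluster before predicting the next value.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPRegressor

def make_windows(series, length):
    """Fixed-length subsequences X and the value that follows each one."""
    X = np.array([series[i:i + length] for i in range(len(series) - length)])
    y = np.asarray(series[length:])
    return X, y

def fit_mmf(series, length=24, n_patterns=4):
    X, y = make_windows(series, length)
    clusters = KMeans(n_clusters=n_patterns, n_init=10).fit(X)
    models = {}
    for k in range(n_patterns):
        mask = clusters.labels_ == k
        # An MLP on lagged inputs stands in for the TDNN used in the thesis.
        models[k] = MLPRegressor(hidden_layer_sizes=(16,),
                                 max_iter=2000).fit(X[mask], y[mask])
    return clusters, models

def predict_next(clusters, models, recent_window):
    w = np.asarray(recent_window).reshape(1, -1)
    k = clusters.predict(w)[0]          # identify the current pattern
    return models[k].predict(w)[0]      # predict with that pattern's model

In the thesis the subsequence length is chosen by cross-validation; here it is simply a default argument.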
Experimental results show that the proposed MMF achieves better prediction performance than the other models. Next, we sought to improve the clustering. The genetic algorithm and multi-model framework (GAMMF) is developed by combining a genetic algorithm with MMF. GAMMF uses a dynamic segmentation scheme instead of the fixed-length segmentation in MMF: a genetic algorithm is combined with the K-means clustering algorithm to search for the optimal segmentation, which is expected to achieve better clustering performance. TDNN is then used to model the different patterns. Support vector regression (SVR) is used alongside TDNN as an additional prediction model: when none of the learned patterns adequately describes the current time series, the SVR model is used to make the prediction. Experimental results show that GAMMF outperforms the other prediction algorithms in both accuracy and consistency.
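
A companion sketch of the GAMMF-style fallback routing, reusing the clusters and models objects from the sketch above. The genetic-algorithm search for the segmentation is omitted, and the distance threshold and SVR hyper-parameters are illustrative assumptions standing in for the matching criterion and settings used in the thesis.

# Sketch of GAMMF-style prediction with an SVR fallback: when the
# current window is not close to any learned pattern, a global SVR
# model is used instead of the per-pattern networks.
import numpy as np
from sklearn.svm import SVR

def fit_fallback(series, length=24):
    X = np.array([series[i:i + length] for i in range(len(series) - length)])
    y = np.asarray(series[length:])
    return SVR(kernel="rbf", C=10.0).fit(X, y)

def predict_with_fallback(clusters, models, svr_model, recent_window, threshold):
    w = np.asarray(recent_window).reshape(1, -1)
    k = clusters.predict(w)[0]
    dist = np.linalg.norm(w - clusters.cluster_centers_[k])
    if dist > threshold:                 # no pattern fits well: fall back to SVR
        return svr_model.predict(w)[0]
    return models[k].predict(w)[0]       # otherwise use the pattern's model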
URI: https://hdl.handle.net/10356/54851
DOI: 10.32657/10356/54851
Fulltext Permission: open
Fulltext Availability: With Fulltext
Appears in Collections: EEE Theses

Files in This Item:
thesis draft revision_final.pdf (PhD thesis, 2.42 MB, Adobe PDF)
