全球空氣汙染有持續上升的趨勢,研究指出PM2.5會影響身體的健康,因為PM2.5的顆粒非常微小會侵入人體造成傷害,因此預測PM2.5是非常重要的議題。透過資料探勘預測PM2.5的濃度達到預警效果,事先做預防可以減少對人體的傷害,並藉由數據控制PM2.5的濃度改善環境汙染的現況。
本研究使用板橋監測站2015年的監測資料進行實驗,實驗採用資料探勘技術的方式預測PM2.5濃度,並同時使用線性、類神經和支援向量機3種方法,首先建立時間序列預測模型,再根據逐步迴歸法做屬性選擇的動作,選出影響PM2.5的氣象因子以建立相關氣象因子的預測模型。最後利用時間序列和相關氣象因子兩個預測模型得到的預測值,帶入我們提出的整合線性加權法,依據加權的方式,可以改善兩個預測模型的預測值。
本研究結果發現,時間序列預測模型較適合使用每小時PM2.5濃度的監測資料做預測。相關氣象因子預測模型則較適合使用每天PM2.5濃度的監測資料做預測。從我們提出的整合線性加權法得到一個整合的結果,發現使用線性和支援向量的方法預測出的值整合線性加權法,當α值為0.7的時候改善效果可以達到最好。
Global air pollution continues to worsen. Studies have shown that PM2.5 can significantly affect human health because its particles are small enough to enter the body, so predicting PM2.5 concentrations is an important issue for controlling and reducing air pollution. Data mining techniques have been widely used for predictive analytics in many applications. In this study, we propose an integrated model that predicts PM2.5 concentrations by combining time series analysis with several classification models.
We used 2015 data from the Banqiao monitoring station in Taiwan to build the prediction models. First, we used stepwise regression to identify the major factors that influence PM2.5 concentrations. Those factors were then used to build prediction models with three classification methods: linear, neural network, and support vector machine. Finally, the predictions from the time series model and the classification models were integrated using a linear weighting method.
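As a rough illustration of the attribute-selection step, the sketch below performs a greedy forward selection with ordinary least squares. It is not the procedure used in this study: classical stepwise regression usually adds and removes variables based on F-tests or p-values, whereas this simplified version only adds variables and stops when the relative drop in the residual sum of squares falls below a threshold. The feature names, the synthetic data, and the `min_gain` threshold are all hypothetical.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def forward_select(X, y, names, min_gain=0.05):
    """Greedily add the feature that lowers the RSS the most.

    Stops when the best candidate improves the RSS by no more than
    min_gain (a hypothetical relative threshold) of the current RSS.
    """
    selected = []
    remaining = list(range(X.shape[1]))
    best = rss(np.empty((len(y), 0)), y)  # intercept-only baseline
    while remaining:
        cand_rss, cand = min((rss(X[:, selected + [j]], y), j) for j in remaining)
        if best - cand_rss <= min_gain * best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = cand_rss
    return [names[j] for j in selected]

# Synthetic stand-in for monitoring data (not real measurements):
# the target depends only on the first and third columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=200)
names = ["temp", "wind_speed", "humidity", "rainfall"]  # hypothetical names

selected = forward_select(X, y, names)
print(selected)
```

The selected columns would then serve as the inputs to the linear, neural network, and support vector machine models.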
The experimental results showed that the time series prediction model was better suited to hourly PM2.5 concentration data, whereas the classification model was better suited to daily PM2.5 concentration data. The proposed integrated linear weighting model performed best when it combined predictions from the linear and support vector machine methods with a weight (α) of 0.7.
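The linear weighting step can be sketched as a convex combination of the two models' predictions. The abstract does not state which model receives the weight α or how the best α was chosen, so the following makes two assumptions: α multiplies the time series prediction, and α is chosen from a grid in steps of 0.1 by minimizing the root-mean-square error. The function names and the toy numbers are hypothetical.

```python
import numpy as np

def combine(ts_pred, met_pred, alpha):
    """Weighted blend: alpha on the time series model (an assumption),
    1 - alpha on the meteorological-factor model."""
    return alpha * ts_pred + (1.0 - alpha) * met_pred

def rmse(y_true, y_pred):
    """Root-mean-square error between observations and predictions."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def best_alpha(y_true, ts_pred, met_pred, grid=None):
    """Return the grid value of alpha with the lowest RMSE, and that RMSE."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 11)  # 0.0, 0.1, ..., 1.0
    errors = [rmse(y_true, combine(ts_pred, met_pred, a)) for a in grid]
    i = int(np.argmin(errors))
    return float(grid[i]), errors[i]

# Toy values for illustration only (not results from the study).
y_true = np.array([20.0, 25.0, 30.0, 22.0])
ts_pred = np.array([21.0, 27.0, 28.0, 23.0])
met_pred = np.array([18.0, 22.0, 33.0, 20.0])

alpha, err = best_alpha(y_true, ts_pred, met_pred)
print(alpha, err)
```

Because the grid includes α = 0 and α = 1, the blended prediction is never worse than either single model on the data used to choose α; a held-out set would be needed to check that the gain generalizes.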