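摘要: | This work examines how model complexity affects the accuracy of estimated polymer solubility parameters. A set of six descriptors was used to estimate the solubility parameters (δ) of 97 polymers: the hydrogen bond or electrostatic attraction strength hb, a polyolefin indicator alk, the internal energy Eint, the negative charge distribution Qii, the maximum positive charge of a hydrogen atom in the molecule QH, and the number of nitrogen atoms in the polymer repeating unit nN. Estimation models were built with both multiple linear regression analysis and genetic programming. The number of generations, population size, model complexity, gene tree depth, and number of genes were tuned to obtain better-performing genetic programming models.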
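Genetic programming (GP) was used to predict the solubility parameters, with GPTIPS[1], a platform for symbolic data mining, as the software. The models built by GPTIPS are multigene symbolic nonlinear regression equations. The models were fitted on a training set and tested on a validation set. To keep the models well balanced and to address collinearity among the descriptors, stepwise regression was used to screen the descriptors; the six variables were divided into four sets, and the variables in each set were then reduced while the accuracy and model complexity were observed. The root mean square errors (RMSEs) were 1.156 (R2=0.8816) for the training set and 0.3232 (R2=0.9643) for the validation set with set 1; 1.2198 (R2=0.8696) and 0.4376 (R2=0.9384) with set 2; 1.4913 (R2=0.8060) and 0.6358 (R2=0.8636) with set 3; and 1.5451 (R2=0.7930) and 0.7971 (R2=0.7828) with set 4. Compared with other existing estimation models, genetic programming gives very accurate estimates of polymer solubility parameters while allowing the model complexity to be optimized. Compared with the related literature, the genetic programming approach proposed in this work performs well[13], and it offers a human-readable symbolic structure and is simple and easy to use.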
In this work, a set of six descriptors was correlated with the solubility parameters of polymers: the hydrogen bond or electrostatic attraction strength hb, a polyolefin indicator alk, the internal energy Eint, the negative charge distribution Qii, the maximum positive charge of a hydrogen atom in the molecule QH, and the number of nitrogen atoms in the polymer repeating unit nN. Multiple linear regression analysis and genetic programming were used to generate the models. The genetic programming was implemented on the GPTIPS[1] software platform, which can be used for symbolic data mining. To reduce redundancy in the models, the six descriptors were regrouped into four sets according to their contribution to the regression. The GPTIPS settings, such as the number of generations, population size, model complexity, gene tree depth, and number of genes, were adjusted to obtain a more efficient genetic programming estimation model while taking model complexity into account.
The final optimized genetic programming-based models produced training set root mean square errors (RMSEs) of 1.156 (R2=0.8816, set 1), 1.2198 (R2=0.8696, set 2), 1.4913 (R2=0.8060, set 3) and 1.5451 (R2=0.7930, set 4), and validation set RMSEs of 0.3232 (R2=0.9643, set 1), 0.4376 (R2=0.9384, set 2), 0.6358 (R2=0.8636, set 3) and 0.7971 (R2=0.7828, set 4), respectively. These results suggest that the models obtained here can predict the solubility parameters of polymers and provide theoretical guidance for polymer molecular design. The proposed models are accurate in estimating polymer solubility parameters and, being symbolic, are easy to interpret and implement[13]. |
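The RMSE and R2 figures quoted above can be illustrated with a minimal Python sketch. It assumes that R2 is the coefficient of determination, 1 - SS_res/SS_tot; the abstract does not state which definition was used, and a squared correlation coefficient would give different values. The function names and the sample numbers are hypothetical and are not data from this work.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values must not all be identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical solubility parameter values, for illustration only.
observed = [16.0, 18.0, 20.0, 22.0]
predicted = [16.5, 17.5, 20.5, 21.5]

print(f"RMSE = {rmse(observed, predicted):.4f}")
print(f"R2   = {r_squared(observed, predicted):.4f}")
```

Computing both metrics separately on the training and validation sets, as the abstract reports for each of the four descriptor sets, is what allows accuracy to be weighed against model complexity.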