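Cancer has ranked first among the nation's ten leading causes of death for many
consecutive years, yet most cancers are already at a terminal stage when they are
discovered, by which point surgery can rarely achieve a cure. For cancers detected early
with only mild signs, pathologists traditionally classify the tumor by its morphology.
However, this approach cannot effectively distinguish tissues whose pathological
morphology looks similar at a late stage even though their disease course and prognosis
differ markedly. Today, the gene chips used in molecular diagnosis can track and monitor
the expression of tens of thousands of genes across different tissues, which aids the
differential classification of tumor tissue and the discovery of new disease subtypes.
This study uses data mining techniques to describe the processing steps applied to tumor
data: after the data are acquired they are pre-processed, and the genes are then ranked
for feature reduction. Four classification methods with good current performance (KNN,
support vector machine, Bayesian classification, and redundancy analysis (RDA)) are then
studied under an ensemble learning approach. The experimental results show that ensemble
classification currently gives comparatively stable results, higher accuracy, and better
performance.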
Cancer has ranked as the leading cause of death for many years. Currently, most
cases of cancer are not discovered until the late stages because the disease usually
causes no symptoms early on. Surgery is most effective in the early stages of cancer
development and far less so in the final stage. Traditionally, pathologists classify
tumors detected in the early stages of cancer development by their morphology.
However, this classification method cannot effectively discriminate late-stage cancers
that have a similar pathological appearance but a different course and prognosis. Today,
gene chips, an important means of molecular diagnosis, can be used to track and monitor
large data sets covering the expression of thousands of genes over a wide range of
tissues and samples. They not only facilitate tumor classification but also help
identify new disease subtypes. This research applies data mining techniques to leukemia
classification from gene expression data through a series of steps: pre-processing,
then gene ranking and feature reduction.
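As a rough illustration of these steps, the sketch below clips and log-transforms a toy
expression matrix, ranks genes by a signal-to-noise score, and keeps the top-ranked
genes. The abstract does not name the ranking criterion or the pre-processing
thresholds, so the signal-to-noise score, the `floor` and `ceiling` defaults, all
function names, and the toy data are assumptions made for illustration only.

```python
import math
from statistics import mean, stdev


def preprocess(matrix, floor=20.0, ceiling=16000.0):
    """Clip each value to [floor, ceiling] (hypothetical thresholds), then take log2."""
    return [[math.log2(min(max(v, floor), ceiling)) for v in row] for row in matrix]


def signal_to_noise(values, labels):
    """Score one gene as (mean_0 - mean_1) / (sd_0 + sd_1) over the two classes."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    denom = stdev(a) + stdev(b)
    return (mean(a) - mean(b)) / denom if denom > 0 else 0.0


def rank_genes(matrix, labels):
    """Return gene (column) indices sorted by decreasing absolute score."""
    n_genes = len(matrix[0])
    scores = [
        abs(signal_to_noise([row[g] for row in matrix], labels))
        for g in range(n_genes)
    ]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)


def reduce_features(matrix, ranked, k):
    """Keep only the k best-ranked genes, in ranked order."""
    keep = ranked[:k]
    return [[row[g] for g in keep] for row in matrix]


# Toy data (made up): 6 samples x 4 genes, two classes labelled 0 and 1.
samples = [
    [100, 5000, 300, 40],
    [120, 5200, 310, 900],
    [90, 4800, 305, 35],
    [110, 200, 295, 870],
    [105, 250, 300, 50],
    [95, 220, 298, 880],
]
labels = [0, 0, 0, 1, 1, 1]

logged = preprocess(samples)
ranked = rank_genes(logged, labels)
reduced = reduce_features(logged, ranked, k=2)
print("gene ranking:", ranked)
```

In this toy matrix only the second gene separates the two classes cleanly, so it is the
one a ranking step of this kind should place first.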
Four classification methods (KNN, SVM, Naïve Bayes, and RDA) were implemented,
compared, and combined in an ensemble approach in this research. The experimental results
indicate that the ensemble approach gives better and more stable performance in terms of
prediction accuracy.
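One common way to build such an ensemble is majority voting over the base classifiers,
sketched below. The abstract does not state how the individual predictions are combined,
so the voting rule is an assumption. To keep the example within the standard library, a
nearest-centroid rule stands in for the SVM and RDA members; it is not either of those
methods. All function names, the class labels "A" and "B", and the toy data are
hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pvariance


def knn_predict(train_X, train_y, x, k=3):
    """k-nearest neighbours with Euclidean distance and a majority label."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]


def naive_bayes_predict(train_X, train_y, x, eps=1e-6):
    """Gaussian naive Bayes: pick the class with the highest log-posterior."""
    best_label, best_score = None, -math.inf
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        score = math.log(len(rows) / len(train_X))  # log prior
        for j, value in enumerate(x):
            col = [r[j] for r in rows]
            mu, var = mean(col), pvariance(col) + eps  # eps avoids zero variance
            score += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def centroid_predict(train_X, train_y, x):
    """Nearest class centroid (a simple stand-in, not SVM or RDA)."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return min(centroids, key=lambda lab: math.dist(centroids[lab], x))


def ensemble_predict(classifiers, train_X, train_y, x):
    """Majority vote over the predictions of the base classifiers."""
    votes = [clf(train_X, train_y, x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


# Toy data (made up): two reduced gene features per sample, two classes.
train_X = [[12.3, 8.2], [12.2, 8.3], [12.4, 8.1], [7.6, 8.2], [7.9, 8.3], [7.8, 8.1]]
train_y = ["A", "A", "A", "B", "B", "B"]
members = [knn_predict, naive_bayes_predict, centroid_predict]

for sample in ([12.0, 8.2], [8.0, 8.25]):
    print(sample, "->", ensemble_predict(members, train_X, train_y, sample))
```

Using an odd number of voters on a two-class problem avoids tied votes; with four
members, as in the study, a tie-breaking rule would also be needed.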