A Study on Applying Term Frequency to Improve Multimembership Bayesian Document Classification


Please use this permanent URL to cite or link to this item: https://irlib.pccu.edu.tw/handle/987654321/23850

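Title:	A Study on Applying Term Frequency to Improve Multimembership Bayesian Document Classification (original title: 應用詞頻以改良多元貝氏定理於文件分類之研究)
Author:	羅仁君
Contributor:	Department of Information Management
Keywords:	multimembership Bayesian; document classification; automatic document classification; genetic algorithm
Date:	2011
Uploaded:	2012-12-04 10:30:30 (UTC+8)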
Abstract:	In recent years, multimembership Bayesian (MMB) methods have been applied to automatic classification and inference in medical, website, e-mail, and document applications, and research on MMB for knowledge classification and inference is ongoing. This study proposes an improvement to MMB automatic document classification based on the concept of a genetic algorithm: the screening threshold is generated dynamically, so that classification is fully automatic. The improved method, "automatically adapting screening thresholds", extracts the important terms that are then used for MMB classification, and a final "automatic evaluation" step selects the best result. In the experiments, the best classification accuracy was 83.93% when the classes differed greatly from one another and 70.60% when the classes differed only slightly.
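The abstract does not give the details of the threshold search, so the following is only a minimal sketch of the general idea, under stated assumptions: a small genetic-style search (elitist selection, averaging crossover, plus-or-minus-one mutation) picks a term-frequency screening threshold, and each candidate threshold is scored by validation accuracy. The classifier here is a plain multinomial naive Bayes used as a stand-in, not the MMB model of the thesis. All function names, parameters, and the toy documents are hypothetical and chosen for illustration.

```python
import math
import random
from collections import Counter

def select_terms(docs, threshold):
    """Keep terms whose total frequency in the corpus is at least `threshold`."""
    totals = Counter(t for tokens, _ in docs for t in tokens)
    return {t for t, c in totals.items() if c >= threshold}

def train_nb(docs, vocab):
    """Multinomial naive Bayes counts (a stand-in classifier, NOT the thesis's MMB)."""
    class_docs = Counter(label for _, label in docs)
    term_counts = {c: Counter() for c in class_docs}
    for tokens, label in docs:
        term_counts[label].update(t for t in tokens if t in vocab)
    return class_docs, term_counts, vocab

def predict(model, tokens):
    """Return the class with the highest log-posterior, using add-one smoothing."""
    class_docs, term_counts, vocab = model
    n_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for c in sorted(class_docs):
        total = sum(term_counts[c].values())
        score = math.log(class_docs[c] / n_docs)
        for t in tokens:
            if t in vocab:
                score += math.log((term_counts[c][t] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = c, score
    return best_label

def fitness(threshold, train, valid):
    """Validation accuracy obtained when only terms passing `threshold` are used."""
    vocab = select_terms(train, threshold)
    if not vocab:
        return 0.0
    model = train_nb(train, vocab)
    hits = sum(predict(model, tokens) == label for tokens, label in valid)
    return hits / len(valid)

def evolve_threshold(train, valid, max_threshold=10, pop_size=6, generations=10, seed=0):
    """Genetic-style search over one integer threshold (assumed scheme, for illustration)."""
    rng = random.Random(seed)
    # Evenly spaced starting population, so the search always covers the whole range.
    population = [1 + i * (max_threshold - 1) // (pop_size - 1) for i in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda th: fitness(th, train, valid), reverse=True)
        parents = ranked[: pop_size // 2]          # elitism: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) // 2 + rng.choice([-1, 0, 1])   # crossover, then mutation
            children.append(min(max_threshold, max(1, child)))
        population = parents + children
    best = max(population, key=lambda th: fitness(th, train, valid))
    return best, fitness(best, train, valid)

# Toy data invented for this example: (tokens, label) pairs.
train = [
    ("goal team match the".split(), "sports"),
    ("team coach goal the".split(), "sports"),
    ("stock market price the".split(), "finance"),
    ("market bank stock the".split(), "finance"),
]
valid = [("goal team".split(), "sports"), ("stock market".split(), "finance")]

best, best_fit = evolve_threshold(train, valid)
print("best threshold:", best, "validation accuracy:", best_fit)
```

In this toy setting a low threshold keeps the class-specific words, whereas a threshold of 3 or more leaves only the uninformative word "the", so accuracy drops. The thesis presumably evaluates candidate thresholds on real document collections; how its fitness function and genetic operators are defined is not stated in the abstract.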
Appears in collections:	[Department and Graduate Institute of Information Management] Theses and Dissertations

Files in this item:

File	Size	Format	Views
http___thesis.lib.pccu.edu.tw_cgi-bin_cdrfb3_gsweb1.pdf	149Kb	Adobe PDF	192
http___thesis.lib.pccu.edu.tw_cgi-bin_cdrfb3_gsweb2.pdf	683Kb	Adobe PDF	379
http___thesis.lib.pccu.edu.tw_cgi-bin_cdrfb3_gsweb3.pdf	6271Kb	Adobe PDF	889


All items in CCUR are protected by their original copyright.
