In the face of increasingly sophisticated Advanced Persistent Threat (APT) attacks, malware classification is one of the most important tasks in digital forensics. Accurate classification gives the most complete picture of a malware sample's system behavior and simplifies forensic analysis. Traditional malware classification relies on dynamic analysis after execution, or on static analysis combined with reverse engineering, to recover that behavior. Malware developers, on the other hand, use anti-VM (anti-virtual-machine monitoring) and obfuscation techniques to evade such classifiers and lower their accuracy.

Honeypots are increasingly mature and widely deployed across different networks, so the amount of malware source code they collect keeps growing, yet it remains unclassified. Analyzing the source code directly provides a more accurate classification for forensics. This project therefore proposes an automated malware classification mechanism. Working on malware source code captured by honeypots, it measures directory (file) structure similarity and source code (logic) similarity between samples and applies a hierarchical clustering algorithm. The algorithm finds the best-fitting class for each newly captured sample and creates a new class if none fits well, so new types of malware can be identified quickly and then analyzed further. Such classification spares forensic analysts from repeating costly analysis of known malware types, frees resources for new malware, and shortens the time needed to understand an attacker's behavior and intent. Beyond malware analysis, the proposed system can also be applied to other domains that need source code classification.
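The classification step described above can be illustrated with a minimal sketch; it is not the project's actual implementation. It assumes each sample is a mapping from relative file path to source text. Jaccard similarity over path sets stands in for directory structure similarity, and Jaccard similarity over token sets stands in for source code (logic) similarity. All function names, the equal weighting, the average-linkage rule, and the 0.5 threshold are hypothetical choices made for illustration.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def tokens(sample):
    """All whitespace-separated tokens across a sample's source files."""
    return {tok for text in sample.values() for tok in text.split()}

def similarity(s1, s2, w_struct=0.5):
    """Weighted mix of path-set similarity and token-set similarity (assumed weights)."""
    struct = jaccard(set(s1), set(s2))          # directory structure
    source = jaccard(tokens(s1), tokens(s2))    # crude proxy for code logic
    return w_struct * struct + (1 - w_struct) * source

def linkage(c1, c2, samples):
    """Average-linkage similarity between two clusters of sample names."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(similarity(samples[a], samples[b]) for a, b in pairs) / len(pairs)

def cluster(samples, threshold=0.5):
    """Agglomerative clustering: merge the closest pair until none reaches threshold."""
    clusters = [[name] for name in samples]
    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]], samples))
        if linkage(clusters[i], clusters[j], samples) < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: j > i, so index i is unaffected
    return clusters

def classify(name, sample, clusters, samples, threshold=0.5):
    """Put a new sample in its best-fitting cluster, or open a new one."""
    samples[name] = sample
    best = max(clusters, key=lambda c: linkage([name], c, samples), default=None)
    if best is not None and linkage([name], best, samples) >= threshold:
        best.append(name)
    else:
        clusters.append([name])  # no class fits well: treat as a new malware type
    return clusters

# Toy data (invented for illustration): two IRC-bot variants and one web kit.
samples = {
    "bot_a": {"src/main.c": "connect irc join channel flood",
              "src/scan.c": "scan port exploit"},
    "bot_b": {"src/main.c": "connect irc join channel flood udp",
              "src/scan.c": "scan port exploit"},
    "kit_a": {"web/panel.php": "login stats gate",
              "web/gate.php": "post config inject"},
}
clusters = cluster(samples)

# A new bot variant joins the existing bot class; an unrelated sample opens a new class.
classify("bot_c", {"src/main.c": "connect irc join channel",
                   "src/scan.c": "scan port"}, clusters, samples)
classify("ransom_a", {"crypt/aes.c": "encrypt files key",
                      "note.txt": "pay bitcoin"}, clusters, samples)
print(clusters)
```

Token-set overlap is a weak proxy for logic similarity; a real system would more plausibly compare normalized syntax trees or control-flow features. The assign-or-create flow stays the same whichever similarity measure is used.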