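Classification techniques are an important and basic analysis tool in the field of data mining. Their main purpose is to divide spatial data into multiple clusters or classes, so that data within the same cluster are highly similar while data in different clusters are highly heterogeneous. K-means is a well-known non-hierarchical algorithm that suits data with a globular distribution. The shared near neighbors method is an unsupervised, density-based algorithm that can effectively handle non-globular clusters. However, it has no iterative design: once a data point has been assigned to a group, it cannot be reclassified. This study therefore proposes a new algorithm that combines the shared near neighbors method with K-means, bringing together the strengths of both in the hope of effectively improving classification techniques.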
Classification techniques are an important and basic analysis tool in data mining. The main purpose of cluster analysis is to partition a dataset into several clusters, so that data in the same cluster are similar and homogeneous, while data in different clusters are dissimilar and heterogeneous. K-means clustering is a well-known non-hierarchical clustering method that is suitable for datasets with a globular distribution. Shared near neighbors clustering is an unsupervised, density-based method that can effectively handle non-globular distributions. However, it has no iterative design: once a data point has been assigned to a group, it cannot be reclassified. In this study, we propose a new classification technique for datasets that contain both globular and non-globular distributions. By combining the advantages of shared near neighbors and K-means clustering, we aim to demonstrate the accuracy and efficiency of this new approach.
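The abstract does not say how the two methods are combined, so the following is only a minimal sketch of one possible reading, not the procedure proposed in this study. It assumes a Jarvis-Patrick style shared near neighbors grouping (two points are linked when each is in the other's k-nearest-neighbor list and they share at least `min_shared` neighbors), followed by K-means iterations that start from those groups and allow points to be reassigned. The function names and the parameters `k`, `min_shared`, and `max_iter` are illustrative assumptions.

```python
import math
from collections import defaultdict


def knn_lists(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(set(order[:k]))
    return neighbors


def snn_groups(points, k, min_shared):
    """Assumed Jarvis-Patrick style grouping: link i and j when each is in the
    other's neighbor list and they share at least min_shared neighbors."""
    nn = knn_lists(points, k)
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i in range(len(points)):
        for j in nn[i]:
            if i in nn[j] and len(nn[i] & nn[j]) >= min_shared:
                parent[find(i)] = find(j)  # merge the two groups

    roots = [find(i) for i in range(len(points))]
    # Renumber group roots as 0, 1, 2, ... in order of first appearance.
    relabel = {r: n for n, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]


def kmeans_refine(points, labels, max_iter=100):
    """K-means (Lloyd) iterations started from the given labels.
    Using SNN groups as the starting labels is an assumption of this sketch."""
    labels = list(labels)
    for _ in range(max_iter):
        members = defaultdict(list)
        for p, lab in zip(points, labels):
            members[lab].append(p)
        # Centroid of each group: coordinate-wise mean of its members.
        centroids = {
            lab: tuple(sum(c) / len(ps) for c in zip(*ps))
            for lab, ps in members.items()
        }
        # Reassign every point to its nearest centroid.
        new_labels = [
            min(centroids, key=lambda lab: math.dist(p, centroids[lab]))
            for p in points
        ]
        if new_labels == labels:  # converged: no point changed group
            break
        labels = new_labels
    return labels
```

A call such as `kmeans_refine(points, snn_groups(points, k=3, min_shared=2))` chains the two steps. The refinement step is what supplies the reassignment that the shared near neighbors method lacks, although plain nearest-centroid reassignment can also break up elongated groups, so a practical combination would likely need extra safeguards.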