Abstract |
<p>The present invention relates to a system and method for analyzing the clustering results of large amounts of data. The method uses Hadoop, an open-source MapReduce framework, to calculate silhouette coefficients, which are significance test indexes capable of evaluating the clustering results of large amounts of data. To this end, the clustered data are divided into blocks, and input splits are created for all of the blocks. The input splits are then allocated to a plurality of computers. Each computer stores the data of the blocks included in its input splits in memory, calculates a silhouette coefficient for each record, and provides the calculated coefficients to a characteristic coefficient calculator, which obtains the silhouette coefficients for the clusters. The clustering results of large amounts of data can thus be analyzed efficiently, quickly, and independently.</p> |
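
The flow described in the abstract can be sketched on a single machine as follows. This is only an illustration under stated assumptions, not the patented implementation: it uses plain Python rather than Hadoop, Euclidean distance, and the standard silhouette definition s(i) = (b - a) / max(a, b). The function names (`make_splits`, `map_split`, `reduce_by_cluster`, `cluster_silhouettes`) and the `block_size` parameter are hypothetical, and the sketch assumes every worker can hold the full clustered data set in memory.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_for_record(i, points, labels):
    """Standard silhouette s(i) = (b - a) / max(a, b) for record i."""
    sums = defaultdict(float)   # total distance from record i to each cluster
    counts = defaultdict(int)   # number of other records in each cluster
    for j, (p, lab) in enumerate(zip(points, labels)):
        if j != i:
            sums[lab] += dist(points[i], p)
            counts[lab] += 1
    own = labels[i]
    if counts[own] == 0:
        return 0.0  # common convention for a single-record cluster
    a = sums[own] / counts[own]  # mean distance within the record's own cluster
    others = [sums[l] / counts[l] for l in counts if l != own]
    if not others:
        return 0.0  # assumption: return 0 when there is only one cluster
    b = min(others)  # mean distance to the nearest other cluster
    denom = max(a, b)
    return (b - a) / denom if denom > 0 else 0.0


def make_splits(n_records, block_size):
    """Divide record indices into blocks, one (hypothetical) input split each."""
    return [range(s, min(s + block_size, n_records))
            for s in range(0, n_records, block_size)]


def map_split(split, points, labels):
    """Map step: emit (cluster label, silhouette) for each record in the split."""
    return [(labels[i], silhouette_for_record(i, points, labels)) for i in split]


def reduce_by_cluster(pairs):
    """Reduce step: average the per-record silhouettes of each cluster."""
    totals = defaultdict(lambda: [0.0, 0])
    for lab, s in pairs:
        totals[lab][0] += s
        totals[lab][1] += 1
    return {lab: total / n for lab, (total, n) in totals.items()}


def cluster_silhouettes(points, labels, block_size=2):
    """Run the map step over every split in turn, then reduce per cluster."""
    pairs = []
    for split in make_splits(len(points), block_size):
        pairs.extend(map_split(split, points, labels))
    return reduce_by_cluster(pairs)
```

For example, `cluster_silhouettes([(0.0,), (1.0,), (5.0,), (6.0,)], [0, 0, 1, 1])` returns one averaged coefficient per cluster label. Because each split is processed without reference to the others, the loop over splits is the part that a MapReduce framework would distribute across computers.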