Title SYSTEM AND METHOD FOR ANALYZING RESULT OF CLUSTERING MASSIVE DATA
Abstract Disclosed are a system and a method for analyzing a result of clustering massive data. Hadoop, an open-source map/reduce framework, is used to calculate a silhouette coefficient, which serves as a significance verification index for evaluating a result of clustering massive data. To implement the system and the method, the clustered data is divided into blocks, input splits are generated covering all of the blocks, and the generated input splits are assigned to multiple computers. Each computer holds in memory only the data of the blocks included in its assigned input split, and calculates a silhouette coefficient for each record. Each computer provides only the calculated silhouette coefficient to an index coefficient calculation apparatus, which then calculates a silhouette coefficient for a cluster. The result of clustering the massive data can therefore be analyzed rapidly and objectively.
Publication number US2015032759(A1) Publication date 2015.01.29
Application number US201214009907 Filing date 2012.10.31
Applicant SK Planet Co., Ltd. Inventors Lee Chae Hyun; Kim Min Soeng; Lee Jun Sup
Classification G06F17/30 Main classification G06F17/30
Agency Agent
Main claim 1. A system for analyzing a result of clustering massive data, the system comprising: a task management apparatus configured to divide a clustered target file into blocks of a pre-designated size, and generate an input split corresponding to a task pair for a reduce task for reducing input data by combining the divided blocks; at least one distance calculation apparatus configured to receive allocation of the input split, and calculate a distance sum for each record between blocks included in the input split; at least one index coefficient calculation apparatus configured to calculate a clustering significance verification index coefficient for each record by using the distance sum for each record received from the at least one distance calculation apparatus; and an analysis apparatus configured to calculate a final significance verification index coefficient of a corresponding cluster, by averaging the clustering significance verification index coefficient for each record.
Address Seoul KR
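
The flow described in the abstract and main claim (divide into blocks, form input splits, compute per-record distance sums, derive a per-record coefficient, average per cluster) can be illustrated with a small single-process sketch. This is not the patented implementation and does not use Hadoop. It assumes the standard silhouette definition s = (b - a) / max(a, b), Euclidean distance, unique record ids, and one input split per unordered pair of blocks, which is only one possible reading of "combining the divided blocks". The record layout (record_id, cluster_label, point) and all function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations_with_replacement

# A record is (record_id, cluster_label, point). Layout and names are illustrative.

def make_blocks(records, block_size):
    """Divide the clustered records into blocks of a fixed size."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]

def make_input_splits(num_blocks):
    """Assumed pairing: one split per unordered pair of blocks, including (i, i)."""
    return list(combinations_with_replacement(range(num_blocks), 2))

def distance_sums(block_a, block_b, same_block):
    """Distance step for one split.

    Returns {(record_id, other_cluster): [distance_sum, count]}.
    """
    partial = defaultdict(lambda: [0.0, 0])
    for id_a, cl_a, pt_a in block_a:
        for id_b, cl_b, pt_b in block_b:
            if id_a == id_b:
                continue  # a record contributes no distance to itself
            d = math.dist(pt_a, pt_b)
            partial[(id_a, cl_b)][0] += d
            partial[(id_a, cl_b)][1] += 1
            if not same_block:
                # Across two different blocks, credit the other record as well.
                partial[(id_b, cl_a)][0] += d
                partial[(id_b, cl_a)][1] += 1
    return partial

def record_silhouettes(records, partials):
    """Merge partial sums, then compute s = (b - a) / max(a, b) per record."""
    total = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (dist_sum, count) in partial.items():
            total[key][0] += dist_sum
            total[key][1] += count
    means = defaultdict(dict)  # record_id -> {cluster_label: mean distance}
    for (rid, cl), (dist_sum, count) in total.items():
        means[rid][cl] = dist_sum / count
    result = {}
    for rid, own, _ in records:
        others = [m for cl, m in means[rid].items() if cl != own]
        if own not in means[rid] or not others:
            result[rid] = 0.0  # convention: singleton cluster or only one cluster
            continue
        a, b = means[rid][own], min(others)
        result[rid] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return result

def cluster_silhouettes(records, per_record):
    """Average the per-record coefficients within each cluster."""
    by_cluster = defaultdict(list)
    for rid, cl, _ in records:
        by_cluster[cl].append(per_record[rid])
    return {cl: sum(vals) / len(vals) for cl, vals in by_cluster.items()}

def analyze(records, block_size):
    blocks = make_blocks(records, block_size)
    partials = [distance_sums(blocks[i], blocks[j], i == j)
                for i, j in make_input_splits(len(blocks))]
    return cluster_silhouettes(records, record_silhouettes(records, partials))

if __name__ == "__main__":
    demo = [(0, 0, (0.0,)), (1, 0, (1.0,)), (2, 1, (10.0,)), (3, 1, (11.0,))]
    print(analyze(demo, block_size=3))
```

In this sketch each call to distance_sums touches only the two blocks of its split, which mirrors the idea that a worker keeps only its assigned blocks in memory. In a distributed setting those calls would presumably run as separate tasks rather than in a list comprehension.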