Title of invention: Systems and methods for cluster comparison
Abstract: Systems and methods for measuring similarity between a first set of clusters and a second set of clusters apply a first clustering procedure and a second clustering procedure to a set of objects to cluster the objects into a first set of clusters and a second set of clusters, respectively, calculate a similarity index between the first set of clusters and the second set of clusters, calculate an expected value of the similarity index, wherein the expected value is a value of the similarity index one would expect to obtain, on average, between a randomly generated third set of clusters and a randomly generated fourth set of clusters with a same number of clusters as the first set of clusters and the second set of clusters, respectively, and adjust the calculated similarity index based on the expected value of the similarity index.
Publication number: US9026536(B2) Publication date: 2015.05.05
Application number: US201113879002 Filing date: 2011.10.17
Applicant: Canon Kabushiki Kaisha Inventors: Denney Bradley; Korattikara-Balan Anoop
Classification: G06F7/00; G06F17/30; G06K9/00 Main classification: G06F7/00
Agency: Canon U.S.A., Inc. IP Division Agent: Canon U.S.A., Inc. IP Division
Main claim: 1. A method for measuring similarity between a first set of clusters generated by a first clustering procedure and a second set of clusters generated by a second clustering procedure, wherein the clustering procedures are for grouping a set of objects, the method comprising: applying a first clustering procedure and a second clustering procedure to a set of objects to cluster the objects into a first set of clusters and into a second set of clusters, respectively, wherein applying the first clustering procedure and the second clustering procedure comprises extracting object features from each object in the set of objects, determining one or more comparison measures by which to compare respective features of the objects in the set of objects, comparing the respective features of the objects in the set of objects based on the one or more comparison measures to determine differences between the respective features of the objects, outputting a group of measures representing the differences between the respective features of the objects, and clustering the objects into the first set of clusters and into the second set of clusters based at least in part on the group of measures; calculating a similarity index between the first set of clusters and the second set of clusters; calculating an expected value of the similarity index, wherein the expected value is a value of the similarity index one would expect to obtain, on average, between a randomly generated third set of clusters and a randomly generated fourth set of clusters with a same number of clusters as the first set of clusters and the second set of clusters, respectively; and adjusting the calculated similarity index by a penalty factor that includes the expected value of the similarity index.
Address: Tokyo JP
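Illustrative sketch: the last three steps of the main claim (calculate a similarity index, calculate its expected value over randomly generated sets of clusters with the same numbers of clusters, and adjust the index) might look roughly like the Python below. This is a minimal sketch and not the patented method. The record does not name a similarity index, a random model, or the form of the penalty factor, so the Rand index, the Monte Carlo estimate, the "every cluster non-empty" sampling scheme, and the chance-corrected form `(index - expected) / (1 - expected)` are all assumptions chosen for illustration. All function names are hypothetical.

```python
import random
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which the two clusterings agree.

    Assumption: the Rand index stands in for the unspecified similarity index.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two label lists of equal length >= 2")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def random_labels(n_objects, n_clusters, rng):
    """Random labeling with exactly n_clusters non-empty clusters.

    Assumption: one simple random model; it is not uniform over partitions.
    """
    if n_clusters > n_objects:
        raise ValueError("more clusters than objects")
    labels = list(range(n_clusters))
    labels += [rng.randrange(n_clusters) for _ in range(n_objects - n_clusters)]
    rng.shuffle(labels)
    return labels


def expected_index(n_objects, k_a, k_b, trials=2000, seed=0):
    """Monte Carlo estimate of the index between two random clusterings."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += rand_index(
            random_labels(n_objects, k_a, rng),
            random_labels(n_objects, k_b, rng),
        )
    return total / trials


def adjusted_index(labels_a, labels_b, trials=2000, seed=0):
    """Chance-corrected index: (index - expected) / (1 - expected).

    Assumption: this is one common way to apply a penalty that includes
    the expected value; the claim does not fix the formula.
    """
    index = rand_index(labels_a, labels_b)
    k_a, k_b = len(set(labels_a)), len(set(labels_b))
    expected = expected_index(len(labels_a), k_a, k_b, trials, seed)
    if expected >= 1.0:
        return 0.0  # degenerate case: random clusterings always agree
    return (index - expected) / (1.0 - expected)


if __name__ == "__main__":
    first = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    second = [0, 0, 1, 1, 1, 2, 2, 2, 2]
    print("raw index:", rand_index(first, second))
    print("adjusted index:", adjusted_index(first, second))
```

Under these assumptions, identical clusterings score 1 after adjustment, while clusterings that agree no more often than random ones score near 0. This is the motivation for subtracting the expected value rather than reporting the raw index alone.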