Title of invention: Approximate K-Means via Cluster Closures
Abstract: A set of data points is divided into a plurality of subsets of data points. A set of cluster closures is generated based at least in part on the subsets of data points. Each cluster closure envelops a corresponding cluster of a set of clusters and is comprised of data points of the enveloped cluster and data points neighboring the enveloped cluster. A k-Means approximator iteratively assigns data points to a cluster of the set of clusters and updates a set of cluster centroids corresponding to the set of clusters. The k-Means approximator assigns data points based at least in part on the set of cluster closures.
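Publication number: US2014258295(A1) Publication date: 2014.09.11
Application number: US201313791666 Filing date: 2013.03.08
Applicant: MICROSOFT CORPORATION Inventors: Wang Jingdong; Ke Qifa; Li Shipeng; Wang Jing
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent: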
Main claim: 1. A method implemented at least partially by a processor, the method comprising: constructing a codebook of a set of data points, wherein the codebook is comprised of a set of cluster centroids, C={c1, c2, c3, . . . ck}, of a corresponding set of clusters, G={G1, G2, G3, . . . Gk}, wherein each cluster is comprised by a subset of data points, by: identifying neighboring data points for each data point of the set of data points; and clustering the data points of the set of data points by iteratively: constructing a cluster closure, Ḡj, for each cluster, Gj, of the set of clusters, G={G1, G2, G3, . . . Gk}, based at least on the neighboring data points, wherein a set of cluster closures Ḡ={Ḡ1, Ḡ2, Ḡ3, . . . Ḡk} corresponds to the set of clusters, G={G1, G2, G3, . . . Gk}, assigning a given data point to a given cluster based at least in part on a subset of the set of cluster closures, and for each cluster of the set of clusters, updating data points comprising a cluster and calculating an updated cluster centroid for the cluster based at least in part on the updated data points.
Address: Redmond, WA, US
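
Illustrative sketch: the iterative steps recited in the main claim (identify neighbors, construct a closure per cluster, assign each point using only the closures that contain it, update centroids) might be sketched in Python as below. This is a minimal sketch under stated assumptions, not the applicant's implementation. The names `knn_indices`, `update_centroids`, and `closure_kmeans` are hypothetical, the brute-force neighbor search stands in for whatever neighborhood construction the application actually uses, and the evenly spaced initial centroids are a choice made only to keep the example deterministic.

```python
import numpy as np

def knn_indices(X, n_neighbors):
    # Assumption: exact brute-force neighbor search, used here only for illustration.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :n_neighbors]

def update_centroids(X, labels, centroids):
    # Mean of each cluster's members; an empty cluster keeps its old centroid.
    for j in range(centroids.shape[0]):
        mask = labels == j
        if mask.any():
            centroids[j] = X[mask].mean(axis=0)
    return centroids

def closure_kmeans(X, k, n_neighbors=5, n_iter=20):
    n = X.shape[0]
    neighbors = knn_indices(X, n_neighbors)
    # Assumption: deterministic, evenly spaced initial centroids.
    centroids = X[np.linspace(0, n - 1, k).astype(int)].astype(float)
    # One exact assignment pass to obtain the initial clusters.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = np.argmin(d2, axis=1)
    centroids = update_centroids(X, labels, centroids)
    for _ in range(n_iter):
        best = np.full(n, np.inf)
        new_labels = labels.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue
            # Closure of cluster j: its members plus the neighbors of its members.
            closure = np.union1d(members, neighbors[members].ravel())
            dist = ((X[closure] - centroids[j]) ** 2).sum(axis=1)
            # A point is compared only against clusters whose closure contains it.
            better = dist < best[closure]
            idx = closure[better]
            best[idx] = dist[better]
            new_labels[idx] = j
        changed = not np.array_equal(new_labels, labels)
        labels = new_labels
        centroids = update_centroids(X, labels, centroids)
        if not changed:
            break
    return centroids, labels
```

Because every point belongs to its own cluster's closure, each point always has at least one candidate cluster, and the assignment step computes distances only for (point, cluster) pairs where the point lies in that cluster's closure rather than for all n × k pairs.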