Title of invention Scalable system for K-means clustering of large databases
Abstract In one exemplary embodiment, the invention provides a data mining system for use in evaluating data in a database. Before the data evaluation begins, a cluster number K is chosen for categorizing the data in the database into K different clusters, and initial guesses at the means, or centroids, of each cluster are provided. A portion of the data in the database is then read from a storage medium and brought into a rapid access memory. The data contained in that portion is used to update the original guesses at the centroids of each of the K clusters. Some of the data belonging to a cluster is summarized or compressed and stored as a summarization of the data. More data is accessed from the database and assigned to a cluster. An updated mean for each cluster is determined from the summarized data and the newly acquired data. A stopping criterion is evaluated to determine whether further data should be accessed from the database. If further data is needed to characterize the clusters, more data is gathered from the database and used in combination with the already compressed data until the stopping criterion has been met.
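The loop described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented method: the abstract does not say how data is summarized or which stopping criterion is used, so the sketch assumes that every point in a chunk is compressed into per-cluster sufficient statistics (a count and a coordinate-wise sum) and that reading stops once a new chunk moves no centroid by more than `tol`. The names `scalable_kmeans`, `chunks`, `tol`, and `max_inner_iters` are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def scalable_kmeans(chunks, init_centroids, tol=1e-4, max_inner_iters=100):
    """Cluster data that arrives one in-memory chunk at a time.

    chunks: iterable of lists of points (each point a tuple of floats).
    init_centroids: K initial guesses at the cluster means.
    Returns (centroids, counts).
    """
    k = len(init_centroids)
    dim = len(init_centroids[0])
    centroids = [tuple(float(v) for v in c) for c in init_centroids]

    # Assumed summarization: per-cluster count and coordinate-wise sum
    # of all points already read and discarded from memory.
    counts = [0] * k
    sums = [[0.0] * dim for _ in range(k)]

    for chunk in chunks:
        before_chunk = centroids
        new_counts, new_sums = list(counts), [list(s) for s in sums]

        for _ in range(max_inner_iters):
            # Start each pass from the compressed summaries, then add
            # the points currently held in memory.
            new_counts = list(counts)
            new_sums = [list(s) for s in sums]
            for p in chunk:
                j = min(range(k), key=lambda i: dist(p, centroids[i]))
                new_counts[j] += 1
                for d in range(dim):
                    new_sums[j][d] += p[d]

            # Updated mean = (summarized data + new data) / total count.
            updated = [
                tuple(v / new_counts[j] for v in new_sums[j])
                if new_counts[j] else centroids[j]
                for j in range(k)
            ]
            shift = max(dist(a, b) for a, b in zip(centroids, updated))
            centroids = updated
            if shift <= tol:
                break

        # Compress the chunk into the summaries and drop the raw points.
        counts, sums = new_counts, new_sums

        # Assumed stopping criterion: this chunk barely moved the centroids.
        if max(dist(a, b) for a, b in zip(before_chunk, centroids)) <= tol:
            break

    return centroids, counts
```

Keeping only a count and a sum per cluster means memory use depends on K and the dimensionality rather than on the number of rows read so far, which is what lets the same update run over a database that does not fit in memory. A real system would likely compress only some points, as the abstract states, and keep the rest for later reassignment.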
Publication number US6012058(A) Publication date 2000.01.04
Application number US19980042540 Filing date 1998.03.17
Applicant MICROSOFT CORPORATION Inventors FAYYAD, USAMA;BRADLEY, PAUL S.;REINA, CORY
Classification G06F17/30;(IPC1-7):G06F17/00 Main classification G06F17/30
Agency Agent
Main claim
Address