Title of Invention METHOD AND APPARATUS FOR SCALABLE PROBABILISTIC CLUSTERING USING DECISION TREES
Abstract Some embodiments of the invention include methods for identifying clusters in a database, data warehouse, or data mart. Each identified cluster can be meaningfully understood from a list of its attributes and their corresponding values. Some embodiments include a method for scalable probabilistic clustering using a decision tree. Some embodiments run in time linear in the size of the set of data and require only a single access to the set of data. Some embodiments produce interpretable clusters that can be described in terms of a set of attributes and the attribute values for that set. In some embodiments, a cluster can be interpreted by reading the attributes and attribute values on the path from the root node of the decision tree to the node corresponding to the cluster. In some embodiments, no domain-specific distance function for the attributes is needed. In some embodiments, a cluster is determined by identifying the attribute with the highest influence on the distribution of the other attributes. Each value assumed by the identified attribute corresponds to a cluster and to a node in the decision tree. In some embodiments, the CUBE operation is used to access the set of data a single time, and the result is used to compute the influence and perform other calculations.
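The split step described in the abstract can be illustrated with a minimal sketch. The abstract does not define "influence", so this sketch assumes it is the sum of pairwise mutual information between one attribute and each of the others; that choice, the function names (`count_table`, `mutual_information`, `choose_split`), and the toy attributes `h`, `b`, `c` are all hypothetical. The count table built in one pass over the rows stands in for the aggregated counts a CUBE operation might supply.

```python
from collections import Counter
from itertools import combinations
from math import log2


def count_table(rows, attrs):
    """Single pass over the data: tally per-attribute and pairwise value counts.

    Stand-in (assumption) for the aggregated counts a CUBE operation would return.
    """
    single = {a: Counter() for a in attrs}
    pair = {p: Counter() for p in combinations(attrs, 2)}
    n = 0
    for row in rows:
        n += 1
        for a in attrs:
            single[a][row[a]] += 1
        for a, b in pair:
            pair[(a, b)][(row[a], row[b])] += 1
    return n, single, pair


def mutual_information(n, single, pair, a, b):
    """Mutual information (in bits) between attributes a and b, from counts only."""
    key = (a, b) if (a, b) in pair else (b, a)
    flipped = key != (a, b)
    mi = 0.0
    for (x, y), c in pair[key].items():
        if flipped:
            x, y = y, x  # stored order is (b, a); restore (a, b)
        mi += (c / n) * log2(c * n / (single[a][x] * single[b][y]))
    return mi


def choose_split(rows, attrs):
    """Pick the attribute with the highest assumed influence score.

    Returns the attribute, its values (one cluster / tree node per value),
    and the score of every attribute.
    """
    n, single, pair = count_table(rows, attrs)
    scores = {
        a: sum(mutual_information(n, single, pair, a, b) for b in attrs if b != a)
        for a in attrs
    }
    best = max(attrs, key=scores.get)
    return best, sorted(single[best]), scores


# Toy data (hypothetical): h determines both b and c; b and c are independent.
attrs = ["h", "b", "c"]
rows = [{"h": h, "b": h // 2, "c": h % 2} for h in range(4)]
best, values, scores = choose_split(rows, attrs)
```

On this toy data the sketch selects `h`, and each of its four values would become one cluster and one child node of the tree. Repeating the same step within each child would grow the tree further; no distance function between attribute values is used at any point.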
Publication No. WO0067194(A3) Publication Date 2001.08.02
Application No. WO2000US11626 Filing Date 2000.04.28
Applicant E.PIPHANY, INC.;SAHAMI, MEHRAN;JOHN, GEORGE, H. Inventor SAHAMI, MEHRAN;JOHN, GEORGE, H.
Classification G06F17/30;G06K9/62;(IPC1-7):G06F17/30;G06F17/60 Main Classification G06F17/30
Agency Agent
Principal Claim
Address