Title of invention: Data clustering using error-tolerant frequent item sets
Abstract: A generalization of frequent item sets to error-tolerant frequent item sets (ETFs) is disclosed, together with its application to data clustering, where ETFs are used either to build clusters directly or to initialize standard clustering algorithms. Efficient, feasible algorithms for computing ETFs from very large databases are presented. In one embodiment, a method determines a plurality of weak ETFs, which are strongly tolerant of errors, and determines from them a plurality of strong ETFs, which are less tolerant of errors. The resulting clusters can be used as an initial model for a standard clustering approach, or may themselves be used as the end clusters. In one embodiment, the data covered by the strong clusters is removed from the data set, and the process is repeated until no more weak clusters can be found. The invention includes methods for constructing ETFs from more general data types: data sets that include categorical discrete, continuous, and binary attributes.
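The abstract does not spell out how error tolerance is measured, so the following is only a minimal sketch under stated assumptions. It assumes a row supports an item set when it is missing at most a fraction `eps` of the items, and that the item set counts as a strong error-tolerant frequent item set when at least a fraction `kappa` of all rows support it. The names `strong_support`, `is_strong_etf`, `remove_covered`, `eps`, and `kappa` are hypothetical and do not come from the patent text.

```python
from typing import FrozenSet, List

Row = FrozenSet[str]


def strong_support(rows: List[Row], items: Row, eps: float) -> List[int]:
    """Return indices of rows that contain at least (1 - eps) of the items.

    Assumption: this per-row tolerance is one plausible reading of a
    "strong" ETF; the abstract itself gives no formula.
    """
    min_hits = (1 - eps) * len(items)
    return [i for i, row in enumerate(rows) if len(row & items) >= min_hits]


def is_strong_etf(rows: List[Row], items: Row, eps: float, kappa: float) -> bool:
    """True if at least a fraction kappa of all rows support the item set."""
    return len(strong_support(rows, items, eps)) >= kappa * len(rows)


def remove_covered(rows: List[Row], items: Row, eps: float) -> List[Row]:
    """Drop the rows covered by the item set, as in the repeat step."""
    covered = set(strong_support(rows, items, eps))
    return [row for i, row in enumerate(rows) if i not in covered]


# Toy data (made up for illustration): six transactions over items a..f.
rows = [
    frozenset("abcd"),
    frozenset("abc"),
    frozenset("abde"),
    frozenset("ae"),
    frozenset("ef"),
    frozenset("bcd"),
]
items = frozenset("abcd")

exact = is_strong_etf(rows, items, eps=0.0, kappa=0.5)      # classic frequent item set test
tolerant = is_strong_etf(rows, items, eps=0.25, kappa=0.5)  # one missing item allowed
remaining = remove_covered(rows, items, eps=0.25)
```

With `eps=0.0` the check reduces to the ordinary frequent item set test, which only the first row passes. Allowing one missing item out of four lets four of the six rows support the item set, and removing them leaves the two rows that a further pass would have to cluster.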
Publication number: US6567936(B1) Publication date: 2003.05.20
Application number: US20000500173 Filing date: 2000.02.08
Applicant: MICROSOFT CORPORATION Inventors: YANG CHENG;FAYYAD USAMA M.;BRADLEY PAUL S.
Classification: G06F11/00;G06F17/30;(IPC1-7):G06F11/00 Main classification: G06F11/00