Abstract |
A generalization of frequent item sets to error-tolerant frequent item sets (ETF) is disclosed, together with its application in data clustering, where error-tolerant frequent item sets are used either to build clusters or as an initialization technique for standard clustering algorithms. Efficient, computationally feasible algorithms for computing ETF's from very large databases are presented. In one embodiment, a method determines a plurality of weak ETF's, which are strongly tolerant of errors, and determines a plurality of strong ETF's therefrom, which are less tolerant of errors. The resulting clusters can be used as an initial model for a standard clustering approach, or may themselves be used as the end clusters. In one embodiment, the data covered by the strong clusters is removed from the data, and the process is repeated until no more weak clusters can be found. The invention includes methods for constructing ETF's from more general data types: data sets that include categorical discrete, continuous, and binary attributes.
|