Abstract |
<p>PROBLEM TO BE SOLVED: To detect fuzzy duplicates and, in at least one implementation, eliminate them.</p><p>SOLUTION: Fuzzy duplicates are multiple, seemingly distinct tuples (i.e., records) in a database that represent the same real-world entity or phenomenon. A solution to the fuzzy duplicate elimination problem is scale invariant, such that the scale of the distance function does not impact the local structural properties of the tuples. It is split/merge consistent, in that shrinking the distances between tuples within a group of duplicates and expanding the distances between tuples across groups may change the partition only in limited ways. It has constrained richness, such that the range of the duplicate elimination function allows all groupings that would be useful in practice.</p><p>COPYRIGHT: (C)2006,JPO&NCIPI</p> |
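
The scale-invariance property can be illustrated with a minimal Python sketch. This is not the method claimed in the patent. It assumes a simple stand-in rule (two records form a group when each is the other's nearest neighbor) and a plain edit distance. The names `edit_distance` and `group_mutual_nearest` and the sample records are hypothetical. Because the rule depends only on the relative order of distances, multiplying every distance by a constant leaves the partition unchanged.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def group_mutual_nearest(records, dist):
    """Partition distinct records: pair two records when each is the other's
    nearest neighbor; every other record stays in a group of its own.
    Assumed illustrative rule, not the patented algorithm."""
    n = len(records)

    def nearest(i):
        # Ties are broken by index, so the choice never depends on scale.
        others = [j for j in range(n) if j != i]
        return min(others, key=lambda j: (dist(records[i], records[j]), j))

    nn = [nearest(i) for i in range(n)]
    groups, seen = set(), set()
    for i in range(n):
        if i in seen:
            continue
        j = nn[i]
        if nn[j] == i:
            groups.add(frozenset({records[i], records[j]}))
            seen.update({i, j})
        else:
            groups.add(frozenset({records[i]}))
            seen.add(i)
    return groups


# Hypothetical sample data.
records = ["Microsoft Corp", "Microsoft Corp.", "Oracle Inc", "Oracle Inc.", "IBM"]

original = group_mutual_nearest(records, edit_distance)
# Same distance function with every value multiplied by 1000.
scaled = group_mutual_nearest(records, lambda a, b: 1000 * edit_distance(a, b))
print(original == scaled)
```

A fixed global threshold (for example, "merge if the distance is below 2") would not pass the same check, since rescaling the distances moves pairs across the threshold.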