Title of invention: METHOD AND SYSTEM FOR SIMILARITY SEARCH AND CLUSTERING
Abstract: Provided is a similarity search method that uses a localized distance metric. The data comprises a collection of items, each associated with a set of properties. The distance between two items is defined in terms of the number of items in the collection that are associated with the set of properties common to the two items. A query is generally composed of a set of properties. The distance between a query and an item is defined in terms of the number of items in the collection that are associated with the set of properties common to the query and the item. The properties can be of various types, such as binary, partially ordered, or numeric. The distance metric may be applied explicitly or implicitly for similarity search. One embodiment of this invention uses random walks so that the similarity search can be performed exactly or approximately, trading off accuracy against performance. The distance metric of the present invention can also serve as the basis for matching and clustering applications. In these contexts, the distance metric may be used to build a graph, to which matching or clustering algorithms can be applied.
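The distance described in the abstract can be illustrated with a minimal sketch for binary properties. The abstract says only that the distance is defined "in terms of" a count, so using the raw count as the distance is an assumption here. The names `shared_count`, `search`, and `ITEMS` and the sample data are hypothetical and do not come from the patent.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical toy collection: item name -> set of binary properties.
ITEMS: Dict[str, FrozenSet[str]] = {
    "merlot": frozenset({"wine", "red", "france"}),
    "syrah": frozenset({"wine", "red", "australia"}),
    "chablis": frozenset({"wine", "white", "france"}),
    "stout": frozenset({"beer", "dark", "ireland"}),
}


def shared_count(
    collection: Dict[str, FrozenSet[str]],
    a: FrozenSet[str],
    b: FrozenSet[str],
) -> int:
    """Count the items that carry every property common to a and b.

    Assumption: the raw count is used directly as the distance. A small
    count means the shared properties are rare, so a and b are close.
    """
    common = a & b
    return sum(1 for props in collection.values() if common <= props)


def search(
    collection: Dict[str, FrozenSet[str]],
    query: Iterable[str],
    k: int = 3,
) -> List[Tuple[int, str]]:
    """Return the k items nearest to a query given as a set of properties."""
    q = frozenset(query)
    scored = [
        (shared_count(collection, q, props), name)
        for name, props in collection.items()
    ]
    scored.sort()  # ascending distance, ties broken by item name
    return scored[:k]
```

With this toy data, `shared_count(ITEMS, ITEMS["merlot"], ITEMS["stout"])` evaluates to 4, because the two items share no properties and every item trivially matches the empty set. `search(ITEMS, {"wine", "red"}, k=2)` ranks `merlot` and `syrah` first, each at distance 2. This exhaustive scan is the explicit, exact form of the search; the random-walk approximation mentioned in the abstract is not sketched here.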
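Application publication number: CA2470899(A1); Application publication date: 2003.07.03
Application number: CA20022470899; Filing date: 2002.08.09
Applicant: ENDECA TECHNOLOGIES, INC.; Inventor: TUNKELANG, DANIEL
Classification: G06F17/30;(IPC1-7):G06F17/30; Main classification: G06F17/30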
Agency: ; Agent:
Principal claim:
Address: