Title of invention SEARCH ENGINE USING NAME CLUSTERING
Abstract A system maintains a plurality of names. The system generates cluster ids based on the names, and forms first clusters by grouping names having an equivalent cluster id. Then, for each cluster, and for each unique name in each cluster, the system keeps the unique name in the cluster when the unique name is similar to each other unique name in the cluster. The system can also receive a name entered by a user. The system generates a cluster id for the name entered by the user. The system retrieves a cluster having an equivalent cluster id as the cluster id of the name entered by the user. The system forms a construct that includes the name entered by the user and unique names in the retrieved cluster. The system searches for names within a population using the construct as search criteria.
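The query-time flow described in the abstract can be sketched as follows. This is only an illustrative sketch, not the applicant's implementation: the function names `cluster_id`, `build_construct`, and `search_population` are hypothetical, the cluster id is assumed to be the lowercased name with vowels removed (as in the main claim), the "construct" is assumed to be a simple set of acceptable names, and matching against the population is assumed to be a case-insensitive exact match.

```python
VOWELS = set("aeiou")  # assumption: the record does not enumerate the vowels


def cluster_id(name):
    """Hypothetical cluster id: the lowercased name with vowels removed."""
    return "".join(ch for ch in name.lower() if ch not in VOWELS)


def build_construct(entered_name, clusters):
    """Combine the entered name with the unique names of its retrieved cluster.

    `clusters` maps a cluster id to a list of lowercased unique names.
    The construct is modeled here as a set, i.e. an OR over its members.
    """
    retrieved = clusters.get(cluster_id(entered_name), [])
    return {entered_name.lower(), *retrieved}


def search_population(entered_name, clusters, population):
    """Return population entries whose name appears in the construct."""
    construct = build_construct(entered_name, clusters)
    return [person for person in population if person.lower() in construct]


# Example data (made up for illustration).
clusters = {"stvn": ["stevan", "steven", "stevn"]}
population = ["Steven", "Stevan", "Stephen", "Stiviano", "steve"]
matches = search_population("Stevn", clusters, population)
```

With this data, a search for "Stevn" also matches the spelling variants "Steven" and "Stevan" from the same cluster, while a name with no stored cluster falls back to matching only itself.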
Publication number US2016019284(A1) Publication date 2016.01.21
Application number US201414335190 Filing date 2014.07.18
Applicant LinkedIn Corporation Inventor Sankar Sriram
Classification G06F17/30 Main classification G06F17/30
Agency Agent
Main claim 1. A process comprising: receiving into a computer processor a plurality of names; removing one or more vowels from each of the plurality of names, thereby generating a plurality of cluster ids; forming a plurality of first clusters by grouping names having an equivalent cluster id; for each first cluster, determining an edit distance between each unique name in the first cluster and each other unique name in the first cluster; aggregating the edit distances for each unique name in the first cluster; and for each unique name in the first cluster, keeping the unique name in the first cluster when the aggregating of the edit distances between the unique name in the first cluster and each other unique name in the first cluster is less than a threshold, thereby generating a final cluster.
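The steps of the claim can be sketched in code as shown below. This is a minimal sketch under stated assumptions, not the claimed implementation: names are lowercased before processing, all five vowels are removed, Levenshtein distance stands in for "edit distance", the aggregate is taken to be the mean distance to the other unique names, and the threshold value of 2.5 is arbitrary. The names `cluster_id`, `edit_distance`, and `build_final_clusters` are hypothetical.

```python
from collections import defaultdict

VOWELS = set("aeiou")  # assumption: the claim only says "one or more vowels"


def cluster_id(name):
    """Generate a cluster id by removing vowels from the lowercased name."""
    return "".join(ch for ch in name.lower() if ch not in VOWELS)


def edit_distance(a, b):
    """Levenshtein distance (assumed choice of edit distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def build_final_clusters(names, threshold=2.5):
    """Group names by cluster id, then keep names close to their cluster mates."""
    # First clusters: unique names grouped by equivalent cluster id.
    first_clusters = defaultdict(set)
    for name in names:
        first_clusters[cluster_id(name)].add(name.lower())

    final_clusters = {}
    for cid, unique_names in first_clusters.items():
        kept = []
        for name in sorted(unique_names):
            others = [other for other in unique_names if other != name]
            # Assumption: aggregate = mean distance; a lone name aggregates to 0.
            if others:
                aggregate = sum(edit_distance(name, o) for o in others) / len(others)
            else:
                aggregate = 0.0
            if aggregate < threshold:
                kept.append(name)
        final_clusters[cid] = kept
    return final_clusters


final = build_final_clusters(["Steven", "Stevan", "Stevn", "Stiviano", "steven"])
# final == {"stvn": ["stevan", "steven", "stevn"]}
```

In this example all four distinct spellings share the cluster id "stvn", but "stiviano" has a mean edit distance of about 3.67 to the others and is dropped, while the three close variants (mean distances of 2.0 or less) form the final cluster. A sum or maximum could serve as the aggregate instead of the mean; the claim does not specify which.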
Address Mountain View CA US