Title of Invention Name Disambiguation Using Context Terms
Abstract Methods, systems and apparatus, including computer programs encoded on a computer storage medium, for disambiguating names in a document corpus. In one aspect, a method includes generating context term lists for a person name, each context term list being a list of context terms from a resource for the person name; clustering the context term lists into a plurality of clusters, each of the clusters of context term lists including context term lists that are most similar to the cluster relative to other clusters; for each of the clusters, selecting a representative term for the cluster; receiving the person name as a search query; and generating a plurality of query suggestions from the search query and the representative terms for the clusters, each query suggestion being a combination of the person name and one representative term.
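The final step of the abstract, combining the person name with one representative term per cluster, can be sketched as follows. This is a minimal illustration only: the function name `generate_query_suggestions`, the example name "John Smith", and the example terms are hypothetical and do not come from the patent.

```python
def generate_query_suggestions(person_name, representative_terms):
    """Combine the person name with each cluster's representative term.

    Assumption: a suggestion is the name followed by a space and the term;
    the patent only says each suggestion is "a combination" of the two.
    """
    suggestions = []
    seen = set()
    for term in representative_terms:
        suggestion = f"{person_name} {term}"
        # Skip duplicates in case two clusters share a representative term.
        if suggestion not in seen:
            seen.add(suggestion)
            suggestions.append(suggestion)
    return suggestions


# Hypothetical input: one representative term per cluster.
suggestions = generate_query_suggestions(
    "John Smith", ["baseball", "architect", "senator"]
)
print(suggestions)
```

Each suggestion points the user toward one sense of the ambiguous name, so the number of suggestions is bounded by the number of clusters.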
Publication Number US2014214840(A1) Publication Date 2014.07.31
Application Number US201012955253 Filing Date 2010.11.29
Applicant Gupta Nitin;Das Abhinandan S. Inventor Gupta Nitin;Das Abhinandan S.
Classification G06F17/30 Main Classification G06F17/30
Agency Agent
Principal Claim 1. A computer-implemented method performed by a data processing apparatus, the method comprising:
  generating context term lists for a person name, each context term list being a list of context terms that co-occur with the person name in a resource for the person name, and each of the resources to which the context term lists for the person name correspond being different resources, the generating the context term lists comprising:
    selecting the person name;
    selecting a plurality of resources that include the person name in the resource; and
    for each of the selected resources:
      selecting context terms included in the resource that co-occur with the person name included in the resource and that are different from the person name; and
      generating a respective context term list for the resource and for the person name from the selected context terms;
  clustering the context term lists into a plurality of clusters, each of the clusters of context term lists including context term lists that are most similar to the cluster relative to other clusters, the clustering comprising iteratively determining a measure of similarity between pairs of context term lists based on the context terms in the pairs of lists and merging context term lists based on the measures of similarity;
  selecting from each of the clusters, a respective context term, the respective context term being a representative context term for the cluster;
  receiving the person name as a search query; and
  generating a plurality of query suggestions from the search query and the representative terms for the clusters, each query suggestion being a combination of the person name and one representative term.
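The generating, clustering, and selecting steps of the claim can be sketched roughly as below. This is an illustration under stated assumptions, not the patented implementation: the claim does not name a similarity measure, a co-occurrence window, or a rule for choosing the representative term, so Jaccard similarity, a fixed token window, a merge threshold, and "most frequent term across the cluster" are all assumptions made here. All function names, the stopword list, and the sample resources are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed stopword list; the claim does not specify term filtering.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "for", "is", "was",
             "to", "his", "her", "on", "at", "with", "by"}


def context_term_list(text, person_name, window=5):
    """Collect terms within `window` tokens of each occurrence of the name."""
    tokens = re.findall(r"[a-z]+", text.lower())
    name_tokens = person_name.lower().split()
    n = len(name_tokens)
    terms = set()
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == name_tokens:
            lo, hi = max(0, i - window), min(len(tokens), i + n + window)
            for tok in tokens[lo:i] + tokens[i + n:hi]:
                # Context terms must differ from the person name itself.
                if tok not in name_tokens and tok not in STOPWORDS:
                    terms.add(tok)
    return terms


def jaccard(a, b):
    """Assumed similarity measure: shared terms over all terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_term_lists(term_lists, threshold=0.2):
    """Iteratively merge the most similar pair until none reaches threshold."""
    clusters = [{"terms": set(t), "members": [t]} for t in term_lists]
    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), jaccard(clusters[i]["terms"], clusters[j]["terms"]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < threshold:
            break
        merged = {"terms": clusters[i]["terms"] | clusters[j]["terms"],
                  "members": clusters[i]["members"] + clusters[j]["members"]}
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


def representative_term(cluster):
    """Pick the term found in the most member lists (ties: alphabetical)."""
    counts = Counter(t for member in cluster["members"] for t in member)
    return min(counts, key=lambda t: (-counts[t], t))


# Hypothetical resources that mention the same name in two different senses.
resources = [
    "John Smith pitched for the baseball team last season",
    "The baseball pitcher John Smith signed with a new team",
    "Architect John Smith designed the museum building",
    "The museum by John Smith, an architect, opened downtown",
]
term_lists = [context_term_list(r, "John Smith") for r in resources]
clusters = cluster_term_lists(term_lists)
representatives = sorted(representative_term(c) for c in clusters)
print(representatives)
```

The merge threshold stands in for a stopping criterion, which the claim leaves open; a lower threshold yields fewer, broader clusters and therefore fewer query suggestions.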
Address Santa Clara CA US