Title: Disambiguation of named entities
Abstract: Named entities are disambiguated in search queries and other contexts using a disambiguation scoring model. The scoring model is developed using a knowledge base of articles, including articles about named entities. Various aspects of the knowledge base, including article titles, redirect pages, disambiguation pages, hyperlinks, and categories, are used to develop the scoring model.
Publication number: US9135238(B2) Publication date: 2015.09.15
Application number: US200611427678 Filing date: 2006.06.29
Applicant: Google Inc. Inventors: Bunescu Razvan Constantin; Pasca Alexandru Marius
Classification: G06F17/30; G06F17/27 Main classification: G06F17/30
Agency: Fish & Richardson P.C. Agent: Fish & Richardson P.C.
Main claim: 1. A method comprising: receiving, by a computer and from a client system, a query including a proper name, the proper name appearing in a context in the query, the context including terms of the proper name and additional terms that do not include the proper name; determining, by the computer, named entities corresponding to the proper name; for each corresponding named entity, identifying, by the computer, a named entity article about the named entity, wherein each named entity article about a named entity is different from the named entity articles about the other named entities; for each corresponding named entity, determining, by the computer, prior to disambiguation of the proper name included in the query, a similarity score between the named entity article about the named entity and the context in the query containing the proper name, wherein the similarity score is a measurement of correlation between the context in the query containing the proper name and the named entity article; disambiguating, by the computer and based on the similarity scores between the context in the query and the respective named entity articles, the proper name to a single instance of the proper name by associating it with the named entity article having a highest similarity score between the context in the query and the named entity article; and providing, by the computer, the named entity corresponding to the named entity article having the highest similarity score; wherein each named entity article is a web document about the named entity; and determining a similarity score includes determining a similarity score based, in part, on link structure metrics of the named entity article, popularity metrics of the named entity article, recency scores for the named entity article, and reputation based scores for the named entity article.
Address: Mountain View CA US
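
Illustrative sketch (not part of the patent record): the core steps of the main claim (score each candidate named entity article against the query context, then pick the highest-scoring one) might look roughly like the Python below. Everything here is an assumption made for illustration: the claim does not specify a similarity measure, so bag-of-words cosine similarity is swapped in; the names `tokenize`, `cosine_similarity`, `disambiguate`, and `CANDIDATES`, and the two short article texts, are hypothetical. The link structure, popularity, recency, and reputation signals named in the claim are omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(query, candidates):
    """Return (best_entity, scores) for a query containing a proper name.

    `candidates` maps each candidate named entity to the text of its
    article. The query context is the full query: the terms of the proper
    name plus the additional terms around it.
    """
    context = Counter(tokenize(query))
    scores = {
        entity: cosine_similarity(context, Counter(tokenize(article)))
        for entity, article in candidates.items()
    }
    # Associate the proper name with the highest-scoring article.
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical candidate articles for the ambiguous name "John Williams".
CANDIDATES = {
    "John Williams (composer)": (
        "John Williams is an American composer and conductor "
        "known for film scores such as Star Wars"
    ),
    "John Williams (guitarist)": (
        "John Williams is an Australian classical guitarist "
        "known for recordings of guitar concertos"
    ),
}

if __name__ == "__main__":
    entity, scores = disambiguate("john williams star wars score", CANDIDATES)
    print(entity)
```

In this toy example the additional query terms "star" and "wars" overlap only with the composer article, so that candidate scores higher. A fuller scoring model would presumably combine this context score with the other article-level metrics the claim lists.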