Abstract |
Named entities are disambiguated in search queries and other contexts using a disambiguation scoring model. The scoring model is developed using a knowledge base of articles, including articles about named entities. Various aspects of the knowledge base, including article titles, redirect pages, disambiguation pages, hyperlinks, and categories, are used to develop the scoring model. |
Independent claim |
1. A method comprising:
receiving, by a computer and from a client system, a query including a proper name, the proper name appearing in a context in the query, the context including terms of the proper name and additional terms that do not include the proper name;
determining, by the computer, named entities corresponding to the proper name;
for each corresponding named entity, identifying, by the computer, a named entity article about the named entity, wherein each named entity article about a named entity is different from the named entity articles about the other named entities;
for each corresponding named entity, determining, by the computer, prior to disambiguation of the proper name included in the query, a similarity score between the named entity article about the named entity and the context in the query containing the proper name, wherein the similarity score is a measurement of correlation between the context in the query containing the proper name and the named entity article;
disambiguating, by the computer and based on the similarity scores between the context in the query and the respective named entity articles, the proper name to a single instance of the proper name by associating it with the named entity article having a highest similarity score between the context in the query and the named entity article; and
providing, by the computer, the named entity corresponding to the named entity article having the highest similarity score;
wherein each named entity article is a web document about the named entity; and
determining a similarity score includes determining a similarity score based, in part, on link structure metrics of the named entity article, popularity metrics of the named entity article, recency scores for the named entity article, and reputation based scores for the named entity article. |
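The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The claim does not specify how the correlation is measured, so the sketch assumes a bag-of-words cosine similarity between the query context and the article text. The `EntityArticle` class, the four auxiliary fields (assumed to be pre-computed values in [0, 1]), the equal averaging of those fields, and the `aux_weight` parameter are all hypothetical names and choices introduced only for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class EntityArticle:
    """Hypothetical record for one candidate named entity article."""
    entity: str
    text: str
    # Assumed pre-computed auxiliary signals, each in [0, 1].
    link_score: float = 0.0
    popularity: float = 0.0
    recency: float = 0.0
    reputation: float = 0.0


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_score(query, article, aux_weight=0.1):
    """Score one article against the whole query context.

    Assumption: the textual correlation is cosine similarity, and the
    four auxiliary metrics are averaged and added with a small weight.
    """
    context = Counter(tokenize(query))
    content = Counter(tokenize(article.text))
    aux = (article.link_score + article.popularity
           + article.recency + article.reputation) / 4
    return cosine(context, content) + aux_weight * aux


def disambiguate(query, candidates):
    """Return the entity whose article scores highest for the query."""
    best = max(candidates, key=lambda art: similarity_score(query, art))
    return best.entity
```

Under these assumptions, a call such as `disambiguate("john williams star wars soundtrack", candidates)` scores each candidate article for "John Williams" against the full query and returns the entity whose article shares the most context terms, adjusted by the auxiliary signals. Keeping `aux_weight` small lets the textual context dominate, so the auxiliary metrics act mainly as tie-breakers.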