Title: Posterior probability pursuit for entity disambiguation
Abstract: Various technologies described herein pertain to disambiguation of a mention of an ambiguous entity in a document. A set of candidate entities can be retrieved from an entity knowledge base based upon the mention of the ambiguous entity, where each of the candidate entities has a respective entity feature representation. Moreover, a document feature representation can be generated based upon features of the document and the respective entity feature representations of the candidate entities. A processor can be caused to select a subset of features from the document feature representation based upon a measure of how discriminative the features from the document feature representation are for disambiguating the mention of the ambiguous entity. A disambiguated result for the mention of the ambiguous entity can be determined based upon the subset of the features. The disambiguated result can be an unknown entity or one of the candidate entities.
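The abstract does not say how candidates are looked up or how "discriminative" is measured. The following is a minimal Python sketch under stated assumptions, not the patented method. It assumes a toy alias-indexed knowledge base (`KNOWLEDGE_BASE`, hypothetical data). It also uses a hypothetical score, `discriminativeness`, defined as the fraction of candidates that a feature rules out. A feature shared by every candidate scores 0 because it cannot separate them.

```python
# Hypothetical toy knowledge base: entity id -> (aliases, entity feature set).
KNOWLEDGE_BASE = {
    "Michael_Jordan_(basketball)": ({"michael jordan", "jordan"},
                                    {"nba", "bulls", "basketball", "american"}),
    "Michael_I._Jordan": ({"michael jordan", "michael i. jordan"},
                          {"berkeley", "machine learning", "statistics", "american"}),
    "Jordan_(country)": ({"jordan"}, {"amman", "middle east", "kingdom"}),
}


def retrieve_candidates(mention, kb=KNOWLEDGE_BASE):
    """Return {entity_id: feature_set} for entities listing the mention as an alias."""
    key = mention.strip().lower()
    return {eid: feats for eid, (aliases, feats) in kb.items() if key in aliases}


def discriminativeness(feature, candidates):
    """Hypothetical score: fraction of candidates that do NOT carry the feature.

    Features carried by no candidate give no evidence for any of them, so they
    score 0.0 (this also avoids dividing by zero when there are no candidates).
    """
    holders = sum(1 for feats in candidates.values() if feature in feats)
    if holders == 0:
        return 0.0
    return (len(candidates) - holders) / len(candidates)


def select_discriminative(doc_representation, candidates, min_score=0.5):
    """Keep the document features whose score reaches the (assumed) min_score."""
    return {f for f in doc_representation
            if discriminativeness(f, candidates) >= min_score}
```

For the mention "Michael Jordan", two candidates come back. From the document features `{"nba", "bulls", "american", "chicago"}`, only `nba` and `bulls` survive. `american` is shared by both candidates, and `chicago` is carried by neither.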
Publication No.: US9542652(B2) Publication Date: 2017.01.10
Application No.: US201313779769 Application Date: 2013.02.28
Applicant: Microsoft Technology Licensing, LLC Inventors: Jin Yuzhe; Kiciman Emre Mehmet; Wang Kuansan
Classification Nos.: G06F15/18; G06N7/00; G06N99/00; G06F17/30 Main Classification: G06F15/18
Agency: (none listed) Agents: Corie Alin; Swain Sandy; Minhas Micky
Principal Claim: 1. A method of disambiguating a mention of an ambiguous entity, comprising: receiving a document that comprises the mention of the ambiguous entity; retrieving a set of candidate entities from an entity knowledge base based upon the mention of the ambiguous entity, wherein each of the candidate entities has a respective entity feature representation; generating a document feature representation based upon features of the document and the respective entity feature representations of the candidate entities, the document feature representation being generated to comprise features that are part of both: the features of the document; and a union set of features from the entity feature representations of the candidate entities; causing a processor to select a subset of the features from the document feature representation based upon a measure of how discriminative the features from the document feature representation are for disambiguating the mention of the ambiguous entity; and determining a disambiguated result for the mention of the ambiguous entity based upon the subset of the features by employing a posterior probability pursuit algorithm, wherein the disambiguated result is one of: an unknown entity; or one of the candidate entities.
Address: Redmond, WA, US
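The principal claim builds the document feature representation from features that appear both in the document and in the union of the candidates' feature sets. That step is shown directly below. The claim names a posterior probability pursuit algorithm but does not define it in this record. The rest of the sketch is therefore a hypothetical greedy reading, loosely analogous to matching pursuit, and should not be taken as the patented algorithm. At each step it adds the feature that most raises the top posterior, and it stops once one hypothesis reaches `threshold`. The naive-Bayes likelihoods (`p_hit`, `p_miss`, `p_unknown`), the prior on the unknown entity, and the `UNKNOWN` label are all assumptions.

```python
import math

UNKNOWN = "<unknown entity>"  # hypothetical label for the "unknown entity" result


def build_document_representation(doc_features, candidates):
    """Document features that also occur in the union of candidate features."""
    union = set().union(*candidates.values())
    return set(doc_features) & union


def posteriors(selected, candidates, prior_unknown=0.2,
               p_hit=0.8, p_miss=0.05, p_unknown=0.3):
    """Hypothetical naive-Bayes posterior over the candidates plus UNKNOWN.

    A candidate emits each selected feature with probability p_hit if it carries
    the feature and p_miss otherwise; UNKNOWN emits any feature with p_unknown.
    UNKNOWN gets prior_unknown; the rest of the prior is split evenly.
    """
    prior_known = (1.0 - prior_unknown) / len(candidates)
    log_scores = {UNKNOWN: math.log(prior_unknown) + len(selected) * math.log(p_unknown)}
    for eid, feats in candidates.items():
        log_scores[eid] = math.log(prior_known) + sum(
            math.log(p_hit if f in feats else p_miss) for f in selected)
    top = max(log_scores.values())  # subtract the max for numerical stability
    weights = {e: math.exp(s - top) for e, s in log_scores.items()}
    total = sum(weights.values())
    return {e: w / total for e, w in weights.items()}


def posterior_pursuit(doc_features, candidates, threshold=0.9):
    """Greedy feature pursuit; returns (result, selected feature subset)."""
    if not candidates:
        return UNKNOWN, set()
    remaining = build_document_representation(doc_features, candidates)
    selected = set()
    post = posteriors(selected, candidates)
    while remaining and max(post.values()) < threshold:
        # sorted() makes tie-breaking deterministic: max() keeps the first maximum.
        best = max(sorted(remaining),
                   key=lambda f: max(posteriors(selected | {f}, candidates).values()))
        selected.add(best)
        remaining.discard(best)
        post = posteriors(selected, candidates)
    winner = max(post, key=post.get)
    # Assumed fallback: if no hypothesis is confident enough, report UNKNOWN.
    return (winner if post[winner] >= threshold else UNKNOWN), selected
```

Consider two candidates that share the feature `american`. The pursuit picks `bulls` and then `nba`, and it never selects `american`, because that feature barely moves the posterior. A document whose features match no candidate keeps an empty representation and falls back to `UNKNOWN`. This mirrors the claim's two possible outcomes.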