Title of invention Automatic annotation for training and evaluation of semantic analysis engines
Abstract Implementations include systems and methods that generate data for training or evaluating semantic analysis engines. For example, a method may include receiving documents from a corpus that includes an authoritative set of documents from an authoritative source. Each document in the authoritative set may be associated with an entity. A second set of documents from the corpus, which does not overlap with the authoritative set, may include at least one link to a document in the authoritative set, the at least one link being associated with anchor text. For each document in the second set, the method may include identifying entity mentions in the document based on the anchor text. The method may include associating the entity mention with the entity in a graph-structured knowledge base or associating entity types with the entity mention. The method may also include training a semantic analysis engine using the identified entity mentions and associations.
Publication number US9224103(B1) Publication date 2015.12.29
Application number US201313801197 Filing date 2013.03.13
Applicant Google Inc. Inventors Subramanya Amarnag;Pereira Fernando
Classification G06F15/18;G06N99/00 Main classification G06F15/18
Agency Brake Hughes Bellerman LLP Agent Brake Hughes Bellerman LLP
Main claim 1. A computer system comprising: at least one processor; and memory storing instructions that, when executed by the at least one processor, cause the computer system to perform operations comprising: receiving documents from a corpus, the corpus comprising: an authoritative set of documents from an authoritative source, each document in the authoritative set being associated with an entity, and a second set of documents, the second set being documents that are not in the authoritative set and that are not copies of documents in the authoritative set but that each include at least one link to a document in the authoritative set, the at least one link being associated with anchor text, identifying, for each document in the second set, entity mentions in the document based on the anchor text, each entity mention including the anchor text and an identifier of the linked-to authoritative document, associating the identified entity mentions with respective entity types based on content in the linked-to authoritative document, and training an entity tagging engine using the identified entity mentions and the entity types associated with the entity mentions.
Address Mountain View CA US
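
The operations recited in the main claim (find links from non-authoritative documents into the authoritative set, treat each link's anchor text as an entity mention, and attach entity types derived from the linked-to document) can be illustrated with a minimal Python sketch. This is not the patented implementation. The names `EntityMention`, `extract_mentions`, and `build_training_data` are hypothetical, documents are assumed to be HTML strings keyed by URL, and the mapping from authoritative URL to entity types is assumed to have been computed beforehand from the authoritative document's content. Detection of copies of authoritative documents and the training of the entity tagging engine itself are omitted.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass(frozen=True)
class EntityMention:
    """Hypothetical record: anchor text plus the linked-to document's identifier."""
    anchor_text: str
    target_doc_id: str
    entity_types: tuple


class _AnchorCollector(HTMLParser):
    """Collects (href, anchor text) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Anchors without an href leave _href as None and are ignored.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_mentions(html, authoritative_types):
    """Return a mention for every link that targets an authoritative document.

    authoritative_types maps an authoritative document URL to its entity types
    (an assumed, precomputed input).
    """
    collector = _AnchorCollector()
    collector.feed(html)
    collector.close()
    mentions = []
    for href, text in collector.links:
        if text and href in authoritative_types:
            mentions.append(
                EntityMention(text, href, tuple(authoritative_types[href]))
            )
    return mentions


def build_training_data(corpus, authoritative_types):
    """Gather mentions from every document that is not itself authoritative."""
    examples = []
    for doc_id, html in corpus.items():
        if doc_id in authoritative_types:
            continue  # skip the authoritative set; only the second set is mined
        examples.extend(extract_mentions(html, authoritative_types))
    return examples
```

The resulting list of `EntityMention` records is the kind of automatically annotated data that could then be passed to a tagger's training routine. Links to documents outside the authoritative set are ignored.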