Title: Extracting structured knowledge from unstructured text
Abstract: Embodiments of the present invention relate to knowledge representation systems that include a knowledge base in which knowledge is represented in a structured, machine-readable format that encodes meaning. Techniques for extracting structured knowledge from unstructured text and for determining the reliability of such extracted knowledge are also described.
Publication number: US9110882(B2) Publication date: 2015.08.18
Application number: US201113106562 Filing date: 2011.05.12
Applicant: Amazon Technologies, Inc. Inventors: Overell Simon; Tunstall-Pedoe William
Classification: G06F17/30; G06F17/27 Main classification: G06F17/30
Agency: Weaver Austin Villeneuve & Sampson LLP Agent: Weaver Austin Villeneuve & Sampson LLP
Principal claim: 1. A computer-implemented method for extracting structured knowledge from unstructured text for use in a knowledge representation system, the knowledge representation system comprising a knowledge base that represents knowledge using a structured, machine-readable format, the structured, machine-readable format comprising fact triples, the method comprising:
identifying sentences in the unstructured text using one or more computing devices;
using the one or more computing devices, converting each of a subset of the sentences to one or more simplified assertion statements of the form: subject noun phrase, verb phrase, object noun phrase;
converting each of a subset of the simplified assertion statements to a corresponding fact triple using the one or more computing devices, each fact triple being constructed from three knowledge base objects, the three knowledge base objects comprising two entity objects and a relationship object expressing a relationship between the two entity objects;
using the one or more computing devices, grouping the fact triples into a plurality of quarantine groups such that each of the fact triples is included in more than one of the quarantine groups, each quarantine group being defined by a corresponding one of a plurality of fact characteristics, a first one of the fact characteristics being that all of the fact triples in the corresponding quarantine group include a same one of the entity objects, a second one of the fact characteristics being that all of the fact triples in the corresponding quarantine group include a same one of the relationship objects;
determining a reliability for each quarantine group with reference to the knowledge base;
determining that more than one of the quarantine groups in which a first fact triple is included has at least a specified reliability; and
classifying the first fact triple as a reliable fact triple in response to determining that more than one of the quarantine groups in which the first fact triple is included has at least the specified reliability.
Address: Seattle WA US
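
Illustrative sketch (not part of the patent record): the quarantine-group steps of claim 1 above can be pictured roughly as follows. The claim does not say how a group's reliability is computed "with reference to the knowledge base", so this sketch assumes it is the fraction of the group's triples already present in the knowledge base. The names FactTriple, quarantine_groups, group_reliability, reliable_triples and the 0.5 threshold are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class FactTriple(NamedTuple):
    left: str      # first entity object
    relation: str  # relationship object
    right: str     # second entity object


def group_keys(t):
    """Keys of the quarantine groups a triple belongs to (hypothetical scheme)."""
    return {("entity", t.left), ("entity", t.right), ("relation", t.relation)}


def quarantine_groups(triples):
    """Group triples by shared entity object and by shared relationship object."""
    groups = defaultdict(set)
    for t in triples:
        for key in group_keys(t):
            groups[key].add(t)
    return groups


def group_reliability(group, knowledge_base):
    """ASSUMPTION: reliability = share of the group's triples already in the KB."""
    return sum(t in knowledge_base for t in group) / len(group)


def reliable_triples(triples, knowledge_base, threshold=0.5):
    """Keep triples that sit in more than one group meeting the threshold."""
    groups = quarantine_groups(triples)
    scores = {k: group_reliability(g, knowledge_base) for k, g in groups.items()}
    return {
        t for t in triples
        if sum(scores[k] >= threshold for k in group_keys(t)) > 1
    }


# Toy data, invented for this example only.
kb = {
    FactTriple("paris", "is_capital_of", "france"),
    FactTriple("berlin", "is_capital_of", "germany"),
}
extracted = [
    FactTriple("paris", "is_capital_of", "france"),
    FactTriple("berlin", "is_capital_of", "germany"),
    FactTriple("paris", "is_located_in", "france"),
    FactTriple("moon", "is_made_of", "cheese"),
]
result = reliable_triples(extracted, kb)
```

With this toy data, the triple (paris, is_located_in, france) is absent from the knowledge base but is still classified as reliable, because two of its groups (entity "paris" and entity "france") reach the threshold. The triple (moon, is_made_of, cheese) is rejected because none of its groups do.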