Main claim |
1. A computer-implemented method for extracting structured knowledge from unstructured text for use in a knowledge representation system, the knowledge representation system comprising a knowledge base that represents knowledge using a structured, machine-readable format, the structured, machine-readable format comprising fact triples, the method comprising:
identifying sentences in the unstructured text using one or more computing devices;
using the one or more computing devices, converting each of a subset of the sentences to one or more simplified assertion statements of the form: subject noun phrase, verb phrase, object noun phrase;
converting each of a subset of the simplified assertion statements to a corresponding fact triple using the one or more computing devices, each fact triple being constructed from three knowledge base objects, the three knowledge base objects comprising two entity objects and a relationship object expressing a relationship between the two entity objects;
using the one or more computing devices, grouping the fact triples into a plurality of quarantine groups such that each of the fact triples is included in more than one of the quarantine groups, each quarantine group being defined by a corresponding one of a plurality of fact characteristics, a first one of the fact characteristics being that all of the fact triples in the corresponding quarantine group include a same one of the entity objects, a second one of the fact characteristics being that all of the fact triples in the corresponding quarantine group include a same one of the relationship objects;
determining a reliability for each quarantine group with reference to the knowledge base;
determining that more than one of the quarantine groups in which a first fact triple is included has at least a specified reliability; and
classifying the first fact triple as a reliable fact triple in response to determining that more than one of the quarantine groups in which the first fact triple is included has at least the specified reliability. |
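
The grouping and classification steps of the claim can be illustrated with a minimal Python sketch. It is not the claimed implementation. The claim does not state how a group's reliability is computed, so the sketch assumes it is the fraction of the group's fact triples already present in the knowledge base. The names `Triple`, `group_triples`, `group_reliability`, `reliable_triples`, the threshold of 0.5, and the sample data are all hypothetical. Sentence identification and conversion to simplified assertions are omitted; the sketch starts from already-extracted fact triples.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """A fact triple: two entity objects joined by a relationship object."""
    subject: str
    relation: str
    obj: str


def group_keys(t):
    """Fact characteristics of a triple: each entity and its relationship."""
    return {("entity", t.subject), ("entity", t.obj), ("relation", t.relation)}


def group_triples(triples):
    """Place every triple in one quarantine group per fact characteristic."""
    groups = defaultdict(set)
    for t in triples:
        for key in group_keys(t):
            groups[key].add(t)
    return groups


def group_reliability(members, knowledge_base):
    """ASSUMPTION: reliability = share of the group's triples already in the KB."""
    return sum(t in knowledge_base for t in members) / len(members)


def reliable_triples(triples, knowledge_base, threshold=0.5):
    """Keep triples for which more than one of their groups meets the threshold."""
    groups = group_triples(triples)
    scores = {k: group_reliability(m, knowledge_base) for k, m in groups.items()}
    result = set()
    for t in triples:
        passing = sum(scores[k] >= threshold for k in group_keys(t))
        if passing > 1:  # "more than one of the quarantine groups"
            result.add(t)
    return result


# Hypothetical sample data, for illustration only.
kb = {
    Triple("paris", "capital_of", "france"),
    Triple("berlin", "capital_of", "germany"),
}
extracted = [
    Triple("paris", "capital_of", "france"),
    Triple("berlin", "capital_of", "germany"),
    Triple("madrid", "capital_of", "spain"),
    Triple("paris", "located_in", "france"),
    Triple("moon", "made_of", "cheese"),
]
reliable = reliable_triples(extracted, kb)
```

In this sample, the triple `("paris", "located_in", "france")` is not in the knowledge base, yet it is classified as reliable because both of its entity groups reach the threshold. The triple `("madrid", "capital_of", "spain")` is not, because only its relationship group does.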