Abstract |
In a method for generating a schema for a corpus of data, a first corpus of data is received, wherein the first corpus of data includes unstructured text. A processor identifies a set of one or more entity relationships within the first corpus of data, wherein each entity relationship comprises a first entity, a second entity, and a specified relationship between the first entity and the second entity. A processor compares the set of one or more entity relationships to a second corpus of data, wherein the second corpus of data includes text of a subject matter different from that of the first corpus of data. A processor determines a score for each entity relationship based on the comparison to the second corpus of data. A processor generates a schema for the first corpus of data based on the score for each entity relationship of the set of one or more entity relationships. |
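
The steps recited above can be illustrated with a minimal sketch. The abstract does not specify how entity relationships are extracted, how the comparison is made, or how the score is computed, so everything below is an assumption for illustration only: the relationships are taken as already extracted, the comparison is naive substring matching against sentences of the second corpus, the score is 1 / (1 + hits) so that relationships rarely found in the different-subject corpus score higher, and the schema keeps relationships at or above a hypothetical threshold. The names `EntityRelationship`, `score_relationship`, `generate_schema`, and `threshold` are hypothetical.

```python
from typing import NamedTuple


class EntityRelationship(NamedTuple):
    """A first entity, a specified relationship, and a second entity."""
    first_entity: str
    relationship: str
    second_entity: str


def score_relationship(rel, second_corpus):
    """Assumed scoring rule, not taken from the source: 1 / (1 + hits).

    A "hit" is a second-corpus sentence that contains all three parts of
    the relationship (case-insensitive substring match).
    """
    hits = sum(
        1
        for sentence in second_corpus
        if all(part.lower() in sentence.lower() for part in rel)
    )
    return 1.0 / (1.0 + hits)


def generate_schema(relationships, second_corpus, threshold=0.75):
    """Keep relationships whose score meets the (hypothetical) threshold."""
    scores = {rel: score_relationship(rel, second_corpus) for rel in relationships}
    schema = [rel for rel in relationships if scores[rel] >= threshold]
    return schema, scores


# Relationships assumed to have been identified in a first (medical) corpus.
relationships = [
    EntityRelationship("aspirin", "treats", "headache"),
    EntityRelationship("patient", "has", "name"),
]

# Second corpus on a different subject matter (finance), split into sentences.
second_corpus = [
    "A patient investor has a name for caution.",
    "Bond yields rose on Tuesday.",
]

schema, scores = generate_schema(relationships, second_corpus)
for rel in relationships:
    print(rel, scores[rel])
```

In this toy run the generic relationship ("patient", "has", "name") also matches a sentence in the finance corpus and is scored lower, while the domain-specific relationship is retained in the schema. A real implementation would likely replace the substring match with proper entity and relation extraction.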