Abstract
A computer-implemented method and computer-readable media for creating an ontology for a domain by reference to a knowledge corpus comprising linked documents and a category hierarchy, wherein each document can be contained in one or more categories and each category can contain one or more other categories. In some embodiments, the method comprises: searching the corpus to identify documents whose text matches a seed domain description; identifying further documents within the corpus that are semantically similar to the identified documents; identifying a subgraph of the category hierarchy that includes the categories assigned to the identified documents and the further documents; and reducing the subgraph to form the ontology by requiring that the documents therein be indicative of a second domain description, the second domain description being at least as broad as the seed domain description.
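The four steps can be illustrated with a small, self-contained sketch. It is not the claimed implementation. Every concrete choice here is an assumption made for illustration: "matches the seed description" is taken as containing all seed terms; "semantically similar" is approximated by Jaccard similarity of word sets against a hypothetical `threshold`; the subgraph is the induced subgraph on the assigned categories; and a document is "indicative" of the broader description if it contains any of its terms. The names `Doc` and `build_ontology` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    text: str
    categories: set[str]  # categories this document is contained in


def tokens(text: str) -> set[str]:
    # Naive bag-of-words tokenization (assumption, for illustration only).
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def build_ontology(docs: dict[str, Doc], hierarchy: dict[str, set[str]],
                   seed: str, broad: str, threshold: float = 0.3
                   ) -> tuple[set[str], set[tuple[str, str]]]:
    # Step 1: documents whose text contains every seed term.
    seed_terms = tokens(seed)
    matched = {d for d, doc in docs.items() if seed_terms <= tokens(doc.text)}
    # Step 2: further documents similar to any matched document (Jaccard stand-in).
    similar = {d for d, doc in docs.items() if d not in matched and any(
        jaccard(tokens(doc.text), tokens(docs[m].text)) >= threshold for m in matched)}
    selected = matched | similar
    # Step 3: induced subgraph of the hierarchy on the assigned categories.
    cats = set().union(*(docs[d].categories for d in selected))
    edges = {(p, c) for p, children in hierarchy.items()
             for c in children if p in cats and c in cats}
    # Step 4: keep only categories holding at least one document that
    # mentions a term of the broader description.
    broad_terms = tokens(broad)
    keep = {c for c in cats if any(
        c in docs[d].categories and bool(tokens(docs[d].text) & broad_terms)
        for d in selected)}
    return keep, {(p, c) for p, c in edges if p in keep and c in keep}


docs = {
    "A": Doc("machine learning models learn from data", {"ML"}),
    "B": Doc("deep learning models learn from data", {"DeepLearning"}),
    "C": Doc("medieval castles and their history", {"History"}),
    "E": Doc("models learn from data quickly", {"Statistics"}),
}
hierarchy = {"AI": {"ML"}, "ML": {"DeepLearning"},
             "Math": {"Statistics"}, "History": set()}
print(build_ontology(docs, hierarchy, seed="machine learning", broad="learning"))
```

Here document E is pulled in as similar to A but never mentions "learning", so its category is pruned in the reduction step, leaving `ML` and `DeepLearning` linked by a single edge.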