Abstract |
<p>A system analyzes and extracts words and word groups from an electronic document (104) and stores them in predefined fields or tables of a target database (110). The system comprises a content analysis and semantic network engine (216), which analyzes and extracts the words and word groups, and a heuristics engine (212), coupled to the content analysis and semantic network engine (216), which applies a set of heuristics to the words and word groups in the electronic document. The content analysis and semantic network engine (216) further comprises: a thesaurus (400) that links terms (402) and concepts (404) and defines the relationships among them; a semantic network (220), coupled to the thesaurus (400), that organizes the terms (402) and concepts (404) of the thesaurus (400), together with meta-concepts (502) and categories (504), in a hierarchical structure; and section processors (218) that analyze each section of the electronic document (104) and apply a set of heuristics to it. The system further comprises a document pre-processor (210) that performs an initial analysis of the electronic document (104); a morphological analysis engine (214), coupled to the heuristics engine (212), that performs morphological analysis and tagging of words and word groups in the electronic document (104); and a database interface (222) between the content analysis and semantic network engine (216) and the target database (110).</p> |
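
The data flow described in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation: every class name, function name, and sample entry below is hypothetical. The morphological analysis engine (214) is reduced to simple tokenization, the heuristics engine (212) is omitted, and a plain dictionary stands in for the fields of the target database (110).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Thesaurus:
    """Hypothetical stand-in for the thesaurus (400): links terms to concepts."""
    term_to_concept: dict[str, str] = field(default_factory=dict)

    def concept_for(self, term: str) -> str | None:
        return self.term_to_concept.get(term.lower())


@dataclass
class SemanticNetwork:
    """Hypothetical stand-in for the semantic network (220).

    Places each concept under a category; here the category name doubles
    as the name of the target database field (an assumption).
    """
    thesaurus: Thesaurus
    concept_to_category: dict[str, str] = field(default_factory=dict)

    def category_for(self, term: str) -> str | None:
        concept = self.thesaurus.concept_for(term)
        if concept is None:
            return None
        return self.concept_to_category.get(concept)


def preprocess(document: str) -> list[str]:
    """Pre-processor (210) stand-in: split the document into sections on blank lines."""
    return [part.strip() for part in document.split("\n\n") if part.strip()]


def tag_words(section: str) -> list[str]:
    """Morphological analysis (214) stand-in: lowercase tokens without edge punctuation."""
    tokens = (word.strip(".,;:()").lower() for word in section.split())
    return [token for token in tokens if token]


def process_section(section: str, network: SemanticNetwork) -> dict[str, list[str]]:
    """Section processor (218) stand-in: group recognized terms by category."""
    fields: dict[str, list[str]] = {}
    for word in tag_words(section):
        category = network.category_for(word)
        if category is not None:
            fields.setdefault(category, []).append(word)
    return fields


def extract(document: str, network: SemanticNetwork) -> dict[str, list[str]]:
    """Run all sections and merge the results into one record of field values."""
    record: dict[str, list[str]] = {}
    for section in preprocess(document):
        for category, words in process_section(section, network).items():
            record.setdefault(category, []).extend(words)
    return record


# Hypothetical sample data, not taken from the source.
thesaurus = Thesaurus({"python": "programming language",
                       "java": "programming language",
                       "boston": "city"})
network = SemanticNetwork(thesaurus, {"programming language": "skills",
                                      "city": "location"})
record = extract("Lives in Boston.\n\nExperienced in Python and Java.", network)
print(record)
```

In this sketch the thesaurus resolves a term to a concept, the semantic network resolves the concept to a category, and the category selects the field that receives the extracted word. A real database interface (222) would write the resulting record to the target database rather than return a dictionary.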