Abstract |
A system and method for extracting data, hereinafter referred to as MitoMine(TM), that produces a strongly typed, ontology-defined collection referencing (and cross-referencing) all extracted records. The input to the mining process can be any data source, such as a text file delimited into a set of possibly dissimilar records. MitoMine contains parser routines and post-processing functions known as 'munchers'. The parser routines can be accessed either via a batch mining process or as part of a running server process connected to a live source. Munchers can be registered on a per-data-source basis to process the records produced, possibly writing them to an external database and/or a set of servers. The present invention also embeds an interpreted, ontology-based language within a compiler/interpreter for the source format, such that the statements of the embedded language are executed when the source compiler 'recognizes' a given construct within the source and extracts the corresponding source content. In this way, the statements in the embedded program execute in a sequence dictated wholly by the source content. This system and method therefore make it possible to bulk-extract free-form data from sources such as CD-ROMs and the web, and to load the resulting structured data into an ontology-based system.
|
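
The control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the MitoMine implementation: the names `RULES`, `MUNCHERS`, `register_muncher`, and `mine`, the `%%` record delimiter, and the `NAME:`/`POP:` field syntax are all hypothetical, and plain dictionaries stand in for the strongly typed ontology collection. The point is only that each embedded action runs when the source parser recognizes its construct, so the order of execution is dictated by the source content.

```python
import re
from collections import defaultdict

# Hypothetical registry: data-source name -> list of post-processing functions.
MUNCHERS = defaultdict(list)


def register_muncher(source, func):
    """Register a post-processing function ('muncher') for one data source."""
    MUNCHERS[source].append(func)


# Each rule pairs a construct the source parser can recognize with an
# embedded action. The action runs only when its construct is matched,
# receiving the record being built and the extracted source content.
RULES = [
    (re.compile(r"^NAME:\s*(.+)$"),
     lambda rec, m: rec.update(name=m.group(1))),
    (re.compile(r"^POP:\s*(\d+)$"),
     lambda rec, m: rec.update(population=int(m.group(1)))),
]


def mine(source, text, delimiter="%%"):
    """Split text into records, run embedded actions, then apply munchers."""
    records = []
    for chunk in text.split(delimiter):
        record = {}
        for line in chunk.strip().splitlines():
            for pattern, action in RULES:
                match = pattern.match(line.strip())
                if match:
                    action(record, match)  # embedded statement fires here
                    break
        if record:
            # Munchers registered for this source see every completed record.
            for muncher in MUNCHERS[source]:
                muncher(record)
            records.append(record)
    return records
```

In this sketch a muncher is any callable that accepts a record, so writing to an external database or forwarding to a server would be a matter of registering a different function for the source in question.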