Abstract |
The invention is a process, system, and workflow for extracting and warehousing data from semi-structured documents in any language. This includes, but is not limited to, one or more methods for: the automatic building of text mining term models; the optimization or evolution of such text mining term models; the implementation of document-specific (or company-specific) memory; and the tying or linking of the extracted data, or metadata, once placed in a target electronic document, to the machine-readable, underlying source document, thus providing verification and provenance. The process preferably incorporates a wizard-based method for producing pattern recognition text mining term models to extract data from text. The invention also includes a system, method, and workflow for handling a subsequent document of similar design and structure, specifically the automatic extraction of target elements and the addition of those elements to a database. No previously defined rules or other rigid location-specifying criteria regarding a particular document type need be expressed to mine this data. |
Applicants |
PRAEDEA SOLUTIONS, INC.;GRAF, JAMES, A.;KOROTEYEV, VLADIMIR, A.;MIKHAYLOV, EDUARD, Y.;BRICKER, ELLIOT, I.;LEVY, BENJAMIN, D., A.;WONG, AUGUSTINUS, Y. |
Inventors |
GRAF, JAMES, A.;KOROTEYEV, VLADIMIR, A.;MIKHAYLOV, EDUARD, Y.;BRICKER, ELLIOT, I.;LEVY, BENJAMIN, D., A.;WONG, AUGUSTINUS, Y. |