Title of invention: EXTRACTING DATA FROM SEMI-STRUCTURED TEXT DOCUMENTS
Abstract: The invention is a process, system, and workflow for extracting and warehousing data from semi-structured documents in any language. This includes, but is not limited to, one or more methods for: the automatic building of text mining term models; the optimization or evolution of such text mining term models; the implementation of document-specific (or company-specific) memory; and the tying or linking of the extracted data, or metadata, once placed in a target electronic document, to the machine-readable underlying source document, thus providing verification and provenance. The process preferably incorporates a wizard-based method for producing pattern recognition text mining term models to extract data from text. The invention also includes a system, method, and workflow for handling a subsequent document of similar design and structure, specifically the automatic extraction of target elements and the addition of those elements to a database. No previously defined rules or other rigid location-specifying criteria regarding a particular document type need be expressed to mine this data.
Publication number: WO2005010727(A2) Publication date: 2005.02.03
Application number: WO2004US23932 Filing date: 2004.07.23
Applicant: PRAEDEA SOLUTIONS, INC.;GRAF, JAMES, A.;KOROTEYEV, VLADIMIR, A.;MIKHAYLOV, EDUARD, Y.;BRICKER, ELLIOT, I.;LEVY, BENJAMIN, D., A.;WONG, AUGUSTINUS, Y. Inventor: GRAF, JAMES, A.;KOROTEYEV, VLADIMIR, A.;MIKHAYLOV, EDUARD, Y.;BRICKER, ELLIOT, I.;LEVY, BENJAMIN, D., A.;WONG, AUGUSTINUS, Y.
Classification: G06F;G06F17/30 Main classification: G06F
Agency: Agent:
Main claim:
Address: