Title: Extracting data from semi-structured text documents
Abstract: The invention is a process, system, and workflow for extracting and warehousing data from semi-structured documents in any language. This includes, but is not limited to, one or more methods for: the automatic building of text mining term models; the optimization or evolution of such text mining term models; the implementation of document-specific (or company-specific) memory; and the tying or linking of the extracted data, or metadata, once placed in a target electronic document, to the underlying machine-readable source document, thus providing verification and provenance. The process preferably incorporates a wizard-based method for producing pattern-recognition text mining term models to extract data from text. The invention also includes a system, method, and workflow for handling a subsequent document of similar design and structure, specifically the automatic extraction of target elements and their addition to a database.
Publication number: US2006242180(A1)
Publication date: 2006.10.26
Application number: US20040565611
Filing date: 2004.07.23
Applicants: GRAF JAMES A; KOROTEYEV VLADIMIR; MIKHAYLOV EDUARD Y; BRICKER ELLIOT I; LEVY BENJAMIN D A; WONG AUGUSTINUS Y
Inventors: GRAF JAMES A.; KOROTEYEV VLADIMIR; MIKHAYLOV EDUARD Y.; BRICKER ELLIOT I.; LEVY BENJAMIN D.A.; WONG AUGUSTINUS Y.
Classification: G06F17/00; G06F; G06F17/30
Main classification: G06F17/00
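The abstract names three ideas without giving implementation detail: pattern-based term models that pull target values out of text, a document-specific (or company-specific) memory that helps with later documents of similar structure, and provenance that ties each extracted value back to the source document. The following is a minimal sketch of how those ideas could fit together, assuming term models are plain regular expressions whose first capture group holds the value, memory is a per-source record of which pattern last succeeded, and provenance is a pair of character offsets into the source text. It is not the patented wizard-based method; `TermModel`, `Extraction`, `DocumentMemory`, `extract`, the field names, and the sample text are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class TermModel:
    """Hypothetical term model: a target field plus candidate regex patterns.

    Assumption: each pattern's first capture group holds the value.
    """
    field: str
    patterns: list[str]


@dataclass
class Extraction:
    field: str
    value: str
    pattern: str  # the pattern that produced this value
    start: int    # character offsets of the value in the source text,
    end: int      # kept so the value can be traced back (provenance)


class DocumentMemory:
    """Remembers, per document source (e.g. a company), which pattern last
    succeeded for each field, so similar later documents try it first."""

    def __init__(self) -> None:
        self._preferred: dict[tuple[str, str], str] = {}

    def ordered_patterns(self, source: str, model: TermModel) -> list[str]:
        preferred = self._preferred.get((source, model.field))
        if preferred in model.patterns:
            # Try the remembered pattern first, then the rest in model order.
            return [preferred] + [p for p in model.patterns if p != preferred]
        return list(model.patterns)

    def remember(self, source: str, hit: Extraction) -> None:
        self._preferred[(source, hit.field)] = hit.pattern


def extract(text: str, source: str, models: list[TermModel],
            memory: DocumentMemory) -> list[Extraction]:
    """Apply each term model to the text; the first matching pattern wins."""
    hits = []
    for model in models:
        for pattern in memory.ordered_patterns(source, model):
            m = re.search(pattern, text, re.IGNORECASE)
            if m:
                hit = Extraction(model.field, m.group(1), pattern,
                                 m.start(1), m.end(1))
                memory.remember(source, hit)
                hits.append(hit)
                break
    return hits


if __name__ == "__main__":
    models = [
        TermModel("revenue", [r"net revenues?\s*\$?([\d,]+)",
                              r"total revenues?\s*\$?([\d,]+)"]),
        TermModel("fiscal_year", [r"fiscal (?:year )?(\d{4})"]),
    ]
    memory = DocumentMemory()
    doc = "ACME Corp annual report for fiscal year 2003. Total revenues $1,250,000."
    for hit in extract(doc, "ACME", models, memory):
        # Provenance: the stored offsets point back at the exact source text.
        assert doc[hit.start:hit.end] == hit.value
        print(hit.field, hit.value)
    # prints "revenue 1,250,000" then "fiscal_year 2003"
```

Storing offsets rather than copying surrounding context keeps the provenance link exact: slicing the source with `start:end` always reproduces the extracted value. The memory only reorders candidate patterns and never discards them, so a document from the same source that uses a different layout still falls back to the full set of patterns.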