Title of Invention |
Computer-based system and method for generating, classifying, searching, and analyzing standardized text templates and deviations from standardized text templates |
Abstract |
A method for generating, classifying, searching, and analyzing standardized text templates drawn from a plurality of text documents and for identifying standardized text deviations from standardized text templates. Semi-standardized documents may be represented as standardized templates and deviations from standardized templates, with such templates themselves automatically generated by a computer-implemented method from a plurality of similar text documents. The method enables enhanced analysis of semi-standardized documents and automatic extraction of information from standardized text templates. |
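The abstract's idea of representing a semi-standardized document as a standardized template plus deviations can be illustrated with a minimal Python sketch. It assumes word-level comparison with the standard library's `difflib.SequenceMatcher` as a stand-in for whatever comparison the patented method actually uses. The function names `deviations` and `reconstruct`, the `(start, end, replacement)` deviation format, and the sample clause with its `[AMOUNT]` placeholder are all hypothetical.

```python
import difflib
from typing import List, Tuple

# Hypothetical deviation format: (template_start, template_end, replacement_words),
# meaning template words [start:end) are replaced by replacement_words in the document.
Deviation = Tuple[int, int, List[str]]


def deviations(template: str, document: str) -> List[Deviation]:
    """Return the word-level deviations that turn `template` into `document`.

    Assumption: difflib's word-level diff stands in for the patent's comparison.
    """
    t_words, d_words = template.split(), document.split()
    matcher = difflib.SequenceMatcher(None, t_words, d_words, autojunk=False)
    result: List[Deviation] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # 'replace', 'delete' or 'insert': keep the template span and its substitute
            result.append((i1, i2, d_words[j1:j2]))
    return result


def reconstruct(template: str, devs: List[Deviation]) -> str:
    """Rebuild the document text from the template and its deviations."""
    t_words = template.split()
    out: List[str] = []
    pos = 0
    for start, end, replacement in devs:
        out.extend(t_words[pos:start])  # standardized (unchanged) template words
        out.extend(replacement)         # document-specific deviation
        pos = end
    out.extend(t_words[pos:])
    return " ".join(out)


template = "The Tenant shall pay rent of [AMOUNT] on the first day of each month."
document = "The Tenant shall pay rent of $1,200 on the fifth day of each month."
devs = deviations(template, document)
print(devs)  # [(6, 7, ['$1,200']), (9, 10, ['fifth'])]
print(reconstruct(template, devs) == document)  # True
```

Storing only the shared template once and the short deviation list per document keeps the document-specific content (here the amount and the due day) isolated, which is what makes it easy to compare or extract across many similar documents.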
Publication Number |
US9195639(B2) |
Publication Date |
2015.11.24 |
Application Number |
US201213628847 |
Filing Date |
2012.09.27 |
Applicant |
THE BUREAU OF NATIONAL AFFAIRS, INC. |
Inventor |
Anderson Robert |
Classification Codes |
G06F17/30;G06F17/24;G06F17/27;G06F17/28 |
Main Classification Code |
G06F17/30 |
Agency |
Frommer Lawrence & Haug LLP |
Agent |
Gordon Jon E.;Frommer Lawrence & Haug LLP |
Principal Claim |
1. A computer system configured to automatically analyze text documents by performing the following steps:
comparing text from a subject text document to text of a plurality of given text templates, each text template containing at least one paragraph of text;
determining which given text template or text templates has text that matches the text from the subject text document to a given degree of correspondence;
generating a report of the differences between the text from the subject text document and the text of the matching text template or text templates;
comparing a family of specimen text documents;
identifying one paragraph of text within one of the family of specimen text documents that most closely matches a paragraph of text in all of the other specimen text documents, as compared to all of the other paragraphs in the one specimen text document; and
generating one of the text templates containing at least the one identified paragraph of text. |
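The claim's two stages (generating a template from a family of specimen documents, then matching a subject document against templates and reporting differences) can be sketched in Python under stated assumptions. The claim does not say how "most closely matches" or "degree of correspondence" is measured. This sketch assumes difflib's word-level `SequenceMatcher.ratio()`, scores a paragraph by the weakest of its best matches across the other specimens (so it must be matched in all of them), and treats the first specimen as "the one specimen". All function names, the `threshold` and `degree` values, and the sample clauses are hypothetical.

```python
import difflib
from typing import Dict, List

Paragraphs = List[str]  # a document as a list of paragraph strings


def similarity(a: str, b: str) -> float:
    """Assumption: correspondence is difflib's word-level ratio in [0, 1]."""
    return difflib.SequenceMatcher(None, a.split(), b.split(), autojunk=False).ratio()


def consensus_score(paragraph: str, others: List[Paragraphs]) -> float:
    """How closely `paragraph` is matched in *all* other specimens:
    the weakest of its best matches, one best match per other specimen."""
    return min(max(similarity(paragraph, q) for q in doc) for doc in others)


def generate_template(specimens: List[Paragraphs], threshold: float = 0.8) -> Paragraphs:
    """Template from the first specimen (an assumption): its highest-consensus
    paragraph, plus any other paragraph whose consensus score reaches `threshold`."""
    first, others = specimens[0], specimens[1:]
    scores = [consensus_score(p, others) for p in first]
    best = max(range(len(first)), key=lambda i: scores[i])
    return [p for i, p in enumerate(first) if i == best or scores[i] >= threshold]


def match_templates(subject: Paragraphs, templates: Dict[str, Paragraphs],
                    degree: float = 0.7) -> Dict[str, float]:
    """Templates whose every paragraph has a subject paragraph at least
    `degree` similar, mapped to the mean of those best similarities."""
    matches: Dict[str, float] = {}
    for name, template in templates.items():
        best = [max(similarity(t, s) for s in subject) for t in template]
        if min(best) >= degree:
            matches[name] = sum(best) / len(best)
    return matches


def difference_report(subject: Paragraphs, template: Paragraphs) -> str:
    """One line per word-level difference between each template paragraph
    and its closest subject paragraph."""
    lines: List[str] = []
    for t in template:
        s = max(subject, key=lambda p: similarity(t, p))
        t_words, s_words = t.split(), s.split()
        ops = difflib.SequenceMatcher(None, t_words, s_words, autojunk=False).get_opcodes()
        for tag, i1, i2, j1, j2 in ops:
            if tag != "equal":
                lines.append(f"{tag}: {' '.join(t_words[i1:i2])!r} -> {' '.join(s_words[j1:j2])!r}")
    return "\n".join(lines)


# Hypothetical sample data: three specimen documents and one subject document.
specimens = [
    ["This Agreement is governed by the laws of Virginia.",
     "Rent is due on the first day of each month.",
     "Alpha clause unique to A."],
    ["Rent is due on the fifth day of each month.",
     "This Agreement is governed by the laws of Delaware."],
    ["This Agreement is governed by the laws of New York.",
     "Pets are not allowed."],
]
template = generate_template(specimens)
subject = ["Rent is due on the tenth day of each month.",
           "This Agreement is governed by the laws of Maryland."]
templates = {"governing_law": template, "pets": ["Pets are not allowed."]}
matches = match_templates(subject, templates)
print(template)
print(matches)
print(difference_report(subject, templates["governing_law"]))
```

With these samples, only the governing-law paragraph has a close counterpart in every other specimen, so it alone becomes the template; the subject document matches that template but not the "pets" one, and the report isolates the jurisdiction as the deviation. The min-of-max score is one simple reading of "matches a paragraph of text in all of the other specimen text documents"; an averaged score would tolerate a single outlier specimen at the cost of admitting less universal text into the template.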
Address |
Arlington VA US |