Title of invention: Computer-based system and method for generating, classifying, searching, and analyzing standardized text templates and deviations from standardized text templates
Abstract: A method for generating, classifying, searching, and analyzing standardized text templates drawn from a plurality of text documents and for identifying standardized text deviations from standardized text templates. Semi-standardized documents may be represented as standardized templates and deviations from standardized templates, with such templates themselves automatically generated by a computer-implemented method from a plurality of similar text documents. The method enables enhanced analysis of semi-standardized documents and automatic extraction of information from standardized text templates.
Publication number: US9195639(B2)  Publication date: 2015.11.24
Application number: US201213628847  Filing date: 2012.09.27
Applicant: THE BUREAU OF NATIONAL AFFAIRS, INC.  Inventor: Anderson Robert
Classification: G06F17/30; G06F17/24; G06F17/27; G06F17/28  Main classification: G06F17/30
Agency: Frommer Lawrence & Haug LLP  Agent: Gordon Jon E.; Frommer Lawrence & Haug LLP
Main claim: 1. A computer system configured to automatically analyze text documents by performing the following steps: comparing text from a subject text to text of a plurality of given text templates, each text template containing at least one paragraph of text; determining which given text template or text templates has text that matches the text from the subject text document to a given degree of correspondence; generating a report of the differences between the text from the subject text document and the text of the matching text template or text templates; comparing a family of specimen text documents; identifying one paragraph of text within one of the family of specimen text documents that most closely matches a paragraph of text in all of the other specimen text documents, as compared to all of the other paragraphs in the one specimen text document; and generating one of the text templates containing at least the one identified paragraph of text.
Address: Arlington, VA, US
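
Illustrative sketch (not part of the patent record): the steps recited in claim 1 can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation. The record does not say how text is compared, so the word-level `difflib.SequenceMatcher` ratio used here is a swapped-in technique. All function names (`similarity`, `matching_templates`, `deviation_report`, `most_shared_paragraph`) and the `threshold` value of 0.6, standing in for the claim's "given degree of correspondence", are hypothetical.

```python
import difflib
from typing import Dict, List, Tuple


def similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1]. Assumption: difflib's ratio stands in
    for whatever comparison measure the patented system actually uses."""
    return difflib.SequenceMatcher(None, a.split(), b.split()).ratio()


def matching_templates(subject: str, templates: Dict[str, str],
                       threshold: float = 0.6) -> List[Tuple[str, float]]:
    """Return (name, score) for each template whose similarity to the subject
    text reaches the threshold, best match first. The threshold is a
    hypothetical 'given degree of correspondence'."""
    scored = [(name, similarity(subject, text)) for name, text in templates.items()]
    matches = [pair for pair in scored if pair[1] >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


def deviation_report(subject: str, template: str) -> List[str]:
    """List the word-level differences that turn the template into the subject."""
    a, b = template.split(), subject.split()
    matcher = difflib.SequenceMatcher(None, a, b)
    report = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            old, new = " ".join(a[i1:i2]), " ".join(b[j1:j2])
            report.append(f"{tag}: {old!r} -> {new!r}")
    return report


def most_shared_paragraph(family: List[List[str]], doc_index: int = 0) -> str:
    """Pick the paragraph of family[doc_index] that, compared with the other
    paragraphs of that document, best matches some paragraph in every other
    specimen document. Each document is a list of paragraph strings."""
    others = [doc for i, doc in enumerate(family) if i != doc_index]

    def score(paragraph: str) -> float:
        # Best match within each other document, averaged over those documents.
        return sum(max(similarity(paragraph, q) for q in doc) for doc in others) / len(others)

    return max(family[doc_index], key=score)
```

In this sketch a generated template would simply be the string returned by `most_shared_paragraph`, stored under a name in the `templates` dictionary. A subject document would then be checked with `matching_templates`, and `deviation_report` would be run against each template that matched.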