Title: System and method for using text analytics to identify a set of related documents from a source document
Abstract: A system and method for processing a document to generate a set of related documents. A system is provided that includes a textual analytics system that analyzes unstructured data contained in a source document and extracts a set of structured information about the source document; and a compare system that identifies a set of related documents by comparing the set of structured information with metadata indexed from a set of publications.
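The two components named in the abstract can be illustrated with a minimal sketch. It assumes that "structured information" is a set of recognized terms and that a publication is related when its indexed metadata shares at least one term with the source; the record does not specify either detail. The vocabulary, the function names `extract_structured_info` and `find_related`, and the sample data are all hypothetical.

```python
import re

# Hypothetical controlled vocabulary; the record does not say how terms are recognized.
VOCABULARY = {"aspirin", "ibuprofen", "kinase"}


def extract_structured_info(text):
    """Return the set of vocabulary terms found in unstructured text."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return tokens & VOCABULARY


def find_related(structured_info, metadata_index):
    """Return IDs of publications whose indexed metadata shares a term with the source."""
    return sorted(
        pub_id
        for pub_id, terms in metadata_index.items()
        if structured_info & terms
    )


# Hypothetical metadata index: publication ID -> set of indexed terms.
metadata_index = {
    "pub-1": {"aspirin", "platelet"},
    "pub-2": {"kinase", "inhibitor"},
    "pub-3": {"polymer"},
}

source = "A composition comprising aspirin and a kinase inhibitor."
info = extract_structured_info(source)
related = find_related(info, metadata_index)
```

Set intersection keeps the comparison step simple; a production system would more likely use a search index over the metadata database.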
Publication number: US9495349(B2)
Publication date: 2016.11.15
Application number: US200511281291
Filing date: 2005.11.17
Applicant: International Business Machines Corporation
Inventors: Angell Robert L.;Boyer Stephen K.;Cooper James W.;Hennessy Richard A.;Kanungo Tapas;Kreulen Jeffrey T.;Martin David C.;Rhodes James J.;Spangler W. Scott;Weintraub Herschel J. R.
Classification: G06F7/00;G06F17/27
Main classification: G06F7/00
Agency: Hoffman Warnick LLC
Agent: Simek Daniel;Hoffman Warnick LLC
Principal claim: 1. A computer system for processing documents, the system comprising:
  a memory including a document processing system stored thereon, and a processor in communication with the memory, wherein the processor executes the document processing system stored in the memory, the document processing system including:
    a textual analytics system that analyzes unstructured data contained in a source document to generate a set of structured information about the source document and extracts the set of structured information about the source document;
    a compare system that identifies and aggregates a set of documents related to the source document by comparing the set of structured information with metadata stored in a metadata database, wherein the metadata stored in the metadata database is indexed from a set of technical reference publications, wherein a technical reference publication is identified as related to the source document and added to the set of documents related to the source document when the set of structured information extracted from the source document matches an associated metadata of the technical reference publication;
    an annotation system for annotating the source document, wherein the annotation system annotates the source document with the structured information extracted from the source document, and wherein the annotation system further annotates the source document with metadata associated with each technical reference publication in the set of related documents; and
    a ranking system for ranking the metadata in the annotated source document, wherein in a case in which more than one technical reference in the set of related documents is associated with a piece of metadata, the piece of metadata is assigned a higher rank of importance relative to a piece of metadata which is associated with fewer technical references in the set of related documents.
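The annotation and ranking elements of the claim can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: it assumes each related reference carries a set of metadata items and that "rank of importance" is simply the number of related references associated with an item. The names `rank_metadata` and `annotate` and the sample references are hypothetical.

```python
from collections import Counter


def rank_metadata(related_docs):
    """Rank metadata items by how many related references carry them.

    related_docs maps a reference ID to its set of metadata items.
    Items shared by more references come first; ties are broken
    alphabetically so the order is deterministic.
    """
    counts = Counter(item for items in related_docs.values() for item in items)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def annotate(source_info, related_docs):
    """Build an annotation holding the extracted information and ranked metadata."""
    return {
        "structured_info": sorted(source_info),
        "related_metadata": rank_metadata(related_docs),
    }


# Hypothetical set of related technical references and their metadata.
related_docs = {
    "ref-A": {"aspirin", "platelet"},
    "ref-B": {"aspirin", "kinase"},
    "ref-C": {"kinase", "aspirin"},
}

annotation = annotate({"aspirin", "kinase"}, related_docs)
```

Here `aspirin` appears in three references, `kinase` in two, and `platelet` in one, so they are ranked in that order, which matches the claim's rule that metadata associated with more references ranks higher.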
Address: Armonk, NY, US