Title of Invention Method For Preserving Conceptual Distance Within Unstructured Documents
Abstract A method, system and computer-usable medium are disclosed for preserving conceptual distance within unstructured documents by characterizing conceptual relationships. Natural language processing is applied to content in a plurality of documents to identify topics and subjects. Analytic analysis is then applied to the identified topics and subjects to identify concepts. The content in each of the plurality of documents is partitioned into a first structured hierarchy, preserving at least one structure inherent in each document. Access is then provided to the content through a first index utilizing the first structured hierarchy and through a second index utilizing a second structured hierarchy. The conceptual relationship criteria are based upon a directed graph with weights based upon a similarity and a distance based upon concepts.
Publication No. US2016098398(A1) Publication Date 2016.04.07
Application No. US201514641527 Filing Date 2015.03.09
Applicant International Business Machines Corporation Inventors Bufe John P.;Winkler Timothy P.
Classification G06F17/30;G06F17/28 Main Classification G06F17/30
Agency Agent
Main Claim 1. A computer-implemented method for characterizing content of documents by conceptual relationships, comprising: applying natural language processing (NLP) to content in a plurality of documents to identify topics and subjects; applying analytic analysis to the topics and subjects to identify conceptual relationships of the content in the plurality of documents; partitioning the content in each of the plurality of documents into a first structured hierarchy, preserving at least one structure inherent in each document; and providing access to content through a first index based upon utilizing the first structured hierarchy and through a second index utilizing a second structured hierarchy.
Address Armonk NY US
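
Illustrative sketch (not part of the patent record). The steps recited in the abstract and main claim can be pictured with a minimal Python example, under several assumptions that do not come from the source: the NLP and analytic steps are replaced by a fixed keyword-to-concept table (`CONCEPT_KEYWORDS`), the preserved document structure is taken to be section and paragraph order, the second structured hierarchy is simplified to a flat concept-to-passage mapping, and the edge weight of the directed graph is assumed to be the fraction of passages mentioning one concept that also mention the other. All function and variable names are hypothetical.

```python
import re
from collections import defaultdict
from itertools import permutations

# Assumed stand-in for the NLP/analytic step: keyword -> concept label.
CONCEPT_KEYWORDS = {"index": "indexing", "hierarchy": "structure", "graph": "graph"}


def concepts_in(text):
    """Return the set of concept labels whose keyword occurs in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {concept for kw, concept in CONCEPT_KEYWORDS.items() if kw in words}


def build_indexes(docs):
    """Build a structural index (path -> passage) and a concept index
    (concept -> list of paths). Paths keep each document's own
    section/paragraph order, so the inherent structure is preserved."""
    structural = {}
    conceptual = defaultdict(list)
    for doc in docs:
        for s, (_title, paragraphs) in enumerate(doc["sections"]):
            for p, text in enumerate(paragraphs):
                path = (doc["id"], s, p)
                structural[path] = text
                for concept in sorted(concepts_in(text)):
                    conceptual[concept].append(path)
    return structural, dict(conceptual)


def concept_graph(conceptual):
    """Directed weighted graph: the weight of edge a -> b is the share of
    passages mentioning a that also mention b (an assumed similarity)."""
    graph = defaultdict(dict)
    for a, b in permutations(conceptual, 2):
        shared = len(set(conceptual[a]) & set(conceptual[b]))
        if shared:
            graph[a][b] = shared / len(conceptual[a])
    return dict(graph)


DOCS = [
    {
        "id": "d1",
        "sections": [
            ("Intro", ["A graph links each index entry.", "The hierarchy keeps order."]),
            ("Body", ["An index over the hierarchy."]),
        ],
    }
]

structural, conceptual = build_indexes(DOCS)
graph = concept_graph(conceptual)
```

In this sketch the same passage is reachable either by its structural path, for example `structural[("d1", 1, 0)]`, or by concept, for example `conceptual["indexing"]`. The graph is directed because the weights are asymmetric: every passage about "graph" also mentions "indexing", but only half of the "indexing" passages mention "graph". A distance between concepts could then be derived from these weights, for instance as one minus the weight, though the record does not specify the formula.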