Title of Invention: Method For Deducing Entity Relationships Across Corpora Using Cluster Based Dictionary Vocabulary Lexicon
Abstract: An approach is provided for identifying entity relationships based on word classifications extracted from business documents stored in a plurality of corpora. In the approach, performed by an information handling system, a plurality of cluster classifications are identified for the business documents so that entity information from the business documents can be classified or assigned to the cluster classifications, such as by performing natural language processing (NLP) analysis of the business documents. The approach applies semantic analysis to identify and score entity relationships between the entity information classified in the cluster classifications, and based on the scored entity relationships, cluster relationships between the cluster classifications are identified.
Publication No.: US2016092448(A1)  Publication Date: 2016.03.31
Application No.: US201514638264  Filing Date: 2015.03.04
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION  Inventors: Byron Donna K.; Chandrasekaran Swaminathan; Krishnamurthy Lakshminarayanan
Classification: G06F17/30; G06N5/04  Main Classification: G06F17/30
Agency:  Agent:
Principal Claim: 1. A method, in an information handling system comprising a processor and a memory, of identifying cluster relationships for searching across a plurality of corpora, the method comprising: identifying, by the system, a plurality of different cluster classifications for a corresponding plurality of corpora; classifying, by the system, entity information from documents stored in the plurality of corpora into the plurality of different cluster classifications; applying semantic analysis, by the system, to identify entity relationships between entity information classified in the plurality of different cluster classifications; determining, by the system, one or more scores for each identified entity relationship; identifying, by the system, a cluster relationship between at least two cluster classifications based on the one or more scores for each identified entity relationship; and searching, by the information handling system, at least first and second corpora corresponding to the at least two cluster classifications having the identified cluster relationship.
Address: Armonk, NY, US
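
Illustrative sketch (not part of the patent record): the steps recited in claim 1 can be walked through on toy data as follows. Everything here is an assumption made for illustration: one cluster classification per corpus, a fixed entity list standing in for NLP entity extraction, Jaccard overlap of surrounding words standing in for semantic analysis, and a hypothetical threshold of 0.25. The record does not specify any of these choices.

```python
from itertools import combinations

# Hypothetical toy corpora; each corpus is treated as one cluster classification.
CORPORA = {
    "contracts": ["Acme supplies widgets to Globex"],
    "invoices": ["Acme bills Globex for widgets"],
    "hr": ["Hooli hires engineers"],
}
# Stand-in for NLP entity extraction: a fixed list of known entity names.
KNOWN_ENTITIES = {"acme", "globex", "hooli"}


def tokenize(doc):
    """Lowercased set of whitespace-separated tokens."""
    return {t.lower() for t in doc.split()}


def classify_entities(corpora):
    """Assign each entity to its cluster, with the words seen around it."""
    clusters = {}
    for name, docs in corpora.items():
        contexts = {}
        for doc in docs:
            tokens = tokenize(doc)
            for entity in tokens & KNOWN_ENTITIES:
                contexts.setdefault(entity, set()).update(tokens - {entity})
        clusters[name] = contexts
    return clusters


def jaccard(a, b):
    """Overlap of two word sets; a simple proxy for semantic similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_entity_relationships(clusters):
    """Score every cross-cluster entity pair."""
    scores = {}
    for a, b in combinations(sorted(clusters), 2):
        for ent_a, ctx_a in clusters[a].items():
            for ent_b, ctx_b in clusters[b].items():
                scores[(a, ent_a, b, ent_b)] = jaccard(ctx_a, ctx_b)
    return scores


def related_clusters(scores, threshold=0.25):
    """Two clusters are related if their best entity score meets the threshold."""
    best = {}
    for (a, _, b, _), score in scores.items():
        best[(a, b)] = max(best.get((a, b), 0.0), score)
    return sorted(pair for pair, score in best.items() if score >= threshold)


def search(query, start, corpora, related):
    """Search the starting corpus plus every corpus whose cluster is related to it."""
    targets = {start}
    for a, b in related:
        if start in (a, b):
            targets.update((a, b))
    q = query.lower()
    return [(name, doc) for name in sorted(targets)
            for doc in corpora[name] if q in tokenize(doc)]


clusters = classify_entities(CORPORA)
scores = score_entity_relationships(clusters)
related = related_clusters(scores)
hits = search("widgets", "contracts", CORPORA, related)
```

On this toy data only the "contracts" and "invoices" clusters share enough entity context to be linked, so a query started in "contracts" is also run against "invoices" but not against "hr". Taking the maximum entity score per cluster pair is one possible aggregation; a mean or a count of scores above the threshold would fit the claim language equally well.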