Title of invention: Entity disambiguation in natural language text
Abstract: A device analyzes first text to identify a pair of terms, within the first text, that are alias terms. The device analyzes the first text by performing two or more of: a latent semantic analysis of the pair of terms, based on the pair of terms being associated with a particular tag; a tag-based analysis that determines that the pair of terms are associated with compatible tags; a transitive analysis that determines that a pair of neighbor terms, associated with the pair of terms, are associated with compatible tags; or a co-location analysis based on a distance between the pair of terms in the first text. The device generates, based on analyzing the first text, a glossary that includes the pair of terms identified as alias terms. The device replaces terms within the first text or a second text that is different from the first text, using the glossary.
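As a rough illustration of the co-location analysis mentioned in the abstract, the sketch below measures the smallest distance, in tokens, between occurrences of two terms. The abstract says only that the analysis is "based on a distance between the pair of terms", so the function name `min_token_distance`, the regex tokenization, the restriction to single-word terms, and the choice of a minimum over all occurrence pairs are all assumptions made for this example, not details from the patent.

```python
import re
from typing import Optional


def min_token_distance(text: str, term_a: str, term_b: str) -> Optional[int]:
    """Smallest token distance between any occurrence of two single-word terms.

    Hypothetical helper: the source does not define how distance is measured.
    Returns None when either term does not occur in the text.
    """
    # Assumed tokenization: lowercase runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    positions_a = [i for i, tok in enumerate(tokens) if tok == term_a.lower()]
    positions_b = [i for i, tok in enumerate(tokens) if tok == term_b.lower()]
    if not positions_a or not positions_b:
        return None
    return min(abs(i - j) for i in positions_a for j in positions_b)


sample = "The server sends a request. The host then replies to the client."
distance = min_token_distance(sample, "server", "host")
```

How such a distance would feed into an alias score is not stated in the abstract, so the sketch stops at the measurement itself.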
Publication number: US9245015(B2) Publication date: 2016.01.26
Application number: US201313790864 Filing date: 2013.03.08
Applicant: Accenture Global Services Limited Inventors: Misra Janardan; Das Subhabrata
Classification: G06F17/30; G06F17/27 Main classification: G06F17/30
Agency: Harrity & Harrity, LLP Agent: Harrity & Harrity, LLP
Principal claim: 1. A method comprising: analyzing, by a device, first text to identify a pair of terms, within the first text, that are alias terms, the analyzing the first text including performing two or more of: a latent semantic analysis of the pair of terms, based on the pair of terms being associated with a particular tag; a tag-based analysis that determines that the pair of terms are associated with compatible tags; a transitive analysis that determines that a pair of neighbor terms, associated with the pair of terms, are associated with compatible tags; or a co-location analysis based on a distance between the pair of terms in the first text; and the analyzing the first text further including performing one or more of: a misspelling analysis to determine that a first term, of the pair of terms, is a misspelling of a second term, of the pair of terms, a short form analysis to determine that the first term is a short form of the second term, or an explicit alias analysis to determine that the first term is an explicit alias of the second term; calculating, by the device and based on analyzing the first text, a first alias score for the pair of terms; calculating, by the device and using the first alias score for the pair of terms, a second alias score for the pair of terms; determining, by the device, that the second alias score satisfies a threshold; generating, by the device and based on determining that the second alias score satisfies the threshold, a glossary that includes the pair of terms identified as alias terms, the glossary being generated based on the performing the one or more of the misspelling analysis, the short form analysis, or the explicit alias analysis; and replacing terms, by the device and using the glossary, within at least one of: the first text, or a second text that is different from the first text.
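The sequence in claim 1 (first alias score, second alias score, threshold test, glossary generation, term replacement) can be sketched roughly as follows. The claim does not say how either score is computed or what the threshold is, so everything numeric here is an assumption for illustration: the first score is taken as the mean of per-analysis scores supplied by the caller, the second score adds a fixed bonus when a short form or misspelling check succeeds, and the threshold is 0.6. The helper names, the initials rule for short forms, and the similarity-ratio rule for misspellings are likewise hypothetical.

```python
import re
from difflib import SequenceMatcher
from statistics import mean


def is_short_form(short: str, full: str) -> bool:
    """Assumed rule: `short` equals the initials of the words in `full`."""
    words = full.split()
    initials = "".join(word[0] for word in words).lower()
    return len(words) > 1 and short.lower() == initials


def is_misspelling(a: str, b: str, min_ratio: float = 0.8) -> bool:
    """Assumed rule: distinct strings with a high difflib similarity ratio."""
    a, b = a.lower(), b.lower()
    return a != b and SequenceMatcher(None, a, b).ratio() >= min_ratio


def first_alias_score(analysis_scores):
    """Assumed combination: mean of the individual analysis scores."""
    return mean(analysis_scores)


def second_alias_score(first, alias, canonical, bonus=0.3):
    """Assumed refinement: boost the first score on lexical evidence."""
    if is_short_form(alias, canonical) or is_misspelling(alias, canonical):
        return min(1.0, first + bonus)
    return first


def build_glossary(candidates, threshold=0.6):
    """Keep pairs whose second alias score satisfies the assumed threshold."""
    glossary = {}
    for alias, canonical, scores in candidates:
        score = second_alias_score(first_alias_score(scores), alias, canonical)
        if score >= threshold:
            glossary[alias] = canonical
    return glossary


def replace_terms(text, glossary):
    """Replace each whole-word alias in `text` with its glossary entry."""
    if not glossary:
        return text
    # Longest aliases first so a longer alias wins over a shorter overlap.
    aliases = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, aliases)) + r")\b")
    return pattern.sub(lambda m: glossary[m.group(1)], text)


# Hypothetical candidate pairs with made-up per-analysis scores.
candidates = [
    ("ATM", "automated teller machine", [0.5, 0.4]),
    ("recieve", "receive", [0.4, 0.4]),
    ("server", "client", [0.2, 0.3]),
]
glossary = build_glossary(candidates)
rewritten = replace_terms("Users recieve cash from the ATM.", glossary)
```

In this sketch the replacement is case-sensitive and runs against whichever text is passed in, which mirrors the claim's option of applying the glossary to the first text or to a different second text.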
Address: Dublin, IE