Title of invention: IDENTIFYING GLOSSARY TERMS FROM NATURAL LANGUAGE TEXT DOCUMENTS
Abstract: A device may obtain text to be analyzed to identify glossary terms. The device may analyze a linguistic unit to generate multiple linguistic units related to the linguistic unit. The device may analyze the multiple linguistic units to generate potential glossary terms. The device may perform a glossary term analysis on the potential glossary terms to generate glossary terms that include a subset of the potential glossary terms. The device may identify included terms that are included in the glossary terms. The device may identify excluded terms that are excluded from the glossary terms. The device may determine a semantic relatedness score between at least one excluded term and at least one included term. The device may selectively add the excluded linguistic term to the glossary terms to form a final set of glossary terms based on the semantic relatedness score, and may output the final set of glossary terms.
Publication number: US2014163966(A1) Publication date: 2014.06.12
Application number: US201314092518 Filing date: 2013.11.27
Applicant: Accenture Global Services Limited Inventors: DWARAKANATH Anurag; Ramnani Roshni R.; Sengupta Shubhashis; Aggarwal Aniya
Classification: G06F17/21 Main classification: G06F17/21
Agency: Agent:
Main claim: 1. A device, comprising: one or more processors to: obtain text of a document to be analyzed to identify glossary terms included in the text; perform a linguistic unit analysis on a linguistic unit, included in the text, to generate a plurality of ambiguous linguistic units from the linguistic unit; resolve the plurality of ambiguous linguistic units to generate a set of potential glossary terms that includes a subset of the plurality of ambiguous linguistic units; perform a glossary term analysis on the set of potential glossary terms to generate a set of glossary terms that includes a subset of the set of potential glossary terms; identify a set of included terms, of the set of potential glossary terms, that are included in the set of glossary terms; identify a set of excluded terms, of the set of potential glossary terms, that are excluded from the set of glossary terms; determine a semantic relatedness score between at least one excluded term, of the set of excluded terms, and at least one included term, of the set of included terms; selectively add the excluded linguistic term to the set of glossary terms to form a final set of glossary terms based on the semantic relatedness score; and output the final set of glossary terms for the document.
Address: Dublin IE
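Illustrative sketch: The last steps of the main claim (splitting potential terms into included and excluded sets, scoring each excluded term against the included terms, and selectively adding it back) might look roughly like the Python below. This is only a sketch under stated assumptions, not the method disclosed in the application. The record does not say how the semantic relatedness score is computed, so `token_overlap` (a Jaccard word overlap) is a stand-in, and the names `finalize_glossary`, `relatedness`, and `threshold` as well as the cutoff value 0.3 are hypothetical.

```python
from typing import Callable, Iterable, List


def token_overlap(a: str, b: str) -> float:
    """Stand-in relatedness score (assumption): Jaccard overlap of word sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def finalize_glossary(
    potential_terms: Iterable[str],
    glossary_terms: Iterable[str],
    relatedness: Callable[[str, str], float] = token_overlap,
    threshold: float = 0.3,  # hypothetical cutoff, not taken from the record
) -> List[str]:
    """Form a final glossary by re-admitting excluded terms related to included ones."""
    potential = list(dict.fromkeys(potential_terms))  # de-duplicate, keep order
    glossary = set(glossary_terms)

    # Included terms: potential terms that survived the glossary term analysis.
    included = [t for t in potential if t in glossary]
    # Excluded terms: potential terms that the analysis dropped.
    excluded = [t for t in potential if t not in glossary]

    final = list(included)
    for term in excluded:
        # Best score of this excluded term against any included term.
        best = max((relatedness(term, inc) for inc in included), default=0.0)
        if best >= threshold:
            final.append(term)  # selectively add the excluded term
    return final


if __name__ == "__main__":
    potential = ["user account", "account balance", "login page", "monthly report"]
    glossary = ["user account", "login page"]
    print(finalize_glossary(potential, glossary))
    # prints ['user account', 'login page', 'account balance']
```

In this example "account balance" shares one of three distinct words with "user account" (score 1/3), so it clears the assumed cutoff and is added, while "monthly report" shares no words with any included term and stays excluded. A real system would presumably replace `token_overlap` with a proper semantic measure, since the scoring function is passed in as a parameter.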