Title: NLP-based entity recognition and disambiguation
Abstract: Methods and systems for entity recognition and disambiguation using natural language processing techniques are provided. Example embodiments provide an entity recognition and disambiguation system (ERDS) and process that, based upon input of a text segment, automatically determines which entities are being referred to by the text using both natural language processing techniques and analysis of information gleaned from contextual data in the surrounding text. In at least some embodiments, supplemental or related information that can be used to assist in the recognition and/or disambiguation process can be retrieved from knowledge repositories such as an ontology knowledge base. In one embodiment, the ERDS comprises a linguistic analysis engine, a knowledge analysis engine, and a disambiguation engine that cooperate to identify candidate entities from a knowledge repository and determine which of the candidates best matches the one or more detected entities in a text segment using context information.
Publication number: US9613004(B2) Publication date: 2017.04.04
Application number: US201313944340 Filing date: 2013.07.17
Applicant: VCVC III LLC Inventors: Liang Jisheng;Koperski Krzysztof;Dhillon Navdeep S.;Tusk Carsten;Bhatti Satish
Classification: G06F17/21;G06F17/27 Main classification: G06F17/21
Agency: Lowe Graham Jones PLLC Agent: Bierman Ellen M.;Lowe Graham Jones PLLC
Main claim: 1. A computer-implemented method for disambiguating one or more entities in an indicated text segment, comprising: processing the indicated text segment to determine a plurality of terms and their associated parts-of-speech tags and grammatical roles; performing linguistic analysis of the processed text segment to determine one or more potential entities which are referred to in the text segment by potential entity names; generating and storing, for each potential entity, an entity profile data structure storing one or more associated properties that characterize the entity based upon surrounding context and linguistic information received from the performed linguistic analysis of the processed text segment, the entity profile properties including one or more roles attributable to the potential entity based upon actions and/or modifiers associated with the determined potential entity name that have been retrieved from a linguistic analysis of the surrounding context; disambiguating which entities are being referred to in the indicated text segment by determining one or more most likely entities that are referred to in the text segment by comparing, using both linguistic and contextual information, the entity profiles generated for each potential entity with attributes of one or more candidate entities retrieved from a data repository; and invoking the method to annotate information on a web page.
Address: Seattle WA US
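
Illustrative sketch (not part of the patent record): the main claim describes building an entity profile from surrounding context and comparing it with the attributes of candidate entities from a data repository. The Python below is one minimal way to picture that comparison step. Every name in it (`EntityProfile`, `Candidate`, `overlap_score`, `disambiguate`) is hypothetical, and the Jaccard set-overlap score is a simple stand-in chosen for illustration; the record does not say how the claimed method scores candidates. The profile and candidate data are made up.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class EntityProfile:
    """Hypothetical profile: properties gathered from text around an entity name."""
    name: str
    roles: Set[str] = field(default_factory=set)      # derived from associated actions
    modifiers: Set[str] = field(default_factory=set)  # derived from associated modifiers


@dataclass
class Candidate:
    """Hypothetical candidate entity as it might be stored in a data repository."""
    entity_id: str
    name: str
    attributes: Set[str] = field(default_factory=set)


def overlap_score(profile: EntityProfile, candidate: Candidate) -> float:
    """Jaccard overlap between profile context and candidate attributes.

    Assumption: a plain set-overlap measure, used here only as a stand-in.
    """
    context = profile.roles | profile.modifiers
    if not context or not candidate.attributes:
        return 0.0
    shared = context & candidate.attributes
    return len(shared) / len(context | candidate.attributes)


def disambiguate(profile: EntityProfile, candidates: List[Candidate]) -> List[Candidate]:
    """Return same-named candidates, best match first."""
    same_name = [c for c in candidates if c.name.lower() == profile.name.lower()]
    return sorted(same_name, key=lambda c: overlap_score(profile, c), reverse=True)


# Made-up example: "The spotted jaguar hunted along the river."
profile = EntityProfile(name="Jaguar", roles={"predator", "hunter"}, modifiers={"spotted"})
candidates = [
    Candidate("jaguar_cars", "Jaguar", {"company", "car", "manufacturer"}),
    Candidate("jaguar_animal", "Jaguar", {"animal", "predator", "cat"}),
]
ranked = disambiguate(profile, candidates)
print(ranked[0].entity_id)
```

In this sketch the animal candidate ranks first because it shares the "predator" role with the profile, while the car maker shares nothing. A fuller system along the lines of the abstract would derive roles and modifiers from parts-of-speech tags and grammatical roles rather than hard-coding them.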