Title: Systems and methods for large scale global entity resolution
Abstract: Systems and methods for coreference resolution are disclosed. In one embodiment, a method includes locating, for each of a selected plurality of chains of coreferent mentions, a particular context-based name from the respective chain, wherein the coreferent mentions correspond to entities and the context-based name is a longest name in the respective chain, a last name in the respective chain, or a most frequently occurring name in the respective chain. The method also includes determining an entity category for each respective one of the plurality of chains and determining one or more entity attributes from structured data and unstructured data. The method further includes, based on the located particular context-based name, the entity category, and the one or more attributes, assigning high-probability coreferent chains to high-confidence buckets, such as to produce a Zipfian-like distribution having a head region and a tail region.
Publication number: US9311301(B1) Publication date: 2016.04.12
Application number: US201514750936 Filing date: 2015.06.25
Applicant: Digital Reasoning Systems, Inc. Inventors: Balluru Vishnuvardhan; Graham Kenneth; Hilliard Naomi
Classification: G06F17/28; G06N99/00; G06N7/00; G06F17/27 Main classification: G06F17/28
Agency: Troutman Sanders LLP Agents: Troutman Sanders LLP; Schneider Ryan A.; Glass Christopher W.
Main claim: 1. A computer-implemented method, comprising: ingesting text data from a plurality of documents containing a plurality of mentions; locating, from the text data, for each of a selected plurality of chains of coreferent mentions, a particular context-based name from the respective chain, wherein the coreferent mentions correspond to entities and the context-based name is a longest name in the respective chain, a last name in the respective chain, or a most frequently occurring name in the respective chain; determining an entity category for each respective one of the plurality of chains; determining one or more entity attributes from structured data and unstructured data; based on the located particular context-based name, the entity category, and the one or more attributes, assigning high-probability coreferent chains to high-confidence buckets, such as to produce a power law probability distribution having a head region and a tail region; and resolving, based at least in part on the power law probability distribution, the coreferent mentions to identify corresponding real-world entities.
Address: Franklin, TN, US
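Illustrative sketch (not part of the patent record): the name-selection and bucketing steps recited in the main claim could look roughly like the Python below. All function names, the chain data layout, and the bucket key are hypothetical. The sketch assumes that "a last name in the respective chain" means the final name mention in the chain, and it keys buckets on the selected name and entity category only, leaving out the entity attributes and any probability scoring that the claim also recites.

```python
from collections import Counter, defaultdict


def context_based_name(names, strategy="longest"):
    """Pick one representative name from a chain of coreferent name mentions.

    `names` is a list of name strings in the order they were encountered.
    The three strategies mirror the alternatives listed in the claim.
    """
    if not names:
        raise ValueError("chain must contain at least one name mention")
    if strategy == "longest":
        return max(names, key=len)
    if strategy == "last":
        # Assumption: "last" means the final mention in the chain.
        return names[-1]
    if strategy == "most_frequent":
        return Counter(names).most_common(1)[0][0]
    raise ValueError(f"unknown strategy: {strategy}")


def assign_buckets(chains, strategy="longest"):
    """Group chain ids by (lower-cased representative name, entity category)."""
    buckets = defaultdict(list)
    for chain_id, chain in chains.items():
        name = context_based_name(chain["names"], strategy)
        buckets[(name.lower(), chain["category"])].append(chain_id)
    return dict(buckets)


def rank_bucket_sizes(buckets):
    """Return (bucket key, size) pairs, largest first.

    With realistic data the large buckets form the head of the
    distribution and the many small buckets form the tail.
    """
    sizes = [(key, len(ids)) for key, ids in buckets.items()]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy input: three chains drawn from different documents.
chains = {
    "c1": {"names": ["Smith", "John Smith", "Smith"], "category": "PERSON"},
    "c2": {"names": ["John Smith", "J. Smith"], "category": "PERSON"},
    "c3": {"names": ["Acme", "Acme Corp"], "category": "ORG"},
}
buckets = assign_buckets(chains)
ranked = rank_bucket_sizes(buckets)
```

In this toy input the two chains whose longest name is "John Smith" fall into the same bucket, while the "Acme Corp" chain sits alone; a real corpus would need far more chains before any head and tail shape became visible.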