Title Systems, methods, and software for entity relationship resolution
Abstract To facilitate access to public records, the present inventors devised, among other things, an entity resolution system. The exemplary system includes a master records database of 300 million entities, which is partitioned into multiple distinct portions. The exemplary system extracts entity information from input public records and constructs one or more blocking queries against specific portions of the master records database to identify one or more sets of candidate records. Feature vectors are defined for the candidate records, and machine learning techniques, such as a Support Vector Machine, are used to determine which of the candidate records from the master records database match the input public records. Candidate records that match are logically associated with the public records, enabling ready access via direct or indirect queries.
Publication number US9600509(B2) Publication date 2017.03.21
Application number US200812341913 Filing date 2008.12.22
Applicant Thomson Reuters Global Resources Inventors Conrad Jack G.; Dozier Christopher C.; Veeramachaneni Sriharsha
Classification G06F17/30; G06F7/00 Main classification G06F17/30
Agency Duncan Galloway Egan Greenwald, PLLC Agent Duncan Galloway Egan Greenwald, PLLC; Duncan Kevin T.
Principal claim 1. A system comprising: one or more processors; an entity resolution database (“ERD”) resolution engine adapted to retrieve, responsive to a first set of data in one or more data fields in a public record, a set of candidate named entity records from a master named entity database based on one of a set of two or more blocking queries, wherein each blocking query in the set of two or more blocking queries comprises a query for a last name and a first name, and a city name, all extracted from the public record, and a query for a last name and a first name, all from the public record; the ERD resolution engine further adapted to automatically determine a permutation for each blocking query in the set of two or more blocking queries and an order of execution for the set of two or more blocking queries based on the first set of data; the ERD resolution engine further adapted to calculate similarity scores for the first set of data in the one or more of the data fields in the public record and a second set of data in a set of data fields in the set of candidate named entity records by comparing the second set of data in the set of data fields in the set of candidate named entity records retrieved by the set of blocking queries with the first set of data in the one or more data fields in the public record; and the ERD resolution engine further adapted to determine a confidence rating for one or more of the set of similarity scores between the public record and the candidate named entity record.
Address CH
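Illustrative sketch (not part of the patent record): the flow recited in the abstract and claim 1 can be pictured roughly as below, with blocking queries chosen and ordered from the fields present in the public record, then per-field similarity scores and a confidence rating for each candidate. Everything in it is an assumption made for illustration: the `MASTER` list, the field names `first`, `last`, and `city`, the helper function names, and the scoring itself, which swaps in `difflib` string similarity and a plain average in place of the Support Vector Machine classification the abstract mentions.

```python
from difflib import SequenceMatcher

# Hypothetical master named-entity records; field names are assumptions.
MASTER = [
    {"id": 1, "first": "John", "last": "Smith", "city": "Eagan"},
    {"id": 2, "first": "Jon", "last": "Smith", "city": "Eagan"},
    {"id": 3, "first": "John", "last": "Smith", "city": "Boston"},
    {"id": 4, "first": "Mary", "last": "Jones", "city": "Eagan"},
]


def blocking_queries(record):
    """Return field tuples to block on, most specific first.

    A query is included only when the public record supplies every
    field it needs, so the set and order depend on the input data.
    """
    queries = []
    if record.get("last") and record.get("first") and record.get("city"):
        queries.append(("last", "first", "city"))
    if record.get("last") and record.get("first"):
        queries.append(("last", "first"))
    return queries


def run_blocking(record, master):
    """Run the queries in order; return the first non-empty candidate set."""
    for fields in blocking_queries(record):
        hits = [
            m for m in master
            if all(m[f].lower() == record[f].lower() for f in fields)
        ]
        if hits:
            return fields, hits
    return None, []


def similarity_scores(record, candidate, fields=("first", "last", "city")):
    """Per-field string similarity in [0, 1] between record and candidate."""
    return {
        f: SequenceMatcher(
            None, record.get(f, "").lower(), candidate.get(f, "").lower()
        ).ratio()
        for f in fields
    }


def confidence(scores):
    """Toy confidence rating: the mean of the per-field similarity scores."""
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    public_record = {"first": "John", "last": "Smith", "city": "Eagan"}
    used, candidates = run_blocking(public_record, MASTER)
    print("blocking query used:", used)
    for cand in candidates:
        scores = similarity_scores(public_record, cand)
        print(cand["id"], scores, round(confidence(scores), 3))
```

In this sketch a record that lacks a city falls back to the last-name/first-name query, which returns a broader candidate set; a trained classifier over the similarity scores would then take the place of the simple mean.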