Title of invention: TECHNIQUES TO BLOCK RECORDS FOR MATCHING
Abstract: Techniques to block records for matching are described. Some embodiments are particularly directed to techniques to block records for matching entities with inconsistent identifying information. In one embodiment, for example, an apparatus may comprise a configuration component, a coding component, a blocking component, and a matching component. The configuration component may be operative to receive a data set comprising a plurality of records and to receive a set of blocking variables, the blocking variables being present as variables in each of the plurality of records. The coding component may be operative to generate match codes based on the blocking variables. The blocking component may be operative on a processor circuit to produce a plurality of blocks of records from the data set based on the match codes. The matching component may be operative to match records within each of the plurality of blocks by performing deterministic or probabilistic entity resolution based on similar variables of the records within each of the blocks. Other embodiments are described and claimed.
Publication number: US2014258162(A1)  Publication date: 2014.09.11
Application number: US201414203044  Filing date: 2014.03.10
Applicant: Maran Ned;Jung Jin-Whan;Sall Leslie  Inventor: Maran Ned;Jung Jin-Whan;Sall Leslie
Classification: G06F17/30  Main classification: G06F17/30
Agency:  Agent:
Main claim: 1. At least one non-transitory computer-readable storage medium comprising instructions that, when executed, cause a system to perform operations including: receive a data set comprising a plurality of records; receive blocking variables, the blocking variables present as variables in each of the plurality of records; generate match codes based on the blocking variables; produce a plurality of blocks of records from the data set based on the match codes; and match records within each of the plurality of blocks to determine entities referenced by more than one record by performing deterministic or probabilistic entity resolution based on similar variables of the records within each of the blocks.
Address: Cary NC US
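
The steps recited in the main claim (receive records and blocking variables, generate match codes, produce blocks, match within each block) can be illustrated with a minimal Python sketch. This is not the method disclosed in the application, which the record does not detail. The match-code rule (uppercase, strip non-alphanumerics, truncate), the field names, the `threshold` value, and the use of `difflib.SequenceMatcher` as a similarity measure are all assumptions chosen for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def match_code(value, length=4):
    """Hypothetical match code: uppercase, keep alphanumerics, truncate."""
    cleaned = "".join(ch for ch in str(value).upper() if ch.isalnum())
    return cleaned[:length]


def block_records(records, blocking_vars, length=4):
    """Group records whose match codes agree on every blocking variable."""
    blocks = defaultdict(list)
    for rec in records:
        key = tuple(match_code(rec[var], length) for var in blocking_vars)
        blocks[key].append(rec)
    return blocks


def match_within_blocks(blocks, compare_vars, threshold=0.8):
    """Compare record pairs only inside each block (deterministic rule).

    Two records match when the mean string similarity over compare_vars
    reaches the assumed threshold.
    """
    matches = []
    for members in blocks.values():
        for a, b in combinations(members, 2):
            scores = [
                SequenceMatcher(None, str(a[v]).lower(), str(b[v]).lower()).ratio()
                for v in compare_vars
            ]
            if sum(scores) / len(scores) >= threshold:
                matches.append((a["id"], b["id"]))
    return matches


# Made-up sample data with inconsistent identifying information.
records = [
    {"id": 1, "last_name": "Smith", "zip": "27513", "first_name": "Jonathan"},
    {"id": 2, "last_name": "SMITH ", "zip": "27513", "first_name": "Jonathon"},
    {"id": 3, "last_name": "Smith", "zip": "27513", "first_name": "Mary"},
    {"id": 4, "last_name": "Jones", "zip": "10001", "first_name": "Jonathan"},
]

blocks = block_records(records, ["last_name", "zip"])
pairs = match_within_blocks(blocks, ["first_name"])
print(pairs)  # [(1, 2)]
```

Blocking keeps the pairwise comparison inside each block, so record 4 is never compared against records 1 to 3 even though its first name is identical to that of record 1. A probabilistic variant would replace the fixed threshold with per-variable agreement weights, which this sketch does not attempt.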