Title of invention: DYNAMIC RECORD BLOCKING
Abstract: Dynamic blocking determines which pairs of records in a data set should be examined as potential duplicates. Records are grouped into blocks by shared properties that indicate duplication. Blocks that are too large to process efficiently are further subdivided by other properties chosen in a data-driven way. We demonstrate the viability of this algorithm for large data sets, and we have scaled the system to billions of records on an 80-node Hadoop cluster.
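The blocking idea in the abstract can be illustrated with a minimal single-machine sketch in Python. It is not the patented method or its Hadoop implementation. The names `group_by`, `dynamic_blocks`, `candidate_pairs`, and `max_block_size`, the sample records, and the rule for subdividing an oversize block (split it by each property that comes later in a fixed property order) are all assumptions made for illustration, since the abstract does not say how properties are chosen.

```python
from collections import defaultdict
from itertools import combinations


def group_by(ids, records, prop):
    """Group record ids by their value for one property, skipping missing values."""
    groups = defaultdict(list)
    for rid in ids:
        value = records[rid].get(prop)
        if value is not None:
            groups[value].append(rid)
    return list(groups.values())


def dynamic_blocks(records, properties, max_block_size):
    """Return blocks of record ids no larger than max_block_size (hypothetical threshold)."""
    accepted = []
    # Work items: (index of the last property used, record ids in the block).
    pending = [
        (i, ids)
        for i, prop in enumerate(properties)
        for ids in group_by(list(records), records, prop)
    ]
    while pending:
        last, ids = pending.pop()
        if len(ids) < 2:
            continue  # a single record yields no pairs
        if len(ids) <= max_block_size:
            accepted.append(ids)
            continue
        # Oversize block: subdivide by each later property (assumed rule).
        # If no later property exists, the block is dropped.
        for j in range(last + 1, len(properties)):
            for sub in group_by(ids, records, properties[j]):
                pending.append((j, sub))
    return accepted


def candidate_pairs(records, properties, max_block_size):
    """Collect the distinct record pairs that share at least one accepted block."""
    pairs = set()
    for block in dynamic_blocks(records, properties, max_block_size):
        pairs.update(combinations(sorted(block), 2))
    return pairs


# Hypothetical sample data: record id -> properties.
records = {
    1: {"last": "smith", "zip": "98101", "first": "john"},
    2: {"last": "smith", "zip": "98101", "first": "jon"},
    3: {"last": "smith", "zip": "10001", "first": "mary"},
    4: {"last": "smith", "zip": "10001", "first": "mary"},
    5: {"last": "jones", "zip": "98101", "first": "ann"},
}
pairs = candidate_pairs(records, ["last", "zip", "first"], max_block_size=3)
print(sorted(pairs))  # [(1, 2), (1, 5), (2, 5), (3, 4)]
```

In this sample the "smith" block holds four records, which exceeds the threshold of three, so it is split by zip code and first name. Records 1 and 3 share only the common last name and are therefore never compared, while records that agree on a further property still end up in a small block together.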
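Publication number: US2013173560(A1) Publication date: 2013.07.04
Application number: US201213349414 Filing date: 2012.01.12
Applicant: MCNEILL WILLIAM P.;BORTHWICK ANDREW;INTELIUS INC. Inventor: MCNEILL WILLIAM P.;BORTHWICK ANDREW
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim:
Address: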