Title of Invention |
Duplicate filtering in a data processing environment |
Abstract |
A data processing method is provided. The method comprises: collecting a stream of data records received from one or more data sources connected in a communications network; dividing the stream of data records into sets of data records for parallel processing by a plurality of concurrently running tasks, wherein a first task loads a first persistent index associated with a first set of data records into memory to generate an in-memory version of the first persistent index; and identifying duplicate and non-duplicate data records in the first set of data records by searching the in-memory version of the first persistent index.
|
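The steps named in the abstract (divide the collected records into sets, let each concurrent task load its set's persistent index into memory, then search that in-memory copy) can be illustrated with a minimal Python sketch. It is not the patented implementation. The hash-based partitioning, the one-key-per-line text file used as the persistent index, the thread pool, and all function names (`partition_of`, `divide`, `filter_set`, `filter_batch`) are assumptions made for illustration only. Records are represented by string keys that are assumed to contain no newline characters.

```python
import hashlib
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def partition_of(key: str, num_partitions: int) -> int:
    """Map a record key to a partition with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def divide(records, num_partitions):
    """Divide the collected records into one set per partition."""
    sets = [[] for _ in range(num_partitions)]
    for key in records:
        sets[partition_of(key, num_partitions)].append(key)
    return sets


def filter_set(index_path: Path, record_set):
    """One task: load the persistent index into memory, then classify
    each record by searching the in-memory copy."""
    in_memory = set()
    if index_path.exists():
        in_memory.update(index_path.read_text(encoding="utf-8").splitlines())
    unique, duplicates = [], []
    for key in record_set:
        if key in in_memory:
            duplicates.append(key)
        else:
            in_memory.add(key)
            unique.append(key)
    # Append newly seen keys so that later batches can detect them.
    with index_path.open("a", encoding="utf-8") as fh:
        for key in unique:
            fh.write(key + "\n")
    return unique, duplicates


def filter_batch(records, index_dir: Path, num_partitions: int = 4):
    """Run one task per set concurrently and merge the results."""
    sets = divide(records, num_partitions)
    paths = [index_dir / f"index_{i}.txt" for i in range(num_partitions)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = list(pool.map(filter_set, paths, sets))
    unique = sorted(k for u, _ in results for k in u)
    duplicates = sorted(k for _, d in results for k in d)
    return unique, duplicates
```

Because a given key always hashes to the same partition, each task only needs its own index in memory, and no two tasks write to the same index file. Calling `filter_batch` twice against the same `index_dir` shows the persistence: keys written during the first call are reported as duplicates in the second.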
Publication Number |
US8484171(B2) |
Publication Date |
2013.07.09 |
Application Number |
US201213437017 |
Filing Date |
2012.04.02 |
Applicant |
ARDITI JOEL;BERK DAVID HAROLD;GILAT DAGAN;KRUTYOLKIN SERGEY;LANDAU ARIEL;SHANI URI;INTERNATIONAL BUSINESS MACHINES CORPORATION |
Inventor |
ARDITI JOEL;BERK DAVID HAROLD;GILAT DAGAN;KRUTYOLKIN SERGEY;LANDAU ARIEL;SHANI URI |
Classification |
G06F17/30 |
Main Classification |
G06F17/30 |
Agency |
|
Agent |
|
Principal Claim |
|
Address |
|