Title of invention: Duplicate filtering in a data processing environment
Abstract: A data processing method is provided. The method comprises collecting a stream of data records received from one or more data sources connected in a communications network; dividing the stream of data records into sets of data records for parallel processing by a plurality of concurrently running tasks, wherein a first task loads a persistent index associated with a first set of data records into memory to generate an in-memory version of the first persistent index for the first set of data records; and identifying duplicate and non-duplicate data records in the first set of data records, based on searching the in-memory version of the first persistent index.
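The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation; it only mirrors the three steps named above (divide the stream into sets, have each concurrent task load its set's persistent index into memory, classify records by searching that in-memory index). Everything concrete here is an assumption made for illustration: the record layout (a dict with a "key" field), partitioning by a CRC32 hash of the key, storing each persistent index as a JSON file, and the function names partition, filter_set, and filter_stream.

```python
import json
import zlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def partition(records, num_sets):
    """Divide the records into num_sets sets using a stable hash of the key (assumed scheme)."""
    sets = [[] for _ in range(num_sets)]
    for record in records:
        slot = zlib.crc32(record["key"].encode("utf-8")) % num_sets
        sets[slot].append(record)
    return sets


def filter_set(records, index_path):
    """One task: load the set's persistent index, then split records into new and duplicate."""
    index_path = Path(index_path)
    # In-memory version of the persistent index (hypothetical JSON list of keys).
    seen = set(json.loads(index_path.read_text())) if index_path.exists() else set()
    unique, duplicates = [], []
    for record in records:
        if record["key"] in seen:
            duplicates.append(record)
        else:
            seen.add(record["key"])
            unique.append(record)
    # Write the updated index back so later runs also see these keys.
    index_path.write_text(json.dumps(sorted(seen)))
    return unique, duplicates


def filter_stream(records, index_dir, num_sets=4):
    """Run one task per set concurrently and merge the per-set results."""
    sets = partition(records, num_sets)
    paths = [Path(index_dir) / f"index_{i}.json" for i in range(num_sets)]
    with ThreadPoolExecutor(max_workers=num_sets) as pool:
        results = list(pool.map(filter_set, sets, paths))
    unique = [r for u, _ in results for r in u]
    duplicates = [r for _, d in results for r in d]
    return unique, duplicates
```

Because records with the same key always land in the same set, each task only needs its own index and the tasks never share state. Calling filter_stream on the same records a second time with the same index directory reports every record as a duplicate, since the keys were persisted by the first run.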
Publication number: US8484171(B2) Publication date: 2013.07.09
Application number: US201213437017 Filing date: 2012.04.02
Applicant: ARDITI JOEL;BERK DAVID HAROLD;GILAT DAGAN;KRUTYOLKIN SERGEY;LANDAU ARIEL;SHANI URI;INTERNATIONAL BUSINESS MACHINES CORPORATION Inventor: ARDITI JOEL;BERK DAVID HAROLD;GILAT DAGAN;KRUTYOLKIN SERGEY;LANDAU ARIEL;SHANI URI
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim:
Address: