Title of invention: Data duplication detection in an in memory data grid (IMDG)
Abstract: Embodiments of the invention provide a method, system and computer program product for data duplication detection in an in memory data grid (IMDG). A method for data duplication detection in an IMDG includes computing a hash value for each binary data value in a key value pair of a partition in an IMDG. The method also includes generating a map including an entry for each unique computed hash value and one or more keys corresponding to binary data values of respective key value pairs from which the hash value had been uniquely computed. Thereafter, only those hash values in the map with multiple keys associated therewith are identified and binary data corresponding to the multiple keys of the identified hash values are reported as potential duplicate data in the IMDG.
Publication number: US9613121(B2)
Publication date: 2017.04.04
Application number: US201414202070
Filing date: 2014.03.10
Applicant: International Business Machines Corporation
Inventors: Berg Douglas; Gaur Nitin; Johnson Christopher D.; Martin Brian K.
Classification: G06F17/30
Main classification: G06F17/30
Agency: CRGO Law
Agent: Greenberg, Esq. Steven M.; CRGO Law
Main claim: 1. A data processing system configured for data duplication detection in an in memory data grid (IMDG), the system comprising: a host computing system comprising one or more computers each with memory and at least one processor; at least one server communicatively coupled to the host computing system over a computer communications network, the at least one server hosting in memory an IMDG; an IMDG interface executing in the memory of the host computing system and providing access to the IMDG; and, a data duplication detection module executing in the memory of the host computing system, the module comprising program code enabled upon execution to compute a hash value for each binary data value in a key value pair of a partition in the IMDG, to generate a map including an entry for each unique computed hash value and one or more keys corresponding to binary data values of respective key value pairs from which the hash value had been uniquely computed, to identify only those hash values in the map with multiple keys associated therewith, and to report binary data corresponding to the multiple keys of the identified hash values as potential duplicate data in the IMDG.
Address: Armonk, NY, US
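
Illustrative sketch (not part of the patent record): the detection steps described in the abstract and main claim (hash each binary value in a partition, group keys by hash value, keep only hash values with more than one key) might look roughly like the following. This is a minimal sketch under stated assumptions: a plain Python dict stands in for one IMDG partition, SHA-256 is an arbitrary choice because the record does not name a hash function, and the function name `find_potential_duplicates` is hypothetical.

```python
import hashlib
from collections import defaultdict


def find_potential_duplicates(partition):
    """Group keys by the hash of their binary value.

    `partition` is a dict mapping key -> bytes; it is a stand-in
    for a single IMDG partition (assumption, not the patent's API).
    Returns only the hash values shared by more than one key.
    """
    hash_to_keys = defaultdict(list)
    for key, value in partition.items():
        # SHA-256 is an assumed choice; the record names no hash function.
        digest = hashlib.sha256(value).hexdigest()
        hash_to_keys[digest].append(key)

    # Keep only hash values that have multiple keys associated with them.
    return {h: keys for h, keys in hash_to_keys.items() if len(keys) > 1}


# Hypothetical sample data for illustration only.
partition = {
    "user:1": b"alice@example.com",
    "user:2": b"bob@example.com",
    "user:3": b"alice@example.com",
}

for digest, keys in find_potential_duplicates(partition).items():
    # Report the keys whose values hashed identically.
    print(digest[:12], keys)
```

Matching hash values only indicate potential duplicates, as the abstract puts it. Because distinct values can in principle collide, a caller that needs certainty could compare the underlying bytes for each reported group of keys.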