Title of invention: SYSTEMS AND METHODS FOR EFFICIENT DATA SEARCHING, STORAGE AND REDUCTION
Abstract: A computer-implemented method, according to one embodiment, includes, for each repository data chunk in repository data that comprises a plurality of the repository data chunks, generating a corresponding set of repository distinguishing characteristics (RDCs). Each set of RDCs is generated by: applying a hash function to the respective input data chunk or repository data chunk to generate a plurality of hashes, each hash comprising a hash value and a hash position within the data chunk; applying a first function to the plurality of generated hashes to identify a first subset of hashes distributed across the data chunk; applying a second function to the hash positions of the hashes of the first subset to identify a second subset of the plurality of generated hashes; and defining the second subset of hashes as the set of RDCs.
Publication number: US2016342482(A1)  Publication date: 2016.11.24
Application number: US201615225510  Filing date: 2016.08.01
Applicant: International Business Machines Corporation  Inventors: Aronovich Lior; Asher Ron; Bachmat Eitan; Bitner Haim; Hirsch Michael; Klein Shmuel T.
Classification: G06F11/14; G06F17/30  Main classification: G06F11/14
Agency:  Agent:
Principal claim: 1. A computer-implemented method, comprising: for each repository data chunk in repository data that comprises a plurality of the repository data chunks, generating a corresponding set of repository distinguishing characteristics (RDCs), wherein each set of RDCs is generated by: applying a hash function to the respective input data chunk or repository data chunk to generate a plurality of hashes, each hash comprising a hash value and a hash position within the data chunk; applying a first function to the plurality of generated hashes to identify a first subset of hashes distributed across the data chunk; applying a second function to the hash positions of the hashes of the first subset to identify a second subset of the plurality of generated hashes; and defining the second subset of hashes as the set of RDCs.
Address: Armonk NY US
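
Illustrative sketch (not part of the patent record): the RDC generation described in the abstract and in claim 1 can be pictured roughly as follows. The record does not specify the hash function, the first function, or the second function, so every concrete choice here is an assumption made only for illustration: a rolling polynomial hash over fixed-size byte windows, "the k largest hash values" as the first function, and "shift each selected position by one byte" as the second function. The names `seed_hashes`, `first_subset`, `second_subset`, `generate_rdcs`, and the constants are hypothetical.

```python
import heapq
from typing import List, Tuple

# All parameters below are assumptions; the record does not state them.
SEED_SIZE = 8            # bytes per hashed window
K = 4                    # number of characteristics kept per chunk
BASE = 257               # polynomial hash base
MOD = (1 << 61) - 1      # polynomial hash modulus

Hash = Tuple[int, int]   # (hash value, hash position within the chunk)


def seed_hashes(chunk: bytes, seed_size: int = SEED_SIZE) -> List[Hash]:
    """Hash every seed_size-byte window of the chunk with a rolling hash."""
    count = len(chunk) - seed_size + 1
    if count <= 0:
        return []
    top = pow(BASE, seed_size - 1, MOD)   # weight of the byte leaving the window
    h = 0
    for b in chunk[:seed_size]:
        h = (h * BASE + b) % MOD
    hashes = [(h, 0)]
    for pos in range(1, count):
        h = ((h - chunk[pos - 1] * top) * BASE + chunk[pos + seed_size - 1]) % MOD
        hashes.append((h, pos))
    return hashes


def first_subset(hashes: List[Hash], k: int = K) -> List[Hash]:
    """Assumed first function: keep the k hashes with the largest values."""
    return heapq.nlargest(k, hashes)


def second_subset(hashes: List[Hash], first: List[Hash], shift: int = 1) -> List[Hash]:
    """Assumed second function: map each selected position to position + shift
    (clamped to the last window) and return the hashes found there."""
    last = len(hashes) - 1
    positions = sorted({min(pos + shift, last) for _, pos in first})
    return [hashes[p] for p in positions]   # hashes[p] is the hash at position p


def generate_rdcs(chunk: bytes, k: int = K, seed_size: int = SEED_SIZE) -> List[Hash]:
    """Return the second subset of hashes, used here as the chunk's set of RDCs."""
    hashes = seed_hashes(chunk, seed_size)
    return second_subset(hashes, first_subset(hashes, k))
```

Because claim 1 refers to "the respective input data chunk or repository data chunk", the same routine could be called on either kind of chunk, for example `generate_rdcs(repository_chunk)` and `generate_rdcs(input_chunk)`, where both arguments are `bytes` objects.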