Title: Efficient calculation of similarity search values and digest block boundaries for data deduplication
Abstract: For efficient calculation of both similarity search values and boundaries of digest blocks in data deduplication, input data is partitioned into chunks, and for each chunk a set of rolling hash values is calculated. A single linear scan of the rolling hash values is used to produce both similarity search values and boundaries of the digest blocks of the chunk.
Publication No.: US9244937(B2)   Publication date: 2016.01.26
Application No.: US201313840094   Filing date: 2013.03.15
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION   Inventors: Akirav Shay H.; Aronovich Lior; Ben-Dor Shira; Hirsch Michael; Leneman Ofer
Classification: G06F17/30   Main classification: G06F17/30
Agency: Griffiths & Seaton PLLC   Agent: Griffiths & Seaton PLLC
Main claim: 1. A method for efficient calculation of both similarity search values and boundaries of digest blocks in a data deduplication system using a processor device in a computing environment, comprising: partitioning input data into data chunks; calculating a set of rolling hash values for each of the data chunks; using a single linear scan of the rolling hash values for producing both the similarity search values and the boundaries of the digest blocks; using each of the rolling hash values to contribute to the calculation of the similarity search values and to the calculation of the boundaries of the digest blocks; and discarding each of the rolling hash values after contributing to the calculation of the similarity search values and to the calculation of the boundaries of the digest blocks.
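A minimal Python sketch of the single-scan idea in the claim, under assumptions the record does not state: fixed-size chunk partitioning, a Rabin-Karp-style polynomial rolling hash over a 16-byte window, similarity search values taken as the k largest rolling hash values in a chunk, and content-defined digest block boundaries placed wherever the low bits of the hash are all ones. These are common choices, not necessarily the patented ones, and every name and parameter here (`partition`, `scan_chunk`, `WINDOW`, `BOUNDARY_MASK`, `NUM_SIMILARITY`) is hypothetical. Each rolling hash value is computed once, feeds both outputs, and is then overwritten.

```python
import heapq
from typing import List, Tuple

# All parameters below are hypothetical; the patent record does not specify them.
WINDOW = 16              # rolling-hash window length in bytes
BASE = 257               # polynomial base (Rabin-Karp style)
MOD = (1 << 61) - 1      # Mersenne prime modulus
BOUNDARY_MASK = 0x3F     # block boundary when the low 6 bits of the hash are all ones
NUM_SIMILARITY = 4       # similarity search values kept per chunk


def partition(data: bytes, chunk_size: int = 4096) -> List[bytes]:
    """Split input data into fixed-size chunks (an assumed partitioning scheme)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def scan_chunk(chunk: bytes) -> Tuple[List[int], List[int]]:
    """One linear pass over the chunk's rolling hash values.

    Returns (similarity_values, block_ends): the NUM_SIMILARITY largest rolling
    hash values in descending order, and the end offset of each digest block.
    """
    top_k: List[int] = []    # min-heap of the largest hash values seen so far
    block_ends: List[int] = []
    drop_weight = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    h = 0
    for i, byte in enumerate(chunk):
        if i >= WINDOW:
            # Remove the byte sliding out of the window before shifting in the new one.
            h = (h - chunk[i - WINDOW] * drop_weight) % MOD
        h = (h * BASE + byte) % MOD
        if i + 1 < WINDOW:
            continue  # window not full yet, so no rolling hash value to use
        # Use 1: update the similarity search values (keep the k largest hashes).
        if len(top_k) < NUM_SIMILARITY:
            heapq.heappush(top_k, h)
        elif h > top_k[0]:
            heapq.heapreplace(top_k, h)
        # Use 2: content-defined digest block boundary.
        if (h & BOUNDARY_MASK) == BOUNDARY_MASK:
            block_ends.append(i + 1)
        # Apart from possibly being kept as a similarity value, h is now
        # discarded: the next iteration overwrites it.
    if len(chunk) > (block_ends[-1] if block_ends else 0):
        block_ends.append(len(chunk))  # trailing bytes form the last block
    return sorted(top_k, reverse=True), block_ends


if __name__ == "__main__":
    data = bytes((i * i * 31 + i * 7) % 251 for i in range(10000))
    for n, chunk in enumerate(partition(data)):
        sims, ends = scan_chunk(chunk)
        print(n, len(chunk), len(ends), [hex(v) for v in sims])
```

Because the heap holds at most k entries, the sketch's working memory per chunk stays small regardless of chunk length; it never materializes the full set of rolling hash values, which is the practical benefit of consuming each value once in a single scan.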
Address: Armonk, NY, US