Title of invention |
Efficient calculation of similarity search values and digest block boundaries for data deduplication |
Abstract |
For efficient calculation of both similarity search values and boundaries of digest blocks in data deduplication, input data is partitioned into chunks, and for each chunk a set of rolling hash values is calculated. A single linear scan of the rolling hash values is used to produce both similarity search values and boundaries of the digest blocks of the chunk. |
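The rolling hash step named in the abstract can be illustrated with a minimal sketch. The record does not say which hash function is used, so the polynomial (Rabin-Karp style) hash below, along with the names `rolling_hashes`, `window`, `base`, and `mod`, are assumptions chosen only for illustration.

```python
from typing import Iterator


def rolling_hashes(chunk: bytes, window: int = 4,
                   base: int = 257, mod: int = (1 << 61) - 1) -> Iterator[int]:
    """Yield one polynomial hash per window position of `chunk`.

    Hypothetical parameters: the patent record does not specify the hash
    function, window size, base, or modulus.
    """
    if len(chunk) < window:
        return
    # Weight of the byte that leaves the window on each slide.
    top = pow(base, window - 1, mod)

    # Hash of the first window, computed directly.
    h = 0
    for b in chunk[:window]:
        h = (h * base + b) % mod
    yield h

    # Each later hash is derived from the previous one in constant time:
    # remove the outgoing byte, shift, and add the incoming byte.
    for i in range(window, len(chunk)):
        h = ((h - chunk[i - window] * top) * base + chunk[i]) % mod
        yield h
```

Because the function is a generator, a caller can consume each value as it is produced without keeping the whole set in memory.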
Publication number |
US9244937(B2) |
Publication date |
2016.01.26 |
Application number |
US201313840094 |
Filing date |
2013.03.15 |
Applicant |
INTERNATIONAL BUSINESS MACHINES CORPORATION |
Inventors |
Akirav Shay H.;Aronovich Lior;Ben-Dor Shira;Hirsch Michael;Leneman Ofer |
Classification |
G06F17/30 |
Main classification |
G06F17/30 |
Patent agency |
Griffiths & Seaton PLLC |
Agent |
Griffiths & Seaton PLLC |
Main claim |
1. A method for efficient calculation of both similarity search values and boundaries of digest blocks in a data deduplication system using a processor device in a computing environment, comprising:
partitioning input data into data chunks; calculating a set of rolling hash values for each of the data chunks; using a single linear scan of the rolling hash values for producing both the similarity search values and the boundaries of the digest blocks; using each of the rolling hash values to contribute to the calculation of the similarity search values and to the calculation of the boundaries of the digest blocks; and discarding each of the rolling hash values after contributing to the calculation of the similarity search values and to the calculation of the boundaries of the digest blocks. |
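The single linear scan recited in the claim can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the record does not say how similarity search values or block boundaries are selected, so the "keep the largest hashes" rule, the "low bits are zero" boundary rule, and the names `scan_once`, `mask`, and `num_sim` are all hypothetical.

```python
from typing import Iterable, List, Tuple


def scan_once(hashes: Iterable[int], mask: int = 0x3,
              num_sim: int = 2) -> Tuple[List[int], List[int]]:
    """Consume each rolling hash exactly once.

    Returns (similarity_values, boundaries). Both selection rules are
    assumptions made for this sketch.
    """
    top: List[int] = []         # largest hashes seen so far, descending
    boundaries: List[int] = []  # positions where a digest block would end

    for pos, h in enumerate(hashes):
        # Contribution 1: update the similarity search values.
        if len(top) < num_sim:
            top.append(h)
            top.sort(reverse=True)
        elif h > top[-1]:
            top[-1] = h
            top.sort(reverse=True)

        # Contribution 2: test the same value as a block boundary.
        if h & mask == 0:
            boundaries.append(pos)

        # No list of all hashes is kept; h is dropped after this
        # iteration unless it was selected as a result above.

    return top, boundaries


# Usage with a small hand-written sequence of hash values.
sims, bounds = scan_once(iter([5, 8, 13, 4, 21, 16, 7]))
print(sims, bounds)  # [21, 16] [1, 3, 5]
```

Passing an iterator rather than a list mirrors the discarding step: memory use depends on `num_sim` and the number of boundaries found, not on the number of rolling hash values.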
Address |
Armonk NY US |