Title of invention: DIGEST RETRIEVAL BASED ON SIMILARITY SEARCH IN DATA DEDUPLICATION
Abstract: For digest retrieval based on similarity search in deduplication processing in a data deduplication system using a processor device in a computing environment, input data is partitioned into fixed sized data chunks. Similarity elements and digest block boundaries and digest values are calculated for each of the fixed sized data chunks. Matching similarity elements are searched for in a search structure containing the similarity elements for each of the fixed sized data chunks in a repository of data. Positions of similar data are located in the repository. The positions of the similar data are used to locate and load into the memory stored digest values and corresponding stored digest block boundaries of the similar data in the repository. The digest values and the corresponding digest block boundaries of the input data are matched with the stored digest values and the corresponding stored digest block boundaries to find data matches.
Publication number: US2014279951(A1) Publication date: 2014.09.18
Application number: US201313839581 Filing date: 2013.03.15
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors: AKIRAV Shay H.; ARONOVICH Lior; BEN-DOR Shira; HIRSCH Michael; LENEMAN Ofer
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim: 1. A method for retrieving digests based on a similarity search for efficient deduplication processing in a data deduplication system using a processor device in a computing environment, comprising: partitioning input data into data chunks; calculating for each of the data chunks similarity elements and digest values; searching for matching similarity elements in a search structure containing similarity elements; finding positions of similar data in a repository of data; using the positions of the similar data to locate and load into a memory stored digest values of the similar data in the repository; and matching the digest values of the input data with the stored digest values loaded into memory to find data matches.
Address: Armonk NY US
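
The steps of the main claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The record does not say how similarity elements are computed, which digest function is used, or how the search structure is organized, so everything below is an assumption made for illustration: SHA-1 as the digest, the largest sliding-window hashes as similarity elements, a plain dictionary as the search structure, chunk indices as repository positions, and the names `Repository`, `CHUNK_SIZE`, `BLOCK_SIZE`, and `NUM_SIM_ELEMENTS`. The abstract also mentions digest block boundaries; the sketch uses fixed-size digest blocks, so boundaries are implicit in the block index.

```python
import hashlib

# All sizes are hypothetical and kept tiny for illustration.
CHUNK_SIZE = 64        # bytes per data chunk
BLOCK_SIZE = 8         # bytes per digest block (fixed here, an assumption)
NUM_SIM_ELEMENTS = 2   # similarity elements kept per chunk


def partition(data, size=CHUNK_SIZE):
    """Split the input into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def block_digests(chunk, block_size=BLOCK_SIZE):
    """Digest value of every fixed-size block in a chunk."""
    return [hashlib.sha1(chunk[i:i + block_size]).digest()
            for i in range(0, len(chunk), block_size)]


def similarity_elements(chunk, window=BLOCK_SIZE, k=NUM_SIM_ELEMENTS):
    """Assumed stand-in: the k largest truncated hashes over sliding windows."""
    last_start = max(1, len(chunk) - window + 1)
    hashes = {hashlib.sha1(chunk[i:i + window]).digest()[:8]
              for i in range(last_start)}
    return sorted(hashes, reverse=True)[:k]


class Repository:
    """Toy repository: a search structure plus stored digests per position."""

    def __init__(self):
        self.search_structure = {}     # similarity element -> chunk position
        self.digests_by_position = {}  # chunk position -> stored block digests
        self.next_position = 0

    def store(self, data):
        """Index each chunk's similarity elements and store its digests."""
        for chunk in partition(data):
            pos = self.next_position
            self.next_position += 1
            for elem in similarity_elements(chunk):
                self.search_structure.setdefault(elem, pos)
            self.digests_by_position[pos] = block_digests(chunk)

    def find_matches(self, data):
        """Return (input_chunk, input_block, repo_position, repo_block) tuples."""
        matches = []
        for chunk_index, chunk in enumerate(partition(data)):
            # Calculate similarity elements and digest values for the chunk.
            elems = similarity_elements(chunk)
            digests = block_digests(chunk)
            # Search the structure to find positions of similar data.
            positions = {self.search_structure[e]
                         for e in elems if e in self.search_structure}
            # Load only the stored digests of the similar positions.
            loaded = {}
            for pos in sorted(positions):
                for block_index, d in enumerate(self.digests_by_position[pos]):
                    loaded.setdefault(d, (pos, block_index))
            # Match input digests against the loaded digests.
            for block_index, d in enumerate(digests):
                if d in loaded:
                    matches.append((chunk_index, block_index) + loaded[d])
        return matches
```

To try it, create a `Repository`, call `store()` on some bytes, then call `find_matches()` on new input. The point of the design is that digests are loaded only for repository positions that the similarity search flags as similar, so a full index of all digests need not be held in memory.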