Title Optimizing data block size for deduplication
Abstract Provided herein is technology relating to data deduplication and particularly, but not exclusively, to methods and systems for determining an efficiently optimal size of data blocks to use for backing up a data source. Also provided herein are systems for identifying duplicate data in data backup applications.
Publication No. US9626373(B2) Publication Date 2017.04.18
Application No. US201313802167 Filing Date 2013.03.13
Applicant Western Digital Technologies, Inc. Inventor Ram Tamir
Classification G06F17/30;G06F11/14 Main Classification G06F17/30
Agency Agent
Principal Claim 1. A computer-implemented method for determining a first data block size for deduplicating a file type, the method comprising: constructing a function relating a plurality of compression ratios to a plurality of test data block sizes, wherein a compression ratio of the plurality of compression ratios is calculated by transforming a file of the file type using a deduplication technology and a test data block size of the plurality of test data block sizes; determining a maximum compression ratio of the function; choosing a test data block size associated with the maximum compression ratio to be the first data block size for the file type; and deduplicating a data block of the first data block size based on a sliding window, wherein deduplicating the data block comprises: calculating a first hash value to identify a potential data block; and calculating a second hash value to identify the duplicate data block, wherein a beginning of the sliding window is set at an end of a duplicate data block when the duplicate data block is detected, and wherein the sliding window moves backwards when the duplicate data block is not detected.
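The block-size selection step of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: it uses fixed-offset blocks rather than the claimed sliding window, and the names `dedup_ratio`, `choose_block_size`, and `REF_BYTES` are hypothetical. The per-block reference cost `REF_BYTES` is an assumed storage model, the choice of Adler-32 as the first (cheap) hash and SHA-256 as the second (confirming) hash is also an assumption, and the strong hash is computed for every block here for simplicity.

```python
import hashlib
import zlib

REF_BYTES = 8  # assumed cost of storing one block reference (hypothetical)


def dedup_ratio(data: bytes, block_size: int) -> float:
    """Return original size / stored size for fixed-offset block deduplication."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if not data:
        return 1.0
    index = {}  # weak hash -> set of strong hashes of blocks already stored
    stored = 0
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        weak = zlib.adler32(block)               # first hash: flags a potential duplicate
        strong = hashlib.sha256(block).digest()  # second hash: confirms the duplicate
        bucket = index.setdefault(weak, set())
        if strong not in bucket:
            bucket.add(strong)
            stored += len(block)                 # unique block: payload is stored
        stored += REF_BYTES                      # every block costs one reference
    return len(data) / stored


def choose_block_size(data: bytes, test_sizes):
    """Map each test block size to its ratio and return the size with the maximum."""
    ratios = {size: dedup_ratio(data, size) for size in test_sizes}
    best = max(ratios, key=ratios.get)
    return best, ratios


if __name__ == "__main__":
    sample = bytes(range(64)) * 100  # synthetic stand-in for a file of one type
    best, ratios = choose_block_size(sample, [16, 64, 256, 1024])
    for size, ratio in ratios.items():
        print(size, round(ratio, 2))
    print("chosen block size:", best)
```

With the assumed reference cost, very small blocks pay too much per-block overhead and very large blocks miss repeats, so the ratio as a function of block size peaks at an intermediate size, which is the value the claim selects.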
Address Irvine CA US