Title: Delta Compression Engine for Similarity Based Data Deduplication
Abstract: The present disclosure relates to systems and methods for similarity-based data deduplication. The system may be realized as a delta compression engine that uses pipelining and parallel data lookup techniques across multiple hardware modules, including a block sketch computation module, a reference block indexing module, and a similar-block delta compression module. The system implements a delta compression method that includes comparing an incoming data block against multiple reference data blocks in a reference dictionary to identify a near-duplicate reference data block. The method may include looking up the incoming data block in a table built from the reference data blocks. The method may further include representing the incoming data block in its final storage format as the indices and lengths of the matching data identified in the corresponding reference data blocks.
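The final storage format described in the abstract, an incoming block expressed as indices and lengths into a reference block, can be illustrated with a minimal sketch. It assumes a simple greedy encoder that indexes every k-byte substring of the reference (k is the hypothetical parameter `MIN_MATCH`) and emits hypothetical `("copy", offset, length)` and `("insert", literal)` instructions. This is a generic software illustration of delta encoding, not the hardware pipeline of the disclosure.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical instruction formats (not defined by the disclosure):
Copy = Tuple[str, int, int]   # ("copy", offset_in_reference, length)
Insert = Tuple[str, bytes]    # ("insert", literal_bytes)
Op = Union[Copy, Insert]

MIN_MATCH = 4  # hypothetical minimum match length in bytes


def delta_encode(reference: bytes, new: bytes, k: int = MIN_MATCH) -> List[Op]:
    """Greedily encode `new` as copies from `reference` plus literal inserts."""
    # Index every k-byte substring of the reference by its first offset.
    index: Dict[bytes, int] = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], i)

    ops: List[Op] = []
    literal = bytearray()
    pos = 0
    while pos < len(new):
        start = index.get(new[pos:pos + k])
        if start is None:
            # No k-byte match starts here: keep this byte as a literal.
            literal.append(new[pos])
            pos += 1
            continue
        # Extend the match forward while both buffers keep agreeing.
        length = k
        while (pos + length < len(new) and start + length < len(reference)
               and new[pos + length] == reference[start + length]):
            length += 1
        if literal:
            ops.append(("insert", bytes(literal)))
            literal = bytearray()
        ops.append(("copy", start, length))
        pos += length
    if literal:
        ops.append(("insert", bytes(literal)))
    return ops


def delta_decode(reference: bytes, ops: List[Op]) -> bytes:
    """Rebuild the new block from the reference and the instruction list."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += reference[start:start + length]
        else:
            out += op[1]
    return bytes(out)


ref = b"The quick brown fox jumps over the lazy dog"
new = b"The quick brown cat jumps over the lazy dog"
ops = delta_encode(ref, new)
assert delta_decode(ref, ops) == new
```

Bytes that do not start a k-byte match are stored as literals, so the savings depend on how long the shared runs are. A larger k avoids spurious short copies but misses small matches.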
Publication No.: US2017038978(A1)  Publication Date: 2017.02.09
Application No.: US201615214243  Filing Date: 2016.07.19
Applicant: HGST Netherlands B.V.  Inventors: Li Dongyang; Wang Qingbo; Bandic Zvonimir Z.; Yang Ken Qing; Narasimha Ashwin
Classification: G06F3/06  Main Classification: G06F3/06
Main Claim: 1. A system comprising:
a block signature module configured to determine a signature sketch of a new data block based on a fingerprint computation;
a reference block index module communicatively coupled to the block signature module, the reference block index module configured to:
  receive, from the block signature module, the signature sketch of the new data block;
  compute a new hash key of the signature sketch of the new data block;
  search a hash index table using the new hash key to find a reference hash index record including a reference hash key similar to the new hash key;
  search a reference list table, using the reference hash index record, to determine a signature sketch of a related reference data block stored in the reference list table; and
  retrieve, from the reference list table, the related reference data block corresponding to the signature sketch of the related reference data block responsive to determining that a similarity between the signature sketch of the new data block and the signature sketch of the related reference data block exceeds a threshold; and
a delta encoding module communicatively coupled to the reference block index module, the delta encoding module configured to:
  scan the related reference data block and the new data block to determine a match between one or more data elements of the related reference data block and one or more data elements of the new data block; and
  encode the one or more data elements of the new data block using the match to produce a compressed delta.
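The lookup path of the claimed block signature and reference block index modules can be sketched in software as follows. The sketch assumes several things the claim leaves open: the signature sketch is a tuple of min-hash features over all 8-byte windows, using seeded BLAKE2b hashes as a stand-in for the fingerprint computation; each feature serves as a hash key, so two keys count as "similar" when the sketches share a feature; and the similarity is the fraction of matching features, compared against a threshold. `WINDOW`, `NUM_FEATURES`, `THRESHOLD`, and all function and class names are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

WINDOW = 8        # hypothetical fingerprint window size in bytes
NUM_FEATURES = 4  # hypothetical number of features per sketch
THRESHOLD = 0.5   # hypothetical threshold: fraction of matching features

Sketch = Tuple[int, ...]


def compute_sketch(block: bytes) -> Sketch:
    """Signature sketch: one min-hash feature per seed over every window.

    Assumption: seeded BLAKE2b stands in for the claimed fingerprint computation.
    """
    n_windows = max(len(block) - WINDOW + 1, 1)
    features = []
    for seed in range(NUM_FEATURES):
        prefix = seed.to_bytes(4, "little")
        features.append(min(
            int.from_bytes(
                hashlib.blake2b(prefix + block[i:i + WINDOW], digest_size=8).digest(),
                "little")
            for i in range(n_windows)))
    return tuple(features)


def similarity(a: Sketch, b: Sketch) -> float:
    """Fraction of positions where the two sketches hold the same feature."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


class ReferenceBlockIndex:
    def __init__(self) -> None:
        # Hash index table: hash key -> ids of records in the reference list table.
        self.hash_index: Dict[int, List[int]] = defaultdict(list)
        # Reference list table: id -> (sketch, reference block).
        self.reference_list: List[Tuple[Sketch, bytes]] = []

    @staticmethod
    def hash_keys(sketch: Sketch) -> List[int]:
        # Assumption: each (position, feature) pair is a hash key, so sketches
        # sharing any feature reach the same hash index record.
        return [hash((pos, f)) for pos, f in enumerate(sketch)]

    def add(self, block: bytes) -> None:
        sketch = compute_sketch(block)
        ref_id = len(self.reference_list)
        self.reference_list.append((sketch, block))
        for key in self.hash_keys(sketch):
            self.hash_index[key].append(ref_id)

    def find_similar(self, block: bytes) -> Optional[bytes]:
        """Return the most similar reference block whose similarity exceeds THRESHOLD."""
        sketch = compute_sketch(block)
        candidates = {rid for key in self.hash_keys(sketch)
                      for rid in self.hash_index.get(key, [])}
        best, best_score = None, THRESHOLD
        for rid in candidates:
            ref_sketch, ref_block = self.reference_list[rid]
            score = similarity(sketch, ref_sketch)
            if score > best_score:  # strictly exceeds the threshold
                best, best_score = ref_block, score
        return best
```

Indexing by individual features lets a single lookup surface every reference that shares any feature. The threshold check on the full sketch then filters out weak candidates before the more expensive delta encoding step.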
Address: Amsterdam NL