Title: Techniques for using a bloom filter in a duplication operation
Abstract: Techniques for using a bloom filter in deduplication are described herein. A change log comprising a plurality of data blocks may be received. Values associated with the data blocks may be hashed and compared with a bloom filter. The comparison with the bloom filter identifies data blocks from the change log as unique data blocks or potential duplicate data blocks. A bit-by-bit comparison of the potential duplicate data blocks and previous data blocks may be performed to determine whether any of the potential duplicate data blocks are identical to any of the previous data blocks. Data blocks of the change log that are identified as identical may be deduplicated.
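The flow in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `BloomFilter` class, the `fingerprint` and `classify_blocks` helpers, the use of SHA-256, the filter size, and the in-memory `stored` dictionary are all assumptions made for the example. Python's `==` on `bytes` stands in for the bit-by-bit comparison.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; size and hash count are illustrative assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, value):
        # Derive num_hashes positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + value).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        # False means "definitely not seen"; True means "possibly seen".
        return all(self.bits[pos] for pos in self._positions(value))


def fingerprint(block):
    """Hypothetical value associated with a block: its SHA-256 digest."""
    return hashlib.sha256(block).digest()


def classify_blocks(change_log, previous_blocks):
    """Split change-log blocks into (unique, duplicates)."""
    bloom = BloomFilter()
    stored = {}  # fingerprint -> blocks already kept (assumed in-memory store)

    def remember(block):
        fp = fingerprint(block)
        bloom.add(fp)
        stored.setdefault(fp, []).append(block)

    for block in previous_blocks:
        remember(block)

    unique, duplicates = [], []
    for block in change_log:
        fp = fingerprint(block)
        # The filter only flags *potential* duplicates; a full comparison
        # against stored blocks confirms or rejects each candidate.
        if bloom.might_contain(fp) and any(block == prev for prev in stored.get(fp, [])):
            duplicates.append(block)
        else:
            unique.append(block)
            remember(block)
    return unique, duplicates
```

Because the full comparison runs only for blocks the filter flags, a false positive costs an extra comparison but never causes a distinct block to be treated as a duplicate.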
Publication number: US9298726(B1) | Publication date: 2016.03.29
Application number: US201213632892 | Filing date: 2012.10.01
Applicant: NetApp, Inc. | Inventors: Mondal Shishir; Killamsetti Praveen
Classification: G06F7/00; G06F17/00; G06F17/30 | Main classification: G06F7/00
Agency: Gilliam IP PLLC | Agent: Gilliam IP PLLC
Main claim: 1. A device comprising: a processor; a computer readable medium having instructions stored thereon, the instructions comprising instructions which, when executed by the processor, cause the device to: determine a first data block identified in a change log, wherein the first data block is associated with a content identifier; generate a plurality of hashes based, at least in part, on the content identifier, wherein each of the hashes is generated by a different hash function, wherein each of the plurality of hashes identifies one of a plurality of entries of a first bloom filter; determine whether the content identifier has been previously received based, at least in part, on the plurality of hashes and the first bloom filter; in response to a determination that the content identifier has not been previously received, update the first bloom filter to indicate that the content identifier was received; and in response to a determination that the content identifier may have been previously received, update a second bloom filter to indicate that the content identifier was received a second time; and indicate, in a fingerprint database, that the first data block is a potential duplicate.
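One possible reading of the claimed steps is sketched below. It is illustrative only and assumes several things the claim does not specify: the filter size `NUM_ENTRIES`, the particular hash functions in `HASH_NAMES`, the helper names `entry_indexes` and `process_change_log`, the representation of the change log as `(block_id, content_id)` pairs, and a plain dictionary standing in for the fingerprint database. The sketch also records the potential-duplicate indication only in the "may have been previously received" branch, which is an interpretation of the claim wording.

```python
import hashlib

NUM_ENTRIES = 4096                            # assumed filter size
HASH_NAMES = ("sha1", "sha256", "blake2b")    # assumed set of different hash functions


def entry_indexes(content_id):
    """One hash per hash function; each hash identifies one filter entry."""
    return [
        int.from_bytes(hashlib.new(name, content_id).digest()[:8], "big") % NUM_ENTRIES
        for name in HASH_NAMES
    ]


def process_change_log(change_log):
    """change_log: iterable of (block_id, content_id) pairs (assumed layout)."""
    first_filter = [False] * NUM_ENTRIES    # content identifiers received at least once
    second_filter = [False] * NUM_ENTRIES   # content identifiers received a second time
    fingerprint_db = {}                     # stand-in for the fingerprint database

    for block_id, content_id in change_log:
        indexes = entry_indexes(content_id)
        maybe_seen = all(first_filter[i] for i in indexes)

        if not maybe_seen:
            # Not previously received: record it in the first filter.
            for i in indexes:
                first_filter[i] = True
        else:
            # May have been received before: record it in the second filter.
            for i in indexes:
                second_filter[i] = True

        fingerprint_db[block_id] = {
            "content_id": content_id,
            "potential_duplicate": maybe_seen,
        }

    return first_filter, second_filter, fingerprint_db
```

For example, a change log of `[("b1", b"X"), ("b2", b"Y"), ("b3", b"X")]` leaves `b1` unmarked and marks `b3` as a potential duplicate, since `b3` carries a content identifier the first filter has already recorded.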
Address: Sunnyvale, CA, US