摘要 |
Described in detail herein are systems and methods for deduplicating data using byte-level or quasi byte-level techniques. In some embodiments, a file is divided into multiple blocks. A block includes multiple bytes. Multiple rolling hashes of the file are generated. For each byte in the file, a searchable data structure is accessed to determine if the data structure already includes an entry matching a hash of a minimum sequence length. If so, this indicates that the corresponding bytes are already stored. If one or more bytes in the file are already stored, then the one or more bytes in the file are replaced with a reference to the already stored bytes. The systems and methods described herein may be used for file systems, databases, storing backup data, or any other use case where it may be useful to reduce the amount of data being stored.
|