Title: Method and system for processing checksum of a data stream to optimize deduplication
Abstract: Techniques are described for deduplicating a data stream that has checksum data embedded in it. According to one embodiment, a first data stream is received from a client. The stream contains a plurality of data regions and a plurality of embedded checksums for verifying the integrity of those data regions, and it represents a file, or a directory of one or more files, of a file system associated with the client. In response, the first data stream, with the checksums removed, is deduplicated into a plurality of deduplicated chunks.
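The abstract does not specify how the checksum-free stream is split into chunks. A minimal sketch, assuming fixed-size chunking and SHA-256 fingerprints (neither is stated in this record), could look like the following. `ChunkStore`, `deduplicate`, `restore`, and the 8-byte chunk size are hypothetical names and values chosen for illustration.

```python
import hashlib
from typing import Dict, List

# Hypothetical, deliberately tiny chunk size for demonstration only.
CHUNK_SIZE = 8


class ChunkStore:
    """Hypothetical fingerprint-indexed store: each unique chunk is kept once."""

    def __init__(self, chunk_size: int = CHUNK_SIZE) -> None:
        self.chunk_size = chunk_size
        self.chunks: Dict[str, bytes] = {}  # SHA-256 hex digest -> chunk bytes

    def deduplicate(self, stream: bytes) -> List[str]:
        """Split the stream at fixed offsets, store unseen chunks, return the recipe."""
        recipe: List[str] = []
        for offset in range(0, len(stream), self.chunk_size):
            chunk = stream[offset:offset + self.chunk_size]
            fp = hashlib.sha256(chunk).hexdigest()
            if fp not in self.chunks:  # only new content consumes storage
                self.chunks[fp] = chunk
            recipe.append(fp)
        return recipe

    def restore(self, recipe: List[str]) -> bytes:
        """Rebuild the original stream from its ordered list of fingerprints."""
        return b"".join(self.chunks[fp] for fp in recipe)
```

For example, deduplicating `b"AAAAAAAABBBBBBBB"` and then `b"AAAAAAAACCCCCCCC"` leaves three unique chunks in the store, because the shared first chunk is stored once. Fixed offsets keep the sketch short; many deduplicating systems use content-defined chunking instead, so that an insertion does not shift every later chunk boundary.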
Publication number: US9063664 (B1)
Publication date: 2015.06.23
Application number: US201213718824
Filing date: 2012.12.18
Applicant: EMC Corporation
Inventors: Li Junxu; Hsu Windsor W.
Classification: G06F17/00; G06F3/06; G06F17/30
Main classification: G06F17/00
Agency: Blakely, Sokoloff, Taylor & Zafman LLP
Agent: Blakely, Sokoloff, Taylor & Zafman LLP
Main claim: 1. A computer-implemented method for deduplicating data, comprising: receiving at a storage system over a network from a client a first data stream having a plurality of data regions and a plurality of checksums for verifying integrity of the data regions embedded therein, the first data stream representing a file or a directory of one or more files of a file system associated with the client; scanning the first data stream to recognize a plurality of checksum markers that identify the checksums, wherein the checksum markers were inserted into the first data stream by the client prior to receiving the first data stream over the network; extracting the checksum markers and the checksums from the first data stream to generate a second data stream without the checksum markers and associated checksum data therein; and deduplicating the second data stream into a plurality of deduplicated chunks.
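The claim does not define how the checksum markers are encoded. The sketch below assumes, purely for illustration, that the client appends a 4-byte magic value followed by a 4-byte big-endian CRC32 after each data region; `MARKER`, `embed`, and `strip_checksums` are hypothetical. The storage side scans for the marker, verifies and removes each marker and its checksum, and returns the second, checksum-free stream ready for deduplication. It also returns the extracted checksums; keeping them (for example, to re-insert them on restore) is an assumption, not something the claim states.

```python
import struct
import zlib
from typing import List, Tuple

# Hypothetical marker layout (not specified by the claim): a 4-byte magic value
# followed by a 4-byte big-endian CRC32 of the data region that precedes it.
MARKER = b"\xde\xad\xbe\xef"
CHECKSUM_LEN = 4


def embed(regions: List[bytes]) -> bytes:
    """Client side (hypothetical): append a marker and CRC32 after each region."""
    out = bytearray()
    for region in regions:
        out += region + MARKER + struct.pack(">I", zlib.crc32(region))
    return bytes(out)


def strip_checksums(stream: bytes) -> Tuple[bytes, List[int]]:
    """Storage side: scan for markers, verify each region, drop marker + checksum.

    Assumes the marker never occurs inside region data; a real format would
    need escaping or explicit length fields to make the scan unambiguous.
    """
    clean = bytearray()
    checksums: List[int] = []
    pos = 0
    while True:
        idx = stream.find(MARKER, pos)
        if idx < 0:
            clean += stream[pos:]  # trailing bytes with no checksum after them
            break
        region = stream[pos:idx]
        start = idx + len(MARKER)
        end = start + CHECKSUM_LEN
        if end > len(stream):
            raise ValueError("truncated checksum after marker")
        (crc,) = struct.unpack(">I", stream[start:end])
        if zlib.crc32(region) != crc:
            raise ValueError(f"checksum mismatch for region at offset {pos}")
        clean += region  # keep only the data region in the second stream
        checksums.append(crc)
        pos = end
    return bytes(clean), checksums
```

For example, `strip_checksums(embed([b"hello ", b"world"]))` returns `b"hello world"` together with the two CRC32 values, while flipping any byte of a region makes the call raise `ValueError`.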
Address: Hopkinton, MA, US