Title of invention: System and method for improving data compression of a storage system in an online manner
Abstract: Techniques for improving data compression of a storage system in an online manner are described herein. According to one embodiment, in response to a sequence of data to be stored, the sequence of data is partitioned into a plurality of data chunks according to a predetermined chunking algorithm. A sketch for each of the data chunks is generated based on one or more features extracted from the data chunk. Each of the data chunks of the sequence of data is associated with one of a plurality of groups based on the sketch, wherein each group is represented by a sketch. The data chunks of each group are compressed and stored in a compression region of the storage system, such that similar data chunks are compressed and stored in the same compression region.
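The flow in the abstract (chunk, sketch, group, compress per group) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the fixed-size chunking, the `CHUNK_SIZE` and `window` values, the choice of the maximum windowed BLAKE2b hash as the single feature, and the function names `chunk_data`, `sketch`, and `group_and_compress`.

```python
import hashlib
import zlib
from collections import defaultdict

CHUNK_SIZE = 64  # hypothetical fixed chunk size; the record only says "predetermined chunking algorithm"


def chunk_data(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Partition the data into fixed-size chunks (assumed chunking algorithm)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def sketch(chunk: bytes, window: int = 8) -> int:
    """Assumed sketch: the maximum 64-bit hash over all sliding windows.

    Chunks that share most of their content tend to share the window that
    produces the maximum hash, so they tend to receive the same sketch.
    """
    last_start = max(len(chunk) - window, 0)
    windows = [chunk[i:i + window] for i in range(last_start + 1)]
    return max(
        int.from_bytes(hashlib.blake2b(w, digest_size=8).digest(), "big")
        for w in windows
    )


def group_and_compress(data: bytes) -> dict[int, bytes]:
    """Group chunks by sketch, then compress each group as one region."""
    groups: dict[int, list[bytes]] = defaultdict(list)
    for c in chunk_data(data):
        groups[sketch(c)].append(c)
    # One compressed blob per group stands in for one compression region.
    return {s: zlib.compress(b"".join(cs)) for s, cs in groups.items()}


if __name__ == "__main__":
    stream = (b"A" * 64 + bytes(range(64))) * 3
    regions = group_and_compress(stream)
    print(len(regions), "compression regions")
```

Compressing each group as a single blob is what lets the compressor exploit redundancy across similar chunks that were not adjacent in the original stream.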
Publication number: US9514146(B1) Publication date: 2016.12.06
Application number: US201314038635 Filing date: 2013.09.26
Applicant: EMC Corporation Inventors: Wallace Grant; Douglis Frederick; Shilane Philip
Classification: G06F17/30 Main classification: G06F17/30
Agency: Blakely, Sokoloff, Taylor & Zafman LLP Agent: Blakely, Sokoloff, Taylor & Zafman LLP
Principal claim: 1. A computer-implemented method for improving data compression of data chunks of a storage system, the method comprising: in response to a sequence of data to be stored, partitioning the sequence of data into a plurality of data chunks according to a predetermined chunking algorithm; generating, by using a computer system, a sketch for each data chunk of the data chunks based on one or more features extracted from the data chunk; performing a lookup operation in a sketch index based on the sketch of the each data chunk to determine a compression region identifier (ID) of an existing compression region, wherein the sketch index maps a particular sketch to a particular compression region; associating said each data chunk of the data chunks of the sequence of data with one group of a plurality of groups based on the particular sketch, wherein each group of the plurality of groups is represented by a sketch, wherein the associating each data chunk of the data chunks of the sequence of data comprises merging the data chunk of the sequence of data with data chunks of an existing compression region identified by a corresponding compression region ID, wherein the merging comprises reorganizing the data chunks of the sequence of data and the data chunks of the existing compression region from an original sequence order to a second sequence order, similar data chunks being positioned adjacent to each other; and compressing and storing data chunks of the each group in a corresponding existing compression region of the storage system so that the similar data chunks are compressed and stored in the same compression region, wherein the compressing and storing data chunks of each group comprises writing data chunks of each group of the groups whose data chunks reach a predetermined threshold to a respective compression region and reclaiming a previous compression region space.
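The claim's sketch-index lookup, merge with an existing compression region, threshold-triggered write, and reclamation of the previous region can be sketched roughly as follows. This is an illustrative reading, not the patented implementation. The class `RegionStore`, its fields, the helpers `pack` and `unpack`, the length-prefixed framing, and the default threshold of 4 are all hypothetical. The reorganization into a second sequence order is modeled simply as collecting chunks by sketch, so that chunks with the same sketch end up adjacent regardless of arrival order.

```python
import zlib
from dataclasses import dataclass, field


def pack(chunks: list[bytes]) -> bytes:
    """Compress a group of chunks into one region (length-prefixed framing, assumed)."""
    framed = b"".join(len(c).to_bytes(4, "big") + c for c in chunks)
    return zlib.compress(framed)


def unpack(region: bytes) -> list[bytes]:
    """Recover the individual chunks stored in a compressed region."""
    raw, chunks, pos = zlib.decompress(region), [], 0
    while pos < len(raw):
        size = int.from_bytes(raw[pos:pos + 4], "big")
        chunks.append(raw[pos + 4:pos + 4 + size])
        pos += 4 + size
    return chunks


@dataclass
class RegionStore:
    threshold: int = 4  # hypothetical "predetermined threshold" (chunks per group)
    sketch_index: dict[int, int] = field(default_factory=dict)    # sketch -> region ID
    regions: dict[int, bytes] = field(default_factory=dict)       # region ID -> compressed data
    pending: dict[int, list[bytes]] = field(default_factory=dict) # sketch -> buffered chunks
    next_id: int = 0

    def add(self, sketch: int, chunk: bytes) -> None:
        """Buffer a chunk in its group; write the group once it reaches the threshold."""
        group = self.pending.setdefault(sketch, [])
        group.append(chunk)
        if len(group) >= self.threshold:
            self._flush(sketch)

    def _flush(self, sketch: int) -> None:
        # Lookup: does the sketch index already map this sketch to a region?
        old_id = self.sketch_index.get(sketch)
        # Merge: read back the existing region's chunks; pop() reclaims its space.
        old_chunks = unpack(self.regions.pop(old_id)) if old_id is not None else []
        merged = old_chunks + self.pending.pop(sketch)
        # Write the merged group to a new region and repoint the index at it.
        self.regions[self.next_id] = pack(merged)
        self.sketch_index[sketch] = self.next_id
        self.next_id += 1
```

For example, with `RegionStore(threshold=2)`, adding two chunks under sketch `1` writes region `0`; adding two more under the same sketch rewrites all four chunks into region `1` and removes region `0`.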
Address: Hopkinton, MA, US