Title of invention: Method and system for dynamic compression module selection
Abstract: A computer-implemented method for compressing a data set, the method comprising receiving a first data block of the data set, selecting automatically by a compression management module a compression module from a plurality of compression modules to apply to the first data block based on projected compression efficacy or resource utilization, and compressing the first data block with the selected compression module to generate a first compressed data block.
Publication number: US9571698(B1)  Publication date: 2017.02.14
Application number: US201213436680  Filing date: 2012.03.30
Applicant: EMC IP Holding Company LLC  Inventors: Wallace Grant R.; Shilane Philip N.; Douglis Frederick; Luo Jianqiang
Classification: G06K9/36; H04N1/41; H04N19/10; H04N21/235  Main classification: G06K9/36
Agency: Blakely, Sokoloff, Taylor & Zafman LLP  Agent: Blakely, Sokoloff, Taylor & Zafman LLP
Main claim: 1. A computer-implemented method for compressing each block in a data set that comprises a plurality of data blocks, the method comprising:
  deduplicating a data set received by a backup storage management server from a client device across a network, the data set comprising a plurality of data blocks;
  receiving an uncompressed first data block of the deduplicated data set by a compression management module of the backup storage management server;
  pre-processing a sample data block of the data set to determine projected compression efficacy for the first data block, wherein the pre-processing comprises determining a count of repeated byte sequences within the sample data block;
  selecting automatically by the compression management module a compression module from a plurality of compression modules to apply to the sample data block based on the determined count of repeated byte sequences within the sample data block, wherein automatically selecting a compression module comprises:
    selecting a fast compression module from the plurality of compression modules as a default compression module;
    applying the fast compression module to a sample portion of the sample data block, generating a compressed sample;
    determining a compression rate for the compressed sample;
    in response to determining that the compression rate for the compressed sample generated using the fast compression module is less than or equal to ten percent, selecting the fast compression module for compressing the first data block,
    in response to determining that the compression rate for the compressed sample generated using the fast compression module is greater than ninety percent reduction of data in the sample portion, selecting the fast compression module for compressing the first data block,
    in response to determining that the compression rate for the compressed sample generated using the fast compression module is greater than ten percent and less than or equal to ninety percent reduction of data, individually applying each of the remaining plurality of compression modules to the sample portion of the sample data block, generating a plurality of compressed samples, and selecting the compression module whose compressed sample in the plurality of compressed samples has a most efficient compression among the plurality of compressed samples;
    analyzing resources utilized to compress the sample portion of the sample data block;
    in response to determining that the amount of resources used to compress the sample portion of the sample data block exceeds a resource utilization constraint, selecting the fast compression module, otherwise selecting a slow compression module;
  compressing the first data block with the selected compression module to generate a first compressed data block; and
  storing the first compressed data block in a storage device.
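The selection steps recited in the claim can be illustrated with a minimal Python sketch. It is one possible reading of the claim language, not the patented implementation. Everything named here is an assumption made for illustration: the three standard-library codecs standing in for the "plurality of compression modules" (zlib at level 1 as the fast module), the 4-byte window used to count repeated byte sequences, wall-clock time as the measured resource, the 0.5-second budget, and the 64 KiB sample size. For simplicity the sample is taken from the block being compressed.

```python
import bz2
import lzma
import time
import zlib
from collections import Counter

# Hypothetical stand-ins for the claim's compression modules.
FAST = "zlib-1"
MODULES = {
    FAST: lambda data: zlib.compress(data, 1),
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def count_repeated_sequences(sample: bytes, width: int = 4) -> int:
    """Count distinct `width`-byte sequences that occur more than once."""
    windows = Counter(sample[i:i + width] for i in range(len(sample) - width + 1))
    return sum(1 for n in windows.values() if n > 1)


def reduction(original: bytes, compressed: bytes) -> float:
    """Fraction of the data removed by compression (0.0 means no reduction)."""
    if not original:
        return 0.0
    return 1.0 - len(compressed) / len(original)


def select_module(sample: bytes, time_budget_s: float = 0.5) -> str:
    """Pick a module name for the block the sample was drawn from."""
    # Apply the default fast module to the sample first.
    fast_rate = reduction(sample, MODULES[FAST](sample))
    # At most 10% or more than 90% reduction: keep the fast module.
    if fast_rate <= 0.10 or fast_rate > 0.90:
        return FAST
    # Otherwise try each remaining module and record size and elapsed time.
    results = {}
    for name, compress in MODULES.items():
        if name == FAST:
            continue
        start = time.perf_counter()
        size = len(compress(sample))
        results[name] = (size, time.perf_counter() - start)
    best = min(results, key=lambda name: results[name][0])
    # Fall back to the fast module if the best candidate exceeded the budget.
    if results[best][1] > time_budget_s:
        return FAST
    return best


def compress_block(block: bytes, sample_size: int = 64 * 1024) -> tuple[str, bytes]:
    """Select a module from a leading sample, then compress the whole block."""
    name = select_module(block[:sample_size])
    return name, MODULES[name](block)
```

A caller would store the returned module name alongside the compressed block so that the matching decompressor can be chosen later. The claim does not say how the repeated-sequence count feeds into the selection, so `count_repeated_sequences` is shown only as a standalone pre-processing helper.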
Address: Hopkinton, MA, US