Title of invention Reducing activation of similarity search in a data deduplication system
Abstract For conditional activation of similarity search in a data deduplication system using a processor device in a computing environment, input data is partitioned into data chunks. A determination is made as to whether to apply the similarity search process for an input data chunk based on deduplication results of a previous input data chunk in the input data.
Publication number US9594766(B2) Publication date 2017.03.14
Application number US201313941703 Filing date 2013.07.15
Applicant INTERNATIONAL BUSINESS MACHINES CORPORATION Inventor Aronovich Lior
Classification G06F17/30 Main classification G06F17/30
Agency Griffiths & Seaton PLLC Agent Griffiths & Seaton PLLC
Principal claim 1. A system for conditional activation of a similarity search in a data deduplication system of a computing environment, the system comprising: the data deduplication system; a repository operating in the data deduplication system; a memory in the data deduplication system; a data structure in association with the memory in the data deduplication system; and at least one processor device operating in the computing storage environment for controlling the data deduplication system, wherein the at least one processor device: partitions an input data stream of input data into data chunks, the data chunks having a size of at least 16 Megabytes (MB), determines when and when not to apply the similarity search for an input data chunk based on deduplication results of a previous input data chunk in the input data stream, and applies the similarity search if the deduplication result of the previous input data chunk in the input data stream is one of below a predetermined deduplication result threshold and does not exist, thereby only calculating rolling hash values of the input data chunk when needed to be used in the similarity search in the data deduplication system of the computing environment.
Address Armonk NY US
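
The decision rule in the principal claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the `dedup_chunk` callback, and the representation of a deduplication result as a fraction between 0 and 1 are all assumptions made for illustration; the record above does not specify how the result is measured or what happens to a chunk when the search is skipped.

```python
from typing import Callable, List, Optional

# The claim recites chunks of at least 16 MB. In this simplified sketch
# the final chunk of a stream may be shorter than that.
CHUNK_SIZE = 16 * 1024 * 1024


def partition(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Split the input into consecutive fixed-size chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def should_run_similarity_search(prev_result: Optional[float],
                                 threshold: float) -> bool:
    """Apply the search when the previous chunk's deduplication result
    does not exist (None) or is below the predetermined threshold."""
    return prev_result is None or prev_result < threshold


def deduplicate_stream(chunks: List[bytes],
                       threshold: float,
                       dedup_chunk: Callable[[bytes, bool], float]) -> List[bool]:
    """Process chunks in order and record, per chunk, whether the
    similarity search was activated.

    dedup_chunk is a hypothetical callback: it receives the chunk and a
    flag saying whether to run the similarity search (and hence compute
    rolling hash values), and returns that chunk's deduplication result.
    """
    prev_result: Optional[float] = None
    decisions: List[bool] = []
    for chunk in chunks:
        use_search = should_run_similarity_search(prev_result, threshold)
        prev_result = dedup_chunk(chunk, use_search)
        decisions.append(use_search)
    return decisions
```

In this sketch the first chunk always triggers the search because no previous result exists. Later chunks skip it for as long as the preceding chunk deduplicated at or above the threshold, which is where the saving in rolling hash computation would come from.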