Title of invention System and method for creating deduplicated copies of data by tracking temporal relationships among copies using higher-level hash structures
Abstract Systems and methods are disclosed for forming deduplicated images of a data object that changes over time using difference information between temporal states of the data object. The method includes organizing the content of the data object for a first temporal state as a plurality of content segments and storing the content segments in a data store; creating an organized arrangement of hash structures to represent the data object in its first temporal state; receiving difference information for the data object; forming at least one hash signature for the changed content; and storing the changed content that is unique in the data store as content segments. The method also includes determining, subsequent to receiving the changed content at the deduplicating content store, whether the changed content should be stored by searching for the hash signature for the changed higher-level hash structure in the global cache of the deduplicating content store.
Publication number US9384207(B2) Publication date 2016.07.05
Application number US201514627880 Filing date 2015.02.20
Applicant ACTIFIO, INC. Inventors Provenzano Christopher A.; Roman Mark A.
Classification G06F7/00; G06F17/30 Main classification G06F7/00
Agency Wilmer Cutler Pickering Hale and Dorr LLP Agent Wilmer Cutler Pickering Hale and Dorr LLP
Principal claim 1. A computing system for storing deduplicated images of a data object that changes over time in a deduplicating content store, the deduplicating content store having a local cache and a global cache, the computing system comprising: a processor; and a memory coupled to the processor and including computer-readable instructions that, when executed by the processor, cause the processor to: organize the content of the data object for a first temporal state of the data object as a plurality of content segments and storing the plurality of content segments in a data store; create a content structure representing content of the data object as a hierarchical arrangement of hash structures in the data store, wherein each hash structure includes a hash signature for a corresponding content segment and is associated with a reference to the corresponding content segment, and wherein a higher-level hash structure in the hierarchical arrangement aggregates a set of lower-level hash structures, such that a logical organization of the content structure represents the organization of the content segments as they are represented within the data object; receive difference information for the data object, said difference information indicating changed content for the data object for a second temporal state of the data object relative to the first temporal state, and said difference information indicating a location of the changed content within the data object; receive the changed content for the data object at the deduplicating content store; form a hash signature for each of a set of changed lower-level hash structures associated with the changed content; form a hash signature for a changed higher-level hash structure aggregating a plurality of the set of changed lower-level hash structures; determine, subsequent to receiving the changed content at the deduplicating content store, whether the changed content should be stored by searching for the hash signature for the changed higher-level hash structure in the global cache of the deduplicating content store before attempting to search for the hash signatures for each of the set of changed lower-level hash structures; store any changed content that is unique in the data store as content segments; modify the organized arrangement of hash structures to incorporate new structures for the content segment corresponding to at least one hash signature for the changed content; and incorporate the new structures in the organized arrangement of structures at a position corresponding to the location of the changed content within the data object as indicated within said difference information, thereby using the higher-level hash signature for the changed content without unnecessary searching for hash signatures for the lower-level hash structures.
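Illustrative sketch (not part of the patent text) The lookup order recited in claim 1 can be pictured with a small two-level Python model. Everything in it is an assumption made for illustration: the names `ContentStore`, `ingest_group`, `first_image`, `next_image` and `read`, the use of SHA-256, the tiny segment size and fan-out, a single set standing in for the global cache (the local cache is omitted), and difference information given as `(offset, bytes)` pairs aligned to group boundaries inside the existing object. It is a minimal sketch of checking the higher-level hash signature before any lower-level ones, not the patented implementation.

```python
import hashlib
from dataclasses import dataclass, field

SEGMENT_SIZE = 4                      # bytes per content segment (tiny, for illustration)
FANOUT = 2                            # lower-level hashes aggregated per higher-level hash
GROUP_BYTES = SEGMENT_SIZE * FANOUT   # bytes covered by one higher-level hash structure


def sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def split(data: bytes) -> list:
    return [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]


@dataclass
class ContentStore:
    segments: dict = field(default_factory=dict)    # lower-level hash -> segment bytes
    global_cache: set = field(default_factory=set)  # known hashes at both levels
    leaf_lookups: int = 0                           # counts lower-level cache searches

    def ingest_group(self, segs: list) -> tuple:
        """Return (higher-level hash, lower-level hashes), storing only unique content."""
        leaf_hashes = [sha(s) for s in segs]
        parent = sha("".join(leaf_hashes).encode())
        if parent in self.global_cache:
            # Higher-level hit: every segment is already stored, so the
            # lower-level hashes are never searched for.
            return parent, leaf_hashes
        for h, s in zip(leaf_hashes, segs):
            self.leaf_lookups += 1
            if h not in self.global_cache:
                self.segments[h] = s
                self.global_cache.add(h)
        self.global_cache.add(parent)
        return parent, leaf_hashes


def first_image(store: ContentStore, data: bytes) -> list:
    """Build the hash structures for the first temporal state."""
    return [store.ingest_group(split(data[i:i + GROUP_BYTES]))
            for i in range(0, len(data), GROUP_BYTES)]


def next_image(store: ContentStore, prev_image: list, diffs: list) -> list:
    """Apply (offset, changed_bytes) difference information to a prior image."""
    image = list(prev_image)  # unchanged groups reuse the existing hash structures
    for offset, changed in diffs:
        for i in range(0, len(changed), GROUP_BYTES):
            group = split(changed[i:i + GROUP_BYTES])
            # Place the new structures at the position named by the difference info.
            image[(offset + i) // GROUP_BYTES] = store.ingest_group(group)
    return image


def read(store: ContentStore, image: list) -> bytes:
    return b"".join(store.segments[h] for _, leaves in image for h in leaves)


store = ContentStore()
v1 = first_image(store, b"AAAABBBBCCCCDDDD")
v2 = next_image(store, v1, [(8, b"AAAABBBB")])  # changed group already known as a whole
v3 = next_image(store, v2, [(0, b"AAAAEEEE")])  # only the EEEE segment is new
```

In this model the second image costs no lower-level searches because the changed group's higher-level hash is already cached, while the third image falls back to per-segment searches and stores a single new segment. Earlier images stay readable because each one keeps its own list of hash structures.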
Address Waltham MA US