摘要 |
Systems and methods are disclosed for performing garbage collection to identify content segments no longer referenced in a deduplicating storage system in which redundant mark operations in a mark-and-sweep technique are avoided. An organized arrangement of hash structures is created for each data object, wherein each structure includes a hash signature for a corresponding content segment and is associated with a reference to the corresponding content segment, and the logical organization of the arrangement represents the logical organization of the content segments as they are represented within the data object. Additionally, for each data object, temporal states are maintained over time. Garbage collection iterates over the temporal structures and, for each temporal structure, marks the garbage collection state for the associated content segments for only the content segments that have changed relative to an immediately prior temporal state of the data object.
|