发明名称 Garbage collection of chunks
摘要 Various techniques are provided for performing garbage collection in a chunk store that is being used to implement a hierarchical file system. In general, the techniques involve a “trace” phase in which all chunks that correspond to current versions of files are marked, and then a sweep phase in which all chunks that were not marked during the trace phase are reclaimed. Various techniques are also described for using snapshots to avoid the need to halt operations on the file system while the trace phase is being performed. In addition, techniques are provided for using a cache of last-touched timestamps to avoid the need to mark all current chunks in each trace phase.
申请公布号 US9176871(B1) 申请公布日期 2015.11.03
申请号 US201313760945 申请日期 2013.02.06
申请人 upthere, inc. 发明人 Serlet Bertrand
分类号 G06F12/02;G06F12/12 主分类号 G06F12/02
代理机构 Hickman Palermo Becker Bingham LLP 代理人 Hickman Palermo Becker Bingham LLP
主权项 1. A method comprising: implementing a hierarchical file system that supports modifications using chunks stored in a content addressable storage system, wherein at least a portion of the chunks represents file content; wherein making a modification in the hierarchical file system comprises storing one or more new chunks in the content addressable storage system that represent the hierarchical file system structure after the modification; performing a garbage collection operation that reclaims stale chunks, within the content addressable storage system, that no longer reflect a current state of the hierarchical file system as of a first particular time due to one or more modifications in the hierarchical file system; wherein performing the garbage collection operation includes: during a trace phase of the garbage collection operation, performing one or more recursive touch-traversal operations to establish new last-touched timestamps for chunks visited during the recursive touch-traversal operations;wherein each of the one or more recursive touch-traversal operations is initiated at a distinct chunk that, as of a second particular time that is later than the first particular time, was a current root of a corresponding tree within the hierarchical file system;during the trace phase, determining whether to initiate a particular recursive touch-traversal operation at a particular chunk, which was the current root of a particular tree within the hierarchical file system as of the second particular time, based, at least in part, on a sub-tree-timestamp maintained for the particular tree;responsive to the sub-tree-timestamp of the particular tree being before the first particular time, initiating the particular recursive touch-traversal operation at the particular chunk;responsive to the sub-tree-timestamp of the particular tree being after the first particular time, performing the trace phase without initiating any recursive touch-traversal operation at the particular chunk;during a sweep phase, reclaiming chunks, within the content addressable storage system, that have last-touched timestamps that are before the first particular time; wherein the method is performed by one or more computing devices.
地址 Palo Alto CA US