Title: Method and system for determining data integrity for garbage collection of data storage systems
Abstract: A garbage collector of a storage system traverses a namespace of a file system of the storage system to verify data integrity of segments. The namespace identifies files that are represented by segments arranged in multiple levels in a hierarchy, where an upper level segment includes one or more references to one or more lower level segments, and at least one segment is referenced by multiple files. Traversing the namespace includes computing and verifying checksums of all segments in a level-by-level manner, where checksums of an upper level are verified before any checksums of a lower level are verified. Once all checksums of all levels have been verified, a garbage collection process is performed on the segments stored in the storage system.
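The level-by-level traversal described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the in-memory `store` dict, the `Segment` class, and the helpers `fingerprint`, `put`, and `levels_top_down` are hypothetical names, and SHA-1 merely stands in for whatever fingerprint function the storage system uses. The sketch shows that a segment shared by several files is enumerated only once, and that a whole level is enumerated before the next lower level.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Segment:
    level: int                                    # 0 = data segment, higher = metadata segment
    payload: bytes = b""
    children: list = field(default_factory=list)  # fingerprints of lower level segments


def fingerprint(seg):
    # Hypothetical fingerprint: SHA-1 over level, payload, and child references.
    h = hashlib.sha1()
    h.update(bytes([seg.level]))
    h.update(seg.payload)
    for child in seg.children:
        h.update(child)
    return h.digest()


def put(store, seg):
    # Store a segment under its fingerprint and return that fingerprint.
    fp = fingerprint(seg)
    store[fp] = seg
    return fp


def levels_top_down(store, roots):
    """Yield (level, fingerprints) from the top level down to level 0.

    Each fingerprint is collected in a set, so a segment referenced by
    several files is visited once per traversal.
    """
    top = max(store[fp].level for fp in roots)
    pending = {lvl: set() for lvl in range(top + 1)}
    for fp in roots:
        pending[store[fp].level].add(fp)
    for lvl in range(top, -1, -1):
        current = pending[lvl]
        for fp in current:
            for child in store[fp].children:
                pending[store[child].level].add(child)
        yield lvl, current


# Two files that share data segment b.
store = {}
a = put(store, Segment(0, b"aaa"))
b = put(store, Segment(0, b"bbb"))
c = put(store, Segment(0, b"ccc"))
file1 = put(store, Segment(1, children=[a, b]))
file2 = put(store, Segment(1, children=[b, c]))
order = [(lvl, len(fps)) for lvl, fps in levels_top_down(store, [file1, file2])]
print(order)  # [(1, 2), (0, 3)]
```

The shared segment `b` is counted once at level 0, which is the property that makes a level-by-level walk cheaper than walking each file separately.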
Publication number: US9367448(B1) Publication date: 2016.06.14
Application number: US201313909875 Filing date: 2013.06.04
Applicant: EMC Corporation Inventors: Botelho Fabiano C.;Moghe Dheer;Pang Hung Hing (Anthony);Vale Ferreira Menezes Guilherme
Classification: G06F17/30;G06F12/02 Main classification: G06F17/30
Agency: Blakely, Sokoloff, Taylor & Zafman LLP Agent: Blakely, Sokoloff, Taylor & Zafman LLP
Principal claim: 1. A computer-implemented method of verifying data integrity for garbage collection, the method comprising: traversing, by a garbage collector executed by a processor, a namespace of a file system of a storage system to verify data integrity of segments, the namespace identifying a plurality of files that are represented by a plurality of segments arranged in a plurality of levels in a hierarchy, wherein an upper level segment includes one or more references to one or more lower level segments, wherein at least one segment is referenced by multiple files, wherein traversing the namespace includes verifying data integrity for all segments in a level-by-level manner comprising: verifying data integrity from checksums for an upper level segment, and verifying, in response to data integrity verification of the upper level segment, data integrity of a lower level segment from a lower level segment checksum, for each of the levels in the hierarchy, iteratively performing the following: obtaining fingerprints of all segments of a current level, computing checksums from the fingerprints of all segments of the current level, and adding the checksums of the current level to a parent checksum of the current level, computing checksums from the fingerprints of the current level that are read from the storage device, adding the checksums of the current level to a child checksum of the current level, for each of the fingerprints of the current level, marking a corresponding bit of a walk vector to indicate that a corresponding segment has been processed, wherein the walk vector includes a plurality of bits, each bit corresponding to one of the segments in the namespace, for each of the segments of the current level, retrieving a fingerprint of the segment from its storage location of a storage device, and marking a corresponding bit of a read vector to indicate that the segment has been read from its storage location; and upon verifying data integrity for the plurality of levels, performing a garbage collection process to reclaim storage from segments not referenced by a file in the storage system.
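The per-level bookkeeping recited in the claim (a parent checksum, a child checksum, a walk vector, and a read vector) might be sketched as below. Several points are assumptions made only for illustration: the claim states that the checksums and vectors are built but does not spell out the comparison, so treating "parent checksum equals child checksum and walk vector equals read vector" as the pass condition is one reading; the 64-bit additive checksum, the `stored` dict that models reading a segment back from its storage location, and the `bit_of` mapping from fingerprint to bit position are all hypothetical.

```python
import hashlib

MASK64 = (1 << 64) - 1


def checksum_of(fp: bytes) -> int:
    # Hypothetical checksum: first 8 bytes of SHA-256 over the fingerprint.
    return int.from_bytes(hashlib.sha256(fp).digest()[:8], "big")


def set_bit(vec: bytearray, i: int) -> None:
    vec[i // 8] |= 1 << (i % 8)


def verify_levels(levels, stored, bit_of):
    """Check every level, top level first; return True only if all pass.

    levels: list of fingerprint lists, each as referenced by the level above.
    stored: maps a referenced fingerprint to the fingerprint read back from
            the storage location (a missing key models a missing segment).
    bit_of: maps each fingerprint in the namespace to its bit position.
    """
    nbytes = (len(bit_of) + 7) // 8
    walk_vec = bytearray(nbytes)   # segments referenced during the walk
    read_vec = bytearray(nbytes)   # segments actually read from storage
    for fps in levels:
        parent_sum = 0
        child_sum = 0
        for fp in fps:
            parent_sum = (parent_sum + checksum_of(fp)) & MASK64
            set_bit(walk_vec, bit_of[fp])
            on_disk = stored.get(fp)
            if on_disk is None:
                continue  # nothing could be read for this reference
            child_sum = (child_sum + checksum_of(on_disk)) & MASK64
            if on_disk in bit_of:
                set_bit(read_vec, bit_of[on_disk])
        # Assumed pass condition; a lower level is checked only if this one passes.
        if parent_sum != child_sum or walk_vec != read_vec:
            return False
    return True


top = [b"L1-a", b"L1-b"]
bottom = [b"L0-x", b"L0-y", b"L0-z"]
bit_of = {fp: i for i, fp in enumerate(top + bottom)}

healthy = {fp: fp for fp in top + bottom}
ok = verify_levels([top, bottom], healthy, bit_of)

damaged = dict(healthy)
del damaged[b"L0-y"]               # simulate a segment missing from storage
bad = verify_levels([top, bottom], damaged, bit_of)
```

In this sketch `ok` is true and `bad` is false, so a garbage collection pass would be started only in the first case.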
Address: Hopkinton MA US