Independent claim |
1. A computer-implemented method of verifying data integrity for garbage collection, the method comprising:
traversing, by a garbage collector executed by a processor, a namespace of a file system of a storage system to verify data integrity of segments, the namespace identifying a plurality of files that are represented by a plurality of segments arranged in a plurality of levels in a hierarchy, wherein an upper level segment includes one or more references to one or more lower level segments, wherein at least one segment is referenced by multiple files, wherein traversing the namespace includes verifying data integrity for all segments in a level-by-level manner comprising:
verifying data integrity from checksums for an upper level segment, and
verifying, in response to data integrity verification of the upper level segment, data integrity of a lower level segment from a lower level segment checksum,
for each of the levels in the hierarchy, iteratively performing the following:
obtaining fingerprints of all segments of a current level,
computing checksums from the fingerprints of all segments of the current level, and
adding the checksums of the current level to a parent checksum of the current level,
computing checksums from the fingerprints of the current level that are read from the storage device,
adding the checksums of the current level to a child checksum of the current level,
for each of the fingerprints of the current level, marking a corresponding bit of a walk vector to indicate that a corresponding segment has been processed, wherein the walk vector includes a plurality of bits, each bit corresponding to one of the segments in the namespace,
for each of the segments of the current level, retrieving a fingerprint of the segment from its storage location of a storage device, and marking a corresponding bit of a read vector to indicate that the segment has been read from its storage location; and
upon verifying data integrity for the plurality of levels, performing a garbage collection process to reclaim storage from segments not referenced by a file in the storage system. |
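The level-by-level verification recited in the claim can be illustrated with a minimal Python sketch. It is not the claimed implementation. Everything concrete in it is an assumption made for illustration: the in-memory `store` dictionary standing in for the storage device, SHA-1 as the fingerprint function, a 64-bit running sum as the checksum, the `bit` dictionary that maps each fingerprint to a vector position, the final comparison of the walk and read vectors, and the helper names `fingerprint`, `checksum`, `put`, and `verify_then_collect`. The fingerprint "retrieved from the storage location" is modeled by recomputing it from the stored bytes.

```python
import hashlib

MASK = (1 << 64) - 1  # checksums are kept as 64-bit running sums (assumption)


def fingerprint(payload: bytes, children: tuple) -> str:
    """Content fingerprint of a segment: SHA-1 over its payload and child references."""
    return hashlib.sha1(payload + "".join(children).encode()).hexdigest()


def checksum(fp: str) -> int:
    """Fold a fingerprint into a 64-bit integer (hypothetical checksum function)."""
    return int(fp[:16], 16)


def put(store: dict, payload: bytes = b"", children: tuple = ()) -> str:
    """Write a segment to the toy store and return its fingerprint."""
    fp = fingerprint(payload, children)
    store[fp] = (payload, tuple(children))
    return fp


def verify_then_collect(store: dict, roots: list) -> set:
    """Verify level by level, then delete unreferenced segments; return what was reclaimed."""
    bit = {fp: i for i, fp in enumerate(sorted(store))}  # one vector bit per segment
    walk = read = 0
    level = set(roots)  # a set, so a segment shared by several files is handled once
    while level:
        parent_sum = child_sum = 0
        next_level = set()
        for fp in level:
            # Fingerprints as referenced from the level above feed the parent checksum.
            parent_sum = (parent_sum + checksum(fp)) & MASK
            if fp in bit:
                walk |= 1 << bit[fp]  # mark the segment as processed
        for fp in level:
            if fp not in store:
                continue  # missing segment: the child checksum will not match
            payload, children = store[fp]
            # Fingerprints recomputed from the stored bytes feed the child checksum.
            child_sum = (child_sum + checksum(fingerprint(payload, children))) & MASK
            read |= 1 << bit[fp]  # mark the segment as read from storage
            next_level.update(children)
        if parent_sum != child_sum:
            raise ValueError("integrity check failed: parent and child checksums differ")
        level = next_level
    if walk != read:
        raise ValueError("integrity check failed: walked and read segments differ")
    # Garbage collection runs only after every level has been verified.
    dead = {fp for fp in store if not (walk >> bit[fp]) & 1}
    for fp in dead:
        del store[fp]
    return dead
```

In this sketch a corrupted or missing segment makes the child checksum of its level diverge from the parent checksum, so the function raises before anything is reclaimed. Only segments whose walk-vector bit was never set are deleted.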