Title of invention Method and apparatus for identifying and eliminating duplicate data blocks and sharing data blocks in a storage system
Abstract A method for sharing data blocks in a hierarchical file system in a storage server includes allocating a plurality of data blocks in the file system, and sharing data blocks in the file system, without using a persistent point-in-time image, to avoid duplication of data blocks. A method for identifying data blocks that can be shared includes computing a fingerprint for each of multiple data blocks to be written to a storage facility and storing the fingerprint with information identifying the data block in an entry in a set of metadata. The set of metadata is used to identify data blocks which are duplicates.
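The fingerprint step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the use of SHA-256, the in-memory list of entries, the byte-for-byte confirmation step, and the names `fingerprint`, `record_blocks`, and `find_duplicates` are all assumptions made for illustration, since the abstract does not name a fingerprint function or a metadata layout.

```python
import hashlib
from collections import defaultdict


def fingerprint(block: bytes) -> str:
    # Assumption: SHA-256 stands in for the unspecified fingerprint function.
    return hashlib.sha256(block).hexdigest()


def record_blocks(blocks: dict) -> list:
    """Build one metadata entry (fingerprint, block id) per block to be written."""
    return [(fingerprint(data), block_id) for block_id, data in blocks.items()]


def find_duplicates(entries: list, blocks: dict) -> list:
    """Group entries by fingerprint and return (kept block, [duplicate blocks])."""
    by_fp = defaultdict(list)
    for fp, block_id in entries:
        by_fp[fp].append(block_id)

    groups = []
    for ids in by_fp.values():
        if len(ids) < 2:
            continue
        keeper = ids[0]
        # Assumption: confirm byte-for-byte so a fingerprint collision
        # is never treated as a duplicate.
        dups = [b for b in ids[1:] if blocks[b] == blocks[keeper]]
        if dups:
            groups.append((keeper, dups))
    return groups


# Hypothetical 4 KB blocks keyed by block number.
blocks = {0: b"a" * 4096, 1: b"b" * 4096, 2: b"a" * 4096}
entries = record_blocks(blocks)
duplicates = find_duplicates(entries, blocks)
```

In this sketch blocks 0 and 2 carry the same fingerprint and the same bytes, so block 2 is reported as a duplicate of block 0, while block 1 is left alone.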
Publication number US8849767(B1) Publication date 2014.09.30
Application number US200511105895 Filing date 2005.04.13
Applicant NetApp, Inc. Inventors Zheng Ling; Lewis Blake H.; Ting Daniel W.; English Robert M.; Manley Stephen L.
Classification G06F17/30 Main classification G06F17/30
Agency DeLizio Gilliam, PLLC Agent DeLizio Gilliam, PLLC
Principal claim 1. A computer-implemented method comprising: implementing a hierarchical file system in a storage server, wherein user data is stored in or retrieved from the file system; allocating a plurality of data blocks in the file system; maintaining a plurality of pointers in the hierarchical file system, wherein each of the plurality of pointers references a data block to include the data block as part of a file; including, in a file, a data block of the plurality of data blocks by referencing the data block using a pointer associated with the file, wherein a share flag indicates whether the file is permitted to include a data block referenced by more than one of the plurality of pointers; sharing the data block with a different file by referencing the data block using a pointer associated with the different file to avoid duplication of the data block, wherein sharing the data block eliminates a duplicate of the data block by incrementing a reference count for the data block itself and decrementing a reference count for each duplicate of the data block, wherein the reference counts specify a number of references to the corresponding data block by the plurality of pointers, and wherein the reference counts for the data block and each duplicate of the data block are separate reference counts; and determining, by checking the share flag of a file to modify, whether reading of a reference count from a reference count file can be bypassed for the file to modify, wherein the share flag indicates whether the file to modify contains a shared data block.
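The reference-count and share-flag steps recited in the claim can be sketched roughly as follows. This is an illustrative model only: the `File` and `Volume` classes, the in-memory dictionary standing in for the reference count file, the `refcount_reads` counter, and the copy-on-write behaviour in `write` are assumptions introduced here, not details stated in the claim.

```python
from dataclasses import dataclass, field


@dataclass
class File:
    pointers: list = field(default_factory=list)  # block numbers making up the file
    share_flag: bool = False                      # True once the file holds a shared block


class Volume:
    def __init__(self):
        self.blocks = {}         # block number -> data
        self.refcount = {}       # stand-in for the reference count file
        self.refcount_reads = 0  # counts lookups, to show when they are bypassed
        self.next_block = 0

    def allocate(self, data: bytes) -> int:
        bn = self.next_block
        self.next_block += 1
        self.blocks[bn] = data
        self.refcount[bn] = 1
        return bn

    def share(self, keeper_file: File, keeper: int, dup_file: File, slot: int) -> None:
        """Repoint dup_file.pointers[slot] from its duplicate block to `keeper`."""
        dup = dup_file.pointers[slot]
        if dup == keeper:
            return
        dup_file.pointers[slot] = keeper
        self.refcount[keeper] += 1   # one more pointer references the kept block
        self.refcount[dup] -= 1      # one fewer references the duplicate
        if self.refcount[dup] == 0:  # duplicate is now unreferenced: free it
            del self.blocks[dup]
            del self.refcount[dup]
        keeper_file.share_flag = True
        dup_file.share_flag = True

    def write(self, f: File, slot: int, data: bytes) -> None:
        bn = f.pointers[slot]
        if not f.share_flag:
            # No shared blocks in this file: skip the reference count lookup.
            self.blocks[bn] = data
            return
        self.refcount_reads += 1
        if self.refcount[bn] > 1:
            # Assumption: copy-on-write so other files keep the old data.
            self.refcount[bn] -= 1
            f.pointers[slot] = self.allocate(data)
        else:
            self.blocks[bn] = data


vol = Volume()
a, b, c = File(), File(), File()
a.pointers.append(vol.allocate(b"x"))  # block 0
b.pointers.append(vol.allocate(b"x"))  # block 1, duplicate of block 0
c.pointers.append(vol.allocate(b"y"))  # block 2, unique
vol.share(a, 0, b, 0)                  # b now points at block 0; block 1 is freed
vol.write(c, 0, b"z")                  # c has no shared block: no refcount read
vol.write(b, 0, b"w")                  # b is flagged: refcount is read, block is copied
```

The design point the sketch tries to show is that a per-file flag lets the common case, modifying a file with no shared blocks, avoid touching the reference count structure at all.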
Address Sunnyvale CA US