Title of invention: Age-out selection in hash caches
Abstract: A backup client de-duplicates backup data sets using a locally stored, memory resident, root tag vector and hash cache. To create a new backup data set, the client queries a backup server to determine which of the root hashes in the root tag vector are available on the backup server. If one or more are no longer available, the client re-uses a root tag vector entry corresponding to one of the no longer available root hashes. If all are available, the client ages out a root hash for re-use based on a combination of age and represented size. Data is de-duplicated by chunking and hashing it and comparing the resulting hashes to hashes in the hash cache. To prevent the hash cache from growing too large, entries in the hash cache are aged out based on a combination of age and size of data represented by the entries.
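The chunk-and-hash comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the fixed chunk size, the choice of SHA-1, and the names `chunk`, `deduplicate`, and `CHUNK_SIZE` are all assumptions made for illustration, and the hash cache is reduced to a plain set of hex digests.

```python
import hashlib

# ASSUMPTION: a tiny fixed chunk size so the example is easy to trace.
CHUNK_SIZE = 4


def chunk(data: bytes, size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def deduplicate(data: bytes, hash_cache: set):
    """Return (hashes, new_chunks).

    hashes holds the hash of every chunk in order; new_chunks holds only
    the chunks whose hash was not yet in hash_cache, i.e. the data that
    would still have to be sent to the backup server.
    """
    hashes, new_chunks = [], []
    for piece in chunk(data):
        # ASSUMPTION: SHA-1 stands in for whatever hash the system uses.
        digest = hashlib.sha1(piece).hexdigest()
        hashes.append(digest)
        if digest not in hash_cache:
            hash_cache.add(digest)
            new_chunks.append(piece)
    return hashes, new_chunks


cache = set()
first_hashes, first_new = deduplicate(b"aaaabbbbaaaa", cache)
second_hashes, second_new = deduplicate(b"bbbbcccc", cache)
```

In this sketch the repeated `aaaa` chunk is sent once, and the second call sends only `cccc`, because `bbbb` is already represented in the cache.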
Publication number: US8825971(B1) Publication date: 2014.09.02
Application number: US200711967871 Filing date: 2007.12.31
Applicant: EMC Corporation Inventors: Auchmoody Scott C.;Ogata Scott
Classification: G06F12/00;G06F13/00;G06F13/28 Main classification: G06F12/00
Agency: Workman Nydegger Agent: Workman Nydegger
Principal claim: 1. On a backup client that de-duplicates backup data using a root tag vector and a hash cache prior to sending the backup data to a backup server, a method of aging root hashes out of the root tag vector to de-duplicate data, the method comprising: caching a root tag vector on the backup client in a hash cache of the backup client, wherein the root tag vector includes a plurality of entries corresponding to a plurality of previous backup data sets stored by the backup server, each entry including a root hash representative of a corresponding backup data set, a date the corresponding backup data set was created, and a size of the corresponding backup data set; requesting the backup server that stores and ages out previous backup data sets to identify which of the previous backup data sets are still on the backup server; receiving a response from the backup server identifying one or more of the previous backup data sets that are still on the backup server; if the response from the backup server identifies all of the previous backup data sets as still on the backup server, selecting for re-use an entry j in the root tag vector based on a combination of the age and the represented byte size of the entry j; de-duplicating a new backup data set at the backup client before sending the new backup data set to the backup server by eliminating redundant data from the new backup data set using the hash cache to identify the redundant data in the new backup data set; sending the new backup data set to the backup server; and maintaining the hash cache that includes hash entries and a tag field for each hash entry, the tag field for a given hash entry indicating which root hashes the hash entry is protected by, wherein an old root hash Rj is the root hash aged out of the root tag vector when entry j is selected for re-use, the method further comprising, modifying a tag bit Tj for each hash entry protected by old root hash Rj to indicate that the hash entries are no longer protected by old root hash Rj.
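The entry selection and tag-bit update recited in the claim might be sketched as follows. The claim states only that entry j is chosen by "a combination of the age and the represented byte size"; the particular score used here (age in days divided by bytes represented, so that older and smaller backups are aged out first) is an assumption, as are the names `RootEntry`, `select_entry`, and `clear_tag_bit` and the representation of the tag field as an integer bitmask.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RootEntry:
    root_hash: str    # root hash representing one backup data set
    created: date     # date the backup data set was created
    size_bytes: int   # size of the backup data set


def select_entry(vector, available, today):
    """Return the index j of the root tag vector entry to re-use.

    available is the set of root hashes the server reports it still holds.
    """
    # Prefer an entry whose backup is no longer on the server.
    for j, entry in enumerate(vector):
        if entry.root_hash not in available:
            return j

    # All backups are still available: age one out.
    # ASSUMED score, not taken from the source: days of age per byte.
    def score(entry):
        age_days = (today - entry.created).days
        return (age_days + 1) / max(entry.size_bytes, 1)

    return max(range(len(vector)), key=lambda j: score(vector[j]))


def clear_tag_bit(hash_cache, j):
    """Clear tag bit Tj in every hash entry of hash_cache (hash -> bitmask).

    Returns the hashes that lost their last protecting root hash.
    """
    mask = ~(1 << j)
    unprotected = []
    for h, tags in list(hash_cache.items()):
        new_tags = tags & mask
        hash_cache[h] = new_tags
        if tags != new_tags and new_tags == 0:
            unprotected.append(h)
    return unprotected


vector = [
    RootEntry("r0", date(2007, 1, 1), 1000),
    RootEntry("r1", date(2007, 6, 1), 100000),
    RootEntry("r2", date(2007, 12, 1), 500),
]
today = date(2007, 12, 31)
j_missing = select_entry(vector, {"r0", "r1"}, today)
j_aged = select_entry(vector, {"r0", "r1", "r2"}, today)

hash_cache = {"h1": 0b001, "h2": 0b011, "h3": 0b100}
orphans = clear_tag_bit(hash_cache, j_aged)
```

Under these assumptions, hash entries whose tag field drops to zero are no longer protected by any root hash, which makes them natural candidates for aging out of the hash cache.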
Address: Hopkinton MA US