Title of invention: Managing deduplication of stored data
Abstract: In one aspect, in general, a method for managing data in a data storage system comprises receiving data to be stored in the data storage system, computing values corresponding to different respective portions of the received data, generating identifiers corresponding to different respective portions of the received data, with an identifier corresponding to a particular portion of data including the computed value corresponding to the particular portion of data and metadata indicating a location where the particular portion of data is being stored in the data storage system, and storing at least some of the identifiers in an index until the index reaches a predetermined size.
Publication number: US8898107(B1) Publication date: 2014.11.25
Application number: US201313887558 Filing date: 2013.05.06
Applicant: Permabit Technology Corp. Inventors: Floyd Jered J.; Fortson Michael; Westerlund Assar; Coburn Jonathan
Classification: G06F17/30 Main classification: G06F17/30
Agency: Fish & Richardson P.C. Agent: Fish & Richardson P.C.
Main claim: 1. A method for managing data in a data storage system, the method comprising: receiving, at a data deduplication engine associated with the data storage system, data to be stored in the data storage system; and providing, by the data deduplication engine, deduplication advice to a software layer based on real-time analysis of the received data by: computing values corresponding to different respective portions of the received data; generating identifiers corresponding to different respective portions of the received data, with an identifier corresponding to a particular portion of data including the computed value corresponding to the particular portion of data and accompanying metadata associated with the particular portion of data; storing at least some of the identifiers in an index of a predetermined size; and in response to determining that a first identifier corresponding to a first portion of the received data was not already stored in the index before the first portion of data was received, indicating, as the deduplication advice provided before the first portion of the received data is stored in the data storage system, that the first identifier may be stored in the index and the first portion of the received data may be stored in the data storage system; and designating one or more identifiers for removal from the index, at least some of the identifiers being among those that have been least recently added to or updated in the index.
Address: Cambridge MA US
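
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made only for illustration: the computed value is a SHA-256 digest, the portions are fixed-size chunks of `CHUNK_SIZE` bytes, the metadata is an arbitrary dictionary, and the names `DedupIndex`, `advise`, and `split_chunks` are hypothetical. For brevity the sketch also records a new identifier in the index at the moment the advice is produced, whereas the claim only says the identifier may be stored.

```python
import hashlib
from collections import OrderedDict

CHUNK_SIZE = 4096  # assumed portion size; the claim does not specify one


def split_chunks(data, size=CHUNK_SIZE):
    """Split received data into fixed-size portions (an assumed chunking policy)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupIndex:
    """Hypothetical bounded index of identifiers (computed value plus metadata).

    Entries are kept in order of last add or update, so the oldest entry is
    the first one designated for removal once the index exceeds its capacity.
    """

    def __init__(self, capacity):
        self.capacity = capacity          # the "predetermined size"
        self._entries = OrderedDict()     # computed value -> metadata

    def advise(self, chunk, metadata):
        """Return deduplication advice for one portion before it is stored."""
        value = hashlib.sha256(chunk).hexdigest()  # assumed hash function
        if value in self._entries:
            # Already indexed: refresh its position and report a duplicate.
            self._entries.move_to_end(value)
            return {"duplicate": True, "value": value,
                    "metadata": self._entries[value], "evicted": []}
        # Not indexed: the caller may store the portion; record the identifier.
        self._entries[value] = metadata
        evicted = []
        while len(self._entries) > self.capacity:
            # Remove the least recently added or updated identifier.
            old_value, _ = self._entries.popitem(last=False)
            evicted.append(old_value)
        return {"duplicate": False, "value": value,
                "metadata": metadata, "evicted": evicted}
```

A caller would split incoming data with `split_chunks`, pass each portion to `advise`, and write the portion to storage only when the returned advice has `duplicate` set to `False`. The `evicted` list names the identifiers that were dropped to keep the index within its capacity.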