Title Storage system, storage controller, and method for eliminating data duplication based on hash table
Abstract According to one embodiment, a storage controller includes a dividing unit, a duplication manager, and a duplication determination unit. The dividing unit divides data specified in a write request from a host computer into a plurality of chunks. The duplication manager preferentially stores a first hash value of a first chunk in a first table in a hash table in association with the first chunk when the first chunk is written to a storage device. The hash table includes a second table having more entries than the first table. The duplication determination unit first searches the first table for a third hash value matching a second hash value of a second chunk when the second hash value has been calculated.
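The dividing and hashing steps named in the abstract can be illustrated with a minimal sketch. The fixed 4096-byte chunk size, the use of SHA-256, and the function names `divide_into_chunks` and `chunk_hash` are assumptions made for illustration; this record does not specify a chunk size or a hash function.

```python
import hashlib

# Assumed chunk size; the patent record does not state one.
CHUNK_SIZE = 4096


def divide_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split write data into fixed-size chunks (the last chunk may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_hash(chunk: bytes) -> bytes:
    """Hash a chunk from its content. SHA-256 is an assumed choice of hash."""
    return hashlib.sha256(chunk).digest()


# Example write data containing two identical chunks and one different chunk.
data = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE + b"A" * CHUNK_SIZE
chunks = divide_into_chunks(data)
hashes = [chunk_hash(c) for c in chunks]
```

Because each hash depends only on chunk content, the first and third chunks in this example produce the same hash value, which is what lets a duplication check detect them.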
Publication number US9152341(B2) Publication date 2015.10.06
Application number US201313955626 Filing date 2013.07.31
Applicant KABUSHIKI KAISHA TOSHIBA;TOSHIBA SOLUTIONS CORPORATION Inventor Yamazaki Shuji
Classification number G06F3/06 Main classification number G06F3/06
Agency Finnegan, Henderson, Farabow, Garrett & Dunner, LLP Agent Finnegan, Henderson, Farabow, Garrett & Dunner, LLP
Main claim 1. A storage system comprising: a storage device; a storage controller configured to control access to the storage device; and a hash table including a first table and a second table, the first table having a first number of entries, and the second table having a second number of entries, the second number being larger than the first number; wherein the storage controller comprises: a dividing unit configured to divide data specified in a write request from a host computer into a plurality of chunks, a hash generation unit configured to calculate a hash value for each of the plurality of chunks based on the data in each of the plurality of chunks, the hash value having a first length, an access controller configured to write the chunks to the storage device, a duplication manager configured to preferentially store a first hash value of a first chunk in the first table in the hash table in association with the first chunk when the first chunk is written to the storage device, and a duplication determination unit configured to determine whether a third chunk is stored in the storage device by executing a process of searching the hash table for a third hash value matching a second hash value of a second chunk such that the first table is preferentially searched when the second hash value has been calculated, the third chunk having a content identical to a content of the second chunk; wherein: the duplication manager is further configured to inhibit the second chunk from being written to the storage device when the third chunk is determined to be stored in the storage device; the first table comprises a plurality of first pages used to store a plurality of hash values for a plurality of respective groups into which the plurality of hash values are classified based on the plurality of hash values; the plurality of first pages are associated with a respective plurality of group indices pointing to the plurality of groups; each of the plurality of hash values has the first length; each of the plurality of first pages has a third number of entries; a total number of entries in the plurality of first pages is equal to the first number; the second table comprises a plurality of second pages associated with the respective plurality of group indices; each of the plurality of second pages has a fourth number of entries, the fourth number being larger than the third number; and a total number of the plurality of second pages is equal to the second number; the duplication determination unit is further configured to: identify a group to which the second hash value of the second chunk belongs based on the second hash value; preferentially select the first page associated with a group index of the identified group; search the selected first page for the third hash value; select the second page associated with the group index of the identified group when the search of the selected first page for the third hash value has failed and when the third number of entries in at least the selected first page are all in use; and search the selected second page for the third hash value; and the duplication manager is further configured to store the first hash value in a last selected page that is one of the first page and the second page when the third chunk is determined not to be stored in the storage device.
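The page selection order recited in the claim (first page for the group, then the second page only if the first page search fails and the first page is full, then store in the last selected page) can be sketched as follows. This is an illustrative reading of the claim, not the patented implementation: the class name `TwoLevelHashTable`, the method names, the default sizes, deriving the group index from the first byte of the hash, and the error raised when a second page is full are all assumptions.

```python
class TwoLevelHashTable:
    """Hypothetical two-table hash index with one small and one large page per group."""

    def __init__(self, num_groups: int = 4, first_entries: int = 2,
                 second_entries: int = 8):
        self.num_groups = num_groups
        self.first_entries = first_entries    # "third number" in the claim
        self.second_entries = second_entries  # "fourth number", larger than the third
        # One first page and one second page per group index.
        self.first = [dict() for _ in range(num_groups)]
        self.second = [dict() for _ in range(num_groups)]

    def group_index(self, h: bytes) -> int:
        # Assumption: classify a hash value into a group by its first byte.
        return h[0] % self.num_groups

    def find_or_insert(self, h: bytes, location: int):
        """Return the stored location if h is already present, else store h and return None."""
        g = self.group_index(h)
        page, capacity = self.first[g], self.first_entries
        if h in page:
            return page[h]
        # Move to the second page only when the first page search failed
        # and every entry of the first page is in use.
        if len(page) >= self.first_entries:
            page, capacity = self.second[g], self.second_entries
            if h in page:
                return page[h]
        # Not a duplicate: store the hash in the last selected page.
        if len(page) >= capacity:
            # Assumption: the claim does not say what happens when this page is full.
            raise RuntimeError("second page full")
        page[h] = location
        return None


# Usage with a single group and a first page of two entries.
table = TwoLevelHashTable(num_groups=1, first_entries=2, second_entries=4)
results = [table.find_or_insert(h, loc)
           for loc, h in enumerate([b"\x00a", b"\x00b", b"\x00c", b"\x00c"])]
```

In the usage lines, the first two hash values fill the first page, the third spills into the second page, and the fourth call finds the third hash value there, so the corresponding chunk would not be written again.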
Address Tokyo JP