Title: Method for removing duplicate data from a storage array
Abstract: A system and method for efficiently removing duplicate data blocks at a fine granularity from a storage array. A data storage subsystem supports multiple deduplication tables. Table entries in one deduplication table have the highest associated probability of being deduplicated. Table entries may move from one deduplication table to another as the probabilities change. Additionally, a table entry may be evicted from all deduplication tables if a corresponding estimated probability falls below a given threshold. The probabilities are based on attributes associated with a data component and attributes associated with a virtual address corresponding to a received storage access request. A strategy for searches of the multiple deduplication tables may also be determined by the attributes associated with a given storage access request.
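The movement of entries between tables described in the abstract can be illustrated with a minimal sketch. It assumes exactly two tables and two fixed cut-offs; the names `PROMOTE_THRESHOLD`, `EVICT_THRESHOLD`, `place_entry`, and `rebalance`, as well as the numeric values, are hypothetical and do not come from the patent, which does not state how the probabilities are computed.

```python
# Hypothetical cut-offs; the patent only says "a given threshold".
PROMOTE_THRESHOLD = 0.5   # at or above this, an entry lives in the first table
EVICT_THRESHOLD = 0.05    # below this, an entry is dropped from all tables


def place_entry(probability):
    """Return the table name an entry belongs in, or None if it is evicted."""
    if probability < EVICT_THRESHOLD:
        return None
    return "first" if probability >= PROMOTE_THRESHOLD else "second"


def rebalance(tables, probabilities):
    """Re-home every fingerprint according to its current estimated probability.

    tables: {"first": {fingerprint: address}, "second": {fingerprint: address}}
    probabilities: {fingerprint: estimated probability of being deduplicated}
    """
    # Collect all known entries regardless of which table holds them now.
    entries = {}
    for table in tables.values():
        entries.update(table)

    rebalanced = {"first": {}, "second": {}}
    for fingerprint, address in entries.items():
        # Fingerprints with no estimate are treated as probability 0 (assumption).
        target = place_entry(probabilities.get(fingerprint, 0.0))
        if target is not None:
            rebalanced[target][fingerprint] = address
    return rebalanced
```

With these assumed cut-offs, an entry whose estimate rises to 0.9 moves into the first table, one that drops to 0.2 moves to the second, and one at 0.01 disappears from both.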
Publication number: US8930307(B2) Publication date: 2015.01.06
Application number: US201113250570 Filing date: 2011.09.30
Applicant: PURE Storage, Inc. Inventors: Colgrove John; Hayes John; Miller Ethan; Hasbani Joseph S.; Sandvig Cary
Classification: G06F7/00; G06F17/00; G06F17/30; G06F3/06 Main classification: G06F7/00
Agency: Meyertons, Hood, Kivlin, Kowert & Goetzel, P.C. Agent: Rankin Rory D.; Meyertons, Hood, Kivlin, Kowert & Goetzel, P.C.
Main claim: 1. A computer system comprising: a non-transitory data storage medium; a first fingerprint table comprising a first plurality of entries and a second fingerprint table comprising a second plurality of entries, wherein each entry of the first and the second plurality of entries is configured to store a fingerprint corresponding to a data component already stored in the system, wherein the first fingerprint table has fewer entries than the second fingerprint table and wherein the second fingerprint table comprises a fingerprint for at least one deduplicated data components not included in the first fingerprint table; and a data storage controller comprising hardware; wherein in response to receiving a write request, the data storage controller is configured to: search the first fingerprint table during inline deduplication prior to the second fingerprint table based on a first fingerprint corresponding to the write request; in response to detecting a hit on a matching entry in the first fingerprint table during said search: write a reference to the data corresponding to the matching entry in the first table; and in response to detecting a miss in the first fingerprint table during said search: postpone further deduplication to offline deduplication; and write data corresponding to the write request in the data storage medium.
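The write path in claim 1 can be sketched as follows. This is an illustrative model only, not the patented implementation: the class `DedupController`, its fields, the `offline_pass` and `promote` helpers, and the use of SHA-256 as the fingerprint function are all assumptions made for the example. The claim itself specifies only that the smaller first table is searched inline, that a hit writes a reference, and that a miss writes the data and defers further deduplication to an offline pass.

```python
import hashlib


def fingerprint(data: bytes) -> bytes:
    # SHA-256 is an assumed fingerprint function; the claim does not name one.
    return hashlib.sha256(data).digest()


class DedupController:
    """Toy model of the two-table write path (all names are hypothetical)."""

    def __init__(self):
        self.first_table = {}    # small table searched inline: fingerprint -> block
        self.second_table = {}   # larger table, consulted only offline here
        self.storage = []        # stand-in for the data storage medium
        self.volume = {}         # virtual address -> block index
        self.offline_queue = []  # writes awaiting offline deduplication

    def write(self, vaddr, data: bytes) -> str:
        fp = fingerprint(data)
        block = self.first_table.get(fp)
        if block is not None:
            # Inline hit: store only a reference to the existing block.
            self.volume[vaddr] = block
            return "deduplicated"
        # Inline miss: write the data and postpone further deduplication.
        self.storage.append(data)
        block = len(self.storage) - 1
        self.volume[vaddr] = block
        self.offline_queue.append((fp, vaddr, block))
        return "stored"

    def offline_pass(self):
        # Assumed offline step: consult the second table for deferred writes.
        for fp, vaddr, block in self.offline_queue:
            existing = self.second_table.get(fp)
            if existing is None:
                self.second_table[fp] = block
            else:
                # Duplicate found; remap (the redundant block could be reclaimed).
                self.volume[vaddr] = existing
        self.offline_queue.clear()

    def promote(self, fp: bytes):
        # Assumed helper: copy an entry into the small inline table.
        self.first_table[fp] = self.second_table[fp]
```

In this sketch, two identical writes both miss the empty first table and are stored; the offline pass then remaps the second one, and once the fingerprint is promoted a third identical write is deduplicated inline.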
Address: Mountain View CA US