Title: USING INDEX PARTITIONING AND RECONCILIATION FOR DATA DEDUPLICATION
Abstract: The subject disclosure is directed towards a data deduplication technology in which a hash index service's index is partitioned into subspace indexes, with less than the entire hash index service's index cached to save memory. The subspace index is accessed to determine whether a data chunk already exists or needs to be indexed and stored. The index may be divided into subspaces based on criteria associated with the data to index, such as file type, data type, time of last usage, and so on. Also described is subspace reconciliation, in which duplicate entries in subspaces are detected so as to remove entries and chunks from the deduplication system. Subspace reconciliation may be performed at off-peak time, when more system resources are available, and may be interrupted if resources are needed. Subspaces to reconcile may be based on similarity, including via similarity of signatures that each compactly represents the subspace's hashes.
Publication number: US2016012098(A1) Publication date: 2016.01.14
Application number: US201514797890 Filing date: 2015.07.13
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC Inventors: Li Jin;Sengupta Sudipta;Kalach Ran;Desai Ronakkumar N.;Oltean Paul Adrian;Benton James Robert
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim: 1. An apparatus, comprising: a processor; and logic operative on the processor to load a subspace index comprising less than all index entries of an index service from a secondary media into a primary memory cache in which the subspace index corresponds to a set of subspaces of a global index and reconcile at least two subspaces to remove at least one duplicate chunk by using a resemblance metric to compare the at least two subspaces.
Address: Redmond WA US
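
The abstract and main claim describe per-subspace hash indexes and a reconciliation step that compares subspaces with a resemblance metric. The following is a minimal Python sketch of that idea, not the patented implementation. Every name in it (`Subspace`, `chunk_hash`, `signature`, `resemblance`, `reconcile`) is hypothetical, and several choices are assumptions made only for illustration: SHA-256 as the chunk hash, the k smallest hashes as the compact signature, and Jaccard similarity of signatures as the resemblance metric. The record itself names none of these.

```python
import hashlib


def chunk_hash(data: bytes) -> str:
    """Hash a chunk. SHA-256 is an assumption; the record names no hash."""
    return hashlib.sha256(data).hexdigest()


class Subspace:
    """Hypothetical subspace: a small hash index plus the chunks it covers."""

    def __init__(self, name: str):
        self.name = name
        self.index = {}  # chunk hash -> chunk bytes

    def add(self, data: bytes) -> bool:
        """Index and store the chunk unless this subspace already has it.

        Returns True if the chunk was newly stored, False if it was a duplicate.
        """
        h = chunk_hash(data)
        if h in self.index:
            return False
        self.index[h] = data
        return True

    def signature(self, k: int = 4) -> set:
        """Compact signature: the k smallest hashes (an assumed, min-hash-style choice)."""
        return set(sorted(self.index)[:k])


def resemblance(a: Subspace, b: Subspace, k: int = 4) -> float:
    """Crude similarity estimate: Jaccard similarity of the two signatures."""
    sa, sb = a.signature(k), b.signature(k)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def reconcile(keep: Subspace, other: Subspace) -> int:
    """Drop from `other` every chunk that `keep` already indexes.

    Returns the number of duplicate entries removed. A real system would
    also redirect references to the surviving chunk; that is omitted here.
    """
    duplicates = [h for h in other.index if h in keep.index]
    for h in duplicates:
        del other.index[h]
    return len(duplicates)
```

In this sketch a scheduler could compute `resemblance` for pairs of subspaces and call `reconcile` only on pairs that score above some threshold, which matches the abstract's suggestion that subspaces to reconcile may be chosen by signature similarity. The threshold and the scheduling policy are left out because the record does not specify them.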