Title PREDICTIVE PROBABILISTIC DEDUPLICATION OF STORAGE
Abstract Examples perform predictive probabilistic deduplication of storage, such as virtualized or physical disks. Incoming input/output (I/O) commands include data, which is written to storage and tracked in a key-value store. The key-value store includes a hash of the data as the key, and a reference counter and the address of the data as the value. When a certain percentage of sampled incoming data is found to be duplicate, it is predicted that the data in the I/O commands has become non-unique (e.g., duplicate). Based on the prediction, subsequent incoming data is not written to storage, and instead the reference counter associated with the hash of the data is incremented. In this manner, predictions on the uniqueness of future data are made based on previous data, and extraneous writes to and deletions from the chunk store are avoided.
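The key-value layout described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Entry`, `ChunkStore`, and `write`, the choice of SHA-256 as the hash, and the in-memory list standing in for storage are all assumptions made for illustration. It shows only the table layout and the reference-counter path, not the sampling or the prediction.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Entry:
    ref_count: int  # how many writes currently reference this data
    address: int    # where the data lives in (simulated) storage


class ChunkStore:
    """Hypothetical chunk store: hash of data -> (reference counter, address)."""

    def __init__(self):
        self.table = {}   # key: hex digest of the data, value: Entry
        self.blocks = []  # simulated storage; the list index is the address

    def write(self, data: bytes) -> int:
        key = hashlib.sha256(data).hexdigest()  # assumed hash function
        entry = self.table.get(key)
        if entry is not None:
            # Duplicate data: skip the write and bump the reference counter.
            entry.ref_count += 1
            return entry.address
        # Unique data: write it and start tracking it in the table.
        self.blocks.append(data)
        address = len(self.blocks) - 1
        self.table[key] = Entry(ref_count=1, address=address)
        return address
```

In this sketch, writing the same bytes twice returns the same address, leaves a single stored copy, and raises the reference counter for that hash to 2.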
Publication No. US2016350324(A1) Publication Date 2016.12.01
Application No. US201514726597 Filing Date 2015.05.31
Applicant VMware, Inc. Inventors WANG Wenguang; LUO Tian
Classification G06F17/30; G06N7/00 Main Classification G06F17/30
Agency Agent
Main Claim 1. A method for probability-based deduplication of storage, said method comprising: receiving, by a processor, a plurality of input/output (I/O) commands, said plurality of commands including content subdivided into a first plurality of data blocks; writing the blocks to storage; sampling the first plurality of the blocks and updating a key-value table with the sampled blocks; predicting, by the processor, whether a second plurality of blocks are expected to be unique or duplicate based on the sampling; and upon predicting that the second plurality of blocks is duplicate: updating the key-value table with the duplicate blocks; tallying unique blocks; writing unique blocks to storage; and upon the tally of unique blocks exceeding a threshold, predicting that a next plurality of blocks is expected to be unique; and upon predicting that the second plurality of blocks is unique: writing the blocks to storage; continuing to sample the blocks and update the key-value table with the sampled blocks; and predicting that a next plurality of blocks is expected to be unique or duplicate based on the sampling.
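The two prediction modes recited in the claim can be read as a small state machine. The Python sketch below is one possible reading, not the claimed implementation: the class name `PredictiveDeduper`, the parameters `sample_every`, `dup_ratio`, and `unique_limit`, the SHA-256 hash, and the per-batch decision rule are all assumptions. The claim does not state how a sampled duplicate found in unique mode is reconciled with the copy already written, so the sketch only counts it as a hit.

```python
import hashlib


class PredictiveDeduper:
    """Hypothetical two-mode deduper: 'unique' mode samples, 'duplicate' mode dedups."""

    def __init__(self, sample_every=2, dup_ratio=0.5, unique_limit=2):
        self.table = {}     # hash -> [ref_count, address]
        self.storage = []   # simulated storage; the list index is the address
        self.mode = "unique"
        self.sample_every = sample_every  # assumed: sample every Nth block
        self.dup_ratio = dup_ratio        # assumed: duplicate share that flips the mode
        self.unique_limit = unique_limit  # assumed: unique tally that flips it back

    def _store(self, block: bytes) -> int:
        self.storage.append(block)
        return len(self.storage) - 1

    def process(self, blocks):
        """Handle one plurality of blocks under the current prediction."""
        if self.mode == "unique":
            self._process_unique(blocks)
        else:
            self._process_duplicate(blocks)

    def _process_unique(self, blocks):
        sampled = hits = 0
        for i, block in enumerate(blocks):
            address = self._store(block)  # every block is written in this mode
            if i % self.sample_every == 0:
                sampled += 1
                key = hashlib.sha256(block).hexdigest()
                if key in self.table:
                    hits += 1  # sampled duplicate; the extra copy is not reclaimed here
                else:
                    self.table[key] = [1, address]
        # Predict the next plurality of blocks from the sampled duplicate share.
        if sampled and hits / sampled >= self.dup_ratio:
            self.mode = "duplicate"

    def _process_duplicate(self, blocks):
        unique_tally = 0
        for block in blocks:
            key = hashlib.sha256(block).hexdigest()
            entry = self.table.get(key)
            if entry is not None:
                entry[0] += 1  # duplicate: update the table, skip the write
            else:
                unique_tally += 1  # unique: tally it and write it
                self.table[key] = [1, self._store(block)]
        if unique_tally > self.unique_limit:
            self.mode = "unique"  # too many unique blocks: predict unique next
```

Because only sampled blocks enter the table in unique mode, some duplicates are written anyway; that is the probabilistic trade-off. The sketch decides once per call to `process`, which is a simplification chosen to keep the example short.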
Address Palo Alto CA US