Title Methods and System For Vectored Data De-Duplication
Abstract The present invention is directed toward methods and systems for data de-duplication. More particularly, in various embodiments, the present invention provides systems and methods for data de-duplication that may utilize a vectoring method in which a stream of data is divided into “data sets” or blocks. For each block, a code, such as a hash or cyclic redundancy code, may be calculated and stored. The first block of the set may be written normally, and its address and hash can be stored and noted. Subsequent block hashes may be compared with previously written block hashes.
Publication number US2015178308(A1) Publication date 2015.06.25
Application number US201514641525 Filing date 2015.03.09
Applicant SALIBA George; WHITE Theron Inventor SALIBA George; WHITE Theron
Classification G06F17/30 Main classification G06F17/30
Agency Agent
Main claim 1. A computerized method for vectored data de-duplication, comprising: comparing a de-duplication code for a first block of data to a de-duplication code for a previously processed block of data, where the first block of data was received in an input stream of computer-readable data; upon determining that the de-duplication code for the first block of data matches the de-duplication code for the previously processed block of data: identifying a number of copies of the first block of data that are available in an output stream of computer-readable data produced from the input stream; upon determining that the number of copies of the first block of data that are available in the output stream does not satisfy a pre-determined threshold number of blocks: storing the first block of data in the output stream; upon determining that the number of copies of the first block of data that are available in the output stream does satisfy the pre-determined threshold number of blocks: storing, in the output stream, in a location where the first block of data would have been placed if the number of copies of the first block of data satisfied the pre-determined threshold number of blocks, a vector that includes data for locating the previously processed block of data in the output stream, or a vector that includes data for locating a duplicate of the previously processed block of data in the output stream; where the input stream can be recreated from the output stream without reference to other de-duplication data structures, and where the output stream includes self-describing data.
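The flow recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The fixed block size, the SHA-256 de-duplication code, the threshold value, the tagged-tuple output format, and the names deduplicate and recreate are all assumptions made for illustration. The sketch also trusts a matching code to mean matching data, which a real system might verify byte for byte.

```python
import hashlib
from typing import Dict, List, Tuple, Union

BLOCK_SIZE = 4   # hypothetical block size in bytes
MIN_COPIES = 2   # hypothetical threshold of literal copies kept in the output

# Self-describing output entry: ("data", block bytes) or ("vector", output index).
Entry = Tuple[str, Union[bytes, int]]


def deduplicate(stream: bytes, block_size: int = BLOCK_SIZE,
                min_copies: int = MIN_COPIES) -> List[Entry]:
    """Split the input stream into blocks and emit literal blocks or vectors."""
    output: List[Entry] = []
    # Working index used only while writing: code -> output positions of literal copies.
    locations: Dict[bytes, List[int]] = {}
    for start in range(0, len(stream), block_size):
        block = stream[start:start + block_size]
        code = hashlib.sha256(block).digest()  # assumed de-duplication code
        copies = locations.setdefault(code, [])
        if len(copies) < min_copies:
            # Threshold not yet satisfied: store the block itself.
            copies.append(len(output))
            output.append(("data", block))
        else:
            # Threshold satisfied: store a vector to an existing copy instead.
            output.append(("vector", copies[-1]))
    return output


def recreate(output: List[Entry]) -> bytes:
    """Rebuild the input stream from the output stream alone."""
    parts: List[bytes] = []
    for kind, value in output:
        if kind == "data":
            parts.append(value)
        else:
            # A vector always points at a literal "data" entry.
            _, target = output[value]
            parts.append(target)
    return b"".join(parts)


if __name__ == "__main__":
    original = b"ABCDABCDABCDABCDWXYZ"
    encoded = deduplicate(original)
    print(encoded)
    print(recreate(encoded) == original)
```

In this sketch the index built during de-duplication is discarded afterwards. The recreate function needs only the output entries, which mirrors the claim's requirement that the input stream be recoverable without reference to other de-duplication data structures.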
Address Boulder CO US