Abstract |
A data de-duplication system includes a data manager (12, fig. 1) for storing data blocks (40) in a memory (14), and an index (16), e.g. a sparse bitmap, for identifying which data blocks are stored in the memory. The index includes chunks (22), e.g. of bits corresponding to hash values of given data blocks, and a chunk allocation record (30) is provided which has record entries (32). Each record entry is associated with a range of data values that are associated with the data blocks, and is configurable to identify a respective chunk. In respect of a received data block, the data manager refers to the record entry associated with the range of values in which the data value for the data block falls. In the event that the record entry does not identify any of the chunks, the data manager selects one of the chunks, configures the record entry to identify the selected chunk, configures the selected chunk to identify the data value for the data block, and stores the data block in the memory. In the event that the record entry does identify one of the chunks, the data manager determines whether the identified chunk is configured to identify the data value. Upon determining that the identified chunk does not identify the data value, the data manager configures the identified chunk to identify the data value for the data block and stores the data block in the memory; upon determining that the identified chunk does identify the data value, the data manager does not store the data block.
|
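The lookup-and-store flow described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the claimed implementation. The class name `DedupStore`, the constants `VALUE_BITS` and `CHUNK_BITS`, the use of a truncated SHA-256 digest as the data value, the choice of "allocate a fresh chunk" as the chunk selection policy, and the plain list standing in for the memory are all assumptions made for the example.

```python
import hashlib
from typing import List, Optional

VALUE_BITS = 24               # assumed width of the data value (truncated hash)
CHUNK_BITS = 12               # assumed: each chunk covers 2**12 consecutive values
CHUNK_SIZE = 1 << CHUNK_BITS  # number of bits held by one chunk


class DedupStore:
    """Hypothetical data manager with a sparse bitmap index."""

    def __init__(self) -> None:
        self.memory: List[bytes] = []      # stands in for the memory
        self.chunks: List[bytearray] = []  # index chunks, allocated on demand
        # Chunk allocation record: one entry per range of data values.
        # None means the entry does not yet identify any chunk.
        self.record: List[Optional[int]] = [None] * (1 << (VALUE_BITS - CHUNK_BITS))

    @staticmethod
    def data_value(block: bytes) -> int:
        """Derive the data value from a hash of the block (assumed: SHA-256)."""
        digest = hashlib.sha256(block).digest()
        return int.from_bytes(digest[:4], "big") >> (32 - VALUE_BITS)

    def put(self, block: bytes) -> bool:
        """Store the block unless the index already identifies its data value.

        Returns True if the block was stored, False if treated as a duplicate.
        """
        value = self.data_value(block)
        entry = value >> CHUNK_BITS       # record entry for the value's range
        bit = value & (CHUNK_SIZE - 1)    # position of the value inside a chunk

        chunk_id = self.record[entry]
        if chunk_id is None:
            # The entry identifies no chunk: select one (here, a new chunk)
            # and configure the entry to identify it.
            self.chunks.append(bytearray(CHUNK_SIZE // 8))
            chunk_id = len(self.chunks) - 1
            self.record[entry] = chunk_id

        chunk = self.chunks[chunk_id]
        byte_index, mask = bit >> 3, 1 << (bit & 7)
        if chunk[byte_index] & mask:
            return False                  # value already identified: do not store

        chunk[byte_index] |= mask         # configure the chunk to identify the value
        self.memory.append(block)
        return True
```

For instance, calling `put(b"alpha")` twice on a fresh `DedupStore` stores the block on the first call and skips it on the second. Because chunks are allocated only for value ranges that are actually used, the index stays sparse. Note that this sketch treats equal data values as equal blocks, so a truncated hash as short as the one assumed here would need a collision check in practice.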