摘要 |
<p>Exemplary method, system, and computer program product embodiments for scalable data deduplication working with small data chunk in a computing environment are provided. In one embodiment, by way of example only, for each small data chunk, a signature is generated based on a combination of a representation of characters used in selecting data to be deduplicated. A c-spectrum of the small data chunk being a sequence of representations of different characters ordered by a frequency of occurrence in the small data chunk, and an f-spectrum of the small data chunk being a corresponding sequence of frequencies of the different characters in the small data chunk.</p> |