Abstract |
Provided are a method and a device for deleting duplicate data. The method comprises: dividing a file to be processed into at least two data blocks; calculating a data fingerprint for each of the data blocks; and performing a deduplication operation on the data blocks of the file according to the data fingerprint of each data block and the data fingerprints in a hotspot Hash table, wherein the data fingerprints in the hotspot Hash table are those whose number of duplicate occurrences in at least one file reaches a set threshold. By performing the deduplication operation using the hotspot Hash table, the method and device of the embodiments of the present invention reduce the repetition rate of the data blocks of the file and improve the utilization of the storage space of the file. |
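
The three steps of the method (block division, fingerprint calculation, lookup in the hotspot Hash table) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the embodiments: fixed-size blocks, SHA-256 as the fingerprint function, a threshold of 2, and the helper names `split_blocks`, `fingerprint`, `build_hotspot_table`, `deduplicate`, and `restore`. The sketch also assumes that a block whose fingerprint is in the hotspot table is stored once and replaced by a reference, which is only one possible reading of the deduplication operation.

```python
import hashlib
from collections import Counter

BLOCK_SIZE = 4      # assumed fixed block size in bytes (tiny, for illustration)
HOT_THRESHOLD = 2   # assumed threshold for a fingerprint to count as a hotspot


def split_blocks(data, size=BLOCK_SIZE):
    """Divide the file content into fixed-size data blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(block):
    """Data fingerprint of one block (SHA-256 is an assumed choice)."""
    return hashlib.sha256(block).hexdigest()


def build_hotspot_table(files, threshold=HOT_THRESHOLD):
    """Collect fingerprints that occur at least `threshold` times in the files."""
    counts = Counter()
    for data in files:
        counts.update(fingerprint(b) for b in split_blocks(data))
    return {fp for fp, n in counts.items() if n >= threshold}


def deduplicate(data, hotspot, store):
    """Replace hotspot blocks with references; keep one copy in `store`."""
    recipe = []
    for block in split_blocks(data):
        fp = fingerprint(block)
        if fp in hotspot:
            store.setdefault(fp, block)    # store the block only once
            recipe.append(("ref", fp))     # later copies become references
        else:
            recipe.append(("raw", block))  # non-hotspot blocks are kept as-is
    return recipe


def restore(recipe, store):
    """Rebuild the original content from the recipe and the block store."""
    return b"".join(store[v] if kind == "ref" else v for kind, v in recipe)


data = b"AAAABBBBAAAACCCCAAAA"
hotspot = build_hotspot_table([data])
store = {}
recipe = deduplicate(data, hotspot, store)
```

In this example the block `AAAA` occurs three times, so its fingerprint enters the hotspot table, and the block is stored once and referenced three times. The blocks `BBBB` and `CCCC` occur once each and are kept unchanged.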