Abstract |
Mechanisms are provided for efficiently detecting segments for deduplication. Data is analyzed to determine file types and file components. File types such as images may have optimal data segment boundaries set at the file boundaries. Other file types such as container files are delayered to extract objects to set optimal data segment boundaries based on file type or based on the boundaries of the individual objects. Storage of unnecessary information is minimized in a deduplication dictionary while allowing for effective deduplication. |
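The type-aware segmentation described above can be illustrated with a minimal Python sketch. It is not the claimed implementation. The names `segments`, `deduplicate`, `IMAGE_MAGIC`, and `FALLBACK_CHUNK` are hypothetical, and several choices are assumptions made only for illustration: images are recognized by magic bytes, ZIP stands in for container files in general, SHA-256 is the segment fingerprint, and other data falls back to fixed-size chunks.

```python
import hashlib
import io
import zipfile

# Assumed signatures for a few image formats (PNG, JPEG, GIF).
IMAGE_MAGIC = (b"\x89PNG\r\n\x1a\n", b"\xff\xd8\xff", b"GIF8")
# Assumed chunk size for data with no natural object boundary.
FALLBACK_CHUNK = 4096


def segments(data: bytes):
    """Yield segments whose boundaries follow the detected file type."""
    if data.startswith(IMAGE_MAGIC):
        # Image: the file boundary is the segment boundary.
        yield data
    elif zipfile.is_zipfile(io.BytesIO(data)):
        # Container: delayer it and segment each extracted object in turn.
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for info in archive.infolist():
                if not info.is_dir():
                    yield from segments(archive.read(info))
    else:
        # Anything else: fixed-size chunks (an assumption of this sketch).
        for start in range(0, len(data), FALLBACK_CHUNK):
            yield data[start:start + FALLBACK_CHUNK]


def deduplicate(files, dictionary=None):
    """Store each distinct segment once; return the dictionary and,
    for every input file, the ordered list of its segment hashes."""
    dictionary = {} if dictionary is None else dictionary
    recipes = []
    for data in files:
        recipe = []
        for segment in segments(data):
            digest = hashlib.sha256(segment).hexdigest()
            dictionary.setdefault(digest, segment)  # keep the first copy only
            recipe.append(digest)
        recipes.append(recipe)
    return dictionary, recipes
```

In this sketch, an image stored both standalone and inside an archive produces the same segment hash in both places, so the dictionary holds a single entry for it rather than one entry per fixed-size chunk.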