摘要 |
An interface is disclosed that makes information obtained from a file deduplication process available to an application for the efficient operation thereof. A data deduplication repository is scanned to determine a plurality of file segments and respective checksum values associated with the segments. A data structure is generated that allows shared segments to be identified by indexing using a common checksum value. The segments also indicate the file to which they belong and may also include a timestamp value. This data structure is updated as files are modified, etc. The data structure is accessible to an application program so that the application program can readily determine which segments are shared between multiple files. With this information, the application can efficiently process the segment once rather than multiple times. Timestamps can be used by the application to efficiently identify only those segments that were accessed after a given time. |