Abstract |
PROBLEM TO BE SOLVED: To provide a content-based segmentation scheme for data compression. SOLUTION: The encoding includes the steps of: determining a target segment size; determining a window size; identifying a fingerprint within a window of symbols at an offset in the input data; determining whether the offset is to be designated as a cut point; and segmenting the input data as indicated by the set of cut points. For each segment so identified, the encoder determines whether the segment is to be a referenced segment or an unreferenced segment, replaces the segment data of each referenced segment with a reference label, and, as necessary, stores a reference binding for each referenced segment in a persistent segment store. COPYRIGHT: (C) 2009, JPO & INPIT
|
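The steps in the abstract can be illustrated with a minimal Python sketch. It is not the method claimed in the patent. Several choices are assumptions made only for illustration: the fingerprint is a truncated SHA-256 of the window, an offset becomes a cut point when the fingerprint is divisible by the target segment size, segments shorter than a hypothetical `min_ref_size` are left unreferenced, the reference label is a SHA-256 hex digest, and a plain `dict` stands in for the persistent segment store. All function and parameter names are hypothetical.

```python
import hashlib


def fingerprint(window: bytes) -> int:
    # Assumed fingerprint: first 8 bytes of SHA-256 over the window.
    # A practical encoder would likely use a cheaper rolling hash.
    return int.from_bytes(hashlib.sha256(window).digest()[:8], "big")


def cut_points(data: bytes, target_size: int, window_size: int) -> list:
    # An offset is a cut point when the fingerprint of the window ending
    # at that offset is divisible by target_size (assumed rule). With a
    # uniform fingerprint this yields segments of roughly target_size bytes.
    cuts = []
    for offset in range(window_size, len(data)):
        window = data[offset - window_size:offset]
        if fingerprint(window) % target_size == 0:
            cuts.append(offset)
    return cuts


def segment(data: bytes, cuts: list) -> list:
    # Split the input at the cut points; concatenating the segments
    # reproduces the input exactly.
    bounds = [0, *cuts, len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


def encode(data: bytes, store: dict, target_size: int = 64,
           window_size: int = 16, min_ref_size: int = 8) -> list:
    # Referenced segments are replaced by a label, and a label-to-data
    # binding is stored only if the store does not already hold it.
    # Short segments stay unreferenced and are emitted verbatim.
    tokens = []
    for seg in segment(data, cut_points(data, target_size, window_size)):
        if len(seg) < min_ref_size:
            tokens.append(("raw", seg))
        else:
            label = hashlib.sha256(seg).hexdigest()
            store.setdefault(label, seg)
            tokens.append(("ref", label))
    return tokens


def decode(tokens: list, store: dict) -> bytes:
    # Resolve each reference label through the store; copy raw data as-is.
    return b"".join(store[v] if kind == "ref" else v for kind, v in tokens)
```

Because each cut point depends only on the bytes inside its window, inserting data at the front of the input shifts the later cut points without changing which content they fall on. Segments that follow the insertion therefore tend to reappear unchanged and can be replaced by labels already bound in the store.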