Abstract |
A method and apparatus for achieving improved data compression, based on the realization that a longer history and longer common strings of the input data stream can be used in an initial evaluation of the input data prior to applying a particular compression process. More particularly, the input data is preprocessed by applying string matching to extract the long common strings. The input data is divided into a series of blocks, each block having a uniform size, illustratively 1000 characters in length. Further, a so-called fingerprint is computed and stored for each block. Thereafter, the input data stream is traversed and a comparison is made between a particular set of characters of the input stream and the computed fingerprints. In particular, the input stream is traversed as a function of a sliding window, wherein the present window of characters of the input is compared to the computed fingerprints. Upon detecting a match, the input stream is encoded with an identifier determined as a function of the detected match. Thereafter, the preprocessed and encoded input stream is compressed, illustratively using Lempel-Ziv compression.
|
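The preprocessing described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation, and several details are assumptions made only for illustration: the polynomial rolling hash used as the fingerprint, the hypothetical function names `fingerprint`, `preprocess`, `encode`, and `decode`, the `L`/`R` token layout, the choice to store fingerprints only for blocks that lie entirely before the current window (so that every identifier points backward), and the use of `zlib` (DEFLATE, an LZ77-based scheme) as a stand-in for the final Lempel-Ziv stage.

```python
import struct
import zlib

BASE = 257               # assumed hash base
MOD = (1 << 61) - 1      # assumed hash modulus (a Mersenne prime)


def fingerprint(chunk: bytes) -> int:
    """Polynomial hash of a block (an illustrative fingerprint choice)."""
    h = 0
    for byte in chunk:
        h = (h * BASE + byte) % MOD
    return h


def preprocess(data: bytes, block: int = 1000) -> list:
    """Return tokens ('lit', bytes) or ('ref', start, length)."""
    n = len(data)
    if n < block:
        return [("lit", data)] if data else []
    tokens = []
    table = {}                        # fingerprint -> start of an earlier block
    top = pow(BASE, block - 1, MOD)   # weight of the byte leaving the window
    h = fingerprint(data[:block])     # fingerprint of the current window
    i = lit_start = next_block = 0
    while i + block <= n:
        # Store fingerprints of blocks lying entirely before the window.
        while next_block + block <= i:
            fp = fingerprint(data[next_block:next_block + block])
            table.setdefault(fp, next_block)
            next_block += block
        j = table.get(h)
        # Verify byte-for-byte to rule out fingerprint collisions.
        if j is not None and data[j:j + block] == data[i:i + block]:
            length = block
            # Extend the match forward as far as it goes.
            while i + length < n and data[j + length] == data[i + length]:
                length += 1
            if lit_start < i:
                tokens.append(("lit", data[lit_start:i]))
            tokens.append(("ref", j, length))
            i += length
            lit_start = i
            if i + block <= n:
                h = fingerprint(data[i:i + block])
        else:
            if i + block < n:
                # Slide the window by one byte (rolling-hash update).
                h = ((h - data[i] * top) * BASE + data[i + block]) % MOD
            i += 1
    if lit_start < n:
        tokens.append(("lit", data[lit_start:]))
    return tokens


def encode(data: bytes, block: int = 1000) -> bytes:
    """Serialize the tokens, then apply a Lempel-Ziv style compressor."""
    out = bytearray()
    for tok in preprocess(data, block):
        if tok[0] == "lit":
            out += b"L" + struct.pack(">I", len(tok[1])) + tok[1]
        else:
            out += b"R" + struct.pack(">II", tok[1], tok[2])
    return zlib.compress(bytes(out), 9)


def decode(blob: bytes) -> bytes:
    """Invert encode(): decompress, then expand the tokens."""
    raw = zlib.decompress(blob)
    out = bytearray()
    pos = 0
    while pos < len(raw):
        tag = raw[pos:pos + 1]
        pos += 1
        if tag == b"L":
            (size,) = struct.unpack_from(">I", raw, pos)
            pos += 4
            out += raw[pos:pos + size]
            pos += size
        else:
            start, length = struct.unpack_from(">II", raw, pos)
            pos += 8
            for k in range(length):   # byte-wise copy tolerates overlap
                out.append(out[start + k])
    return bytes(out)
```

In this sketch, `decode(encode(data))` is expected to reproduce `data` exactly. Any benefit over running the compressor alone would be expected mainly when a long repeat lies farther back than the later stage can see; DEFLATE, for instance, searches a window of at most 32 KiB.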