Abstract |
A method for compressing text comprises the steps of splitting a main character string into component strings, counting the frequency of occurrence of each component string in the main character string, and ordering the component strings by their frequency of occurrence. The method also comprises the steps of allocating to each component string a token value that represents the component string and is determined by its frequency of occurrence, storing the allocated token values in a token table in which tokens are associated with component strings, and substituting for each component string in the main character string its token value from the token table, thereby generating a sequence of token values that represents the main character string in a compressed format.
|
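The steps summarized in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes that component strings are whitespace-separated words and that the token value is simply the frequency rank (0 for the most frequent string), so the most common strings receive the smallest values. The function names `compress` and `decompress` are hypothetical and chosen only for this example.

```python
from collections import Counter


def compress(text):
    """Return (token_table, tokens) for a main character string.

    Assumption: component strings are whitespace-separated words.
    Assumption: a token value is the component string's frequency rank.
    """
    # Step 1: split the main character string into component strings.
    components = text.split()

    # Step 2: count how often each component string occurs.
    counts = Counter(components)

    # Step 3: order component strings by descending frequency.
    # Ties keep the order in which the strings were first encountered.
    ordered = [string for string, _ in counts.most_common()]

    # Step 4: allocate a token value per component string and store
    # the allocation as a token table (string -> token value).
    token_table = {string: rank for rank, string in enumerate(ordered)}

    # Step 5: substitute each component string with its token value.
    tokens = [token_table[string] for string in components]
    return token_table, tokens


def decompress(token_table, tokens):
    """Rebuild the text by inverting the token table."""
    reverse = {token: string for string, token in token_table.items()}
    return " ".join(reverse[token] for token in tokens)


table, tokens = compress("the cat and the dog and the bird")
restored = decompress(table, tokens)
```

In this sketch, rank-based values pair naturally with a variable-length encoding in which small numbers take fewer bits, which is one plausible reason to tie token values to frequency. Note that joining on a single space restores the original text only when the input used single spaces as separators.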