Abstract |
A computer-implemented method for generating a compressed index of information stored as a plurality of records in a database. Indexable portions of the information are parsed sequentially to generate words and metawords: the words represent the portions, and the metawords represent attributes of the portions. A location is assigned sequentially to each word and metaword, in the order in which the portions are parsed, to form word-location and metaword-location pairs. The pairs are sorted first by word or metaword and second by location. For each unique word and metaword, an index entry is written to a memory. Each index entry includes a word entry or a metaword entry and one or more location entries. The word and metaword entries use a prefix encoding that indicates the number of bytes the unique word or metaword of the next index entry has in common with the unique word or metaword of the previous index entry. The location entries use a delta value encoding.
|
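The steps summarized in the abstract (assign locations, sort the pairs, prefix-encode the words, delta-encode the locations) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `common_prefix_len`, `build_index`, and `decode_index`, the tuple layout of an entry, and the choice to store the first location as a delta from zero are all assumptions made for illustration. Metawords are treated here as ordinary tokens, since the abstract does not say how they are distinguished from words.

```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def build_index(tokens):
    """Build (shared_bytes, suffix, location_deltas) entries.

    `tokens` is the sequence of words/metawords in parse order
    (hypothetical input shape, assumed for this sketch).
    """
    # Assign a sequential location to each token, forming (token, location) pairs.
    pairs = [(tok.encode("utf-8"), loc) for loc, tok in enumerate(tokens, start=1)]
    # Sort first by token bytes, then by location.
    pairs.sort()

    # Group the locations of each unique token; dicts keep the sorted order.
    grouped = {}
    for word, loc in pairs:
        grouped.setdefault(word, []).append(loc)

    entries = []
    prev = b""
    for word, locs in grouped.items():
        # Prefix encoding: keep only the bytes not shared with the previous entry.
        shared = common_prefix_len(prev, word)
        # Delta encoding: first location as-is, then differences between neighbours.
        deltas = [locs[0]] + [b - a for a, b in zip(locs, locs[1:])]
        entries.append((shared, word[shared:], deltas))
        prev = word
    return entries


def decode_index(entries):
    """Invert build_index: recover each token and its absolute locations."""
    prev = b""
    index = {}
    for shared, suffix, deltas in entries:
        word = prev[:shared] + suffix
        locs, total = [], 0
        for d in deltas:
            total += d
            locs.append(total)
        index[word.decode("utf-8")] = locs
        prev = word
    return index


entries = build_index(["the", "cat", "the", "car", "cat"])
print(entries)
print(decode_index(entries))
```

In this example "cat" follows "car" in sorted order, so its entry records two shared bytes and stores only the suffix `t`; its locations 2 and 5 are stored as the deltas 2 and 3. A real index would additionally pack these values into a compact byte format, which the abstract does not specify and this sketch does not attempt.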