Abstract |
PROBLEM TO BE SOLVED: To compress an inverted index at a high compression ratio using an encoding method that can be decoded at high speed, thereby achieving high-speed document search. SOLUTION: When the identification number of a document is compressed into a byte sequence by the variable byte method, w bits of the byte sequence are used to represent the number of occurrences of the corresponding indexing term in the document, and x bits are used to represent additional information about the posting, where w and x are integers given as parameters. When the number of occurrences cannot be represented in w bits, a special value indicating that it cannot be represented in w bits is written in the byte sequence, and the number of occurrences itself is then encoded by the variable byte method and appended after it. Also provided is a means for reading a compressed posting from any position partway through an inverted list, which allows a binary search on the inverted list. COPYRIGHT: (C)2008,JPO&INPIT |
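The encoding described in the SOLUTION can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the function names, the choice to store the document number as a gap `doc_gap`, the bit layout (document number in the high bits, then the w-bit occurrence field, then the x-bit additional information in the low bits), the use of the all-ones w-bit pattern as the special value, and the variable byte convention (7 payload bits per byte, high bit set on the final byte). The sketch does not cover reading from the middle of a list or binary search.

```python
from typing import Tuple


def vbyte_encode(n: int) -> bytes:
    """Variable byte code: 7 payload bits per byte, low-order bits first.
    The high bit marks the final byte (one common convention, assumed here)."""
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    out = bytearray()
    while n >= 128:
        out.append(n & 0x7F)
        n >>= 7
    out.append(n | 0x80)
    return bytes(out)


def vbyte_decode(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one variable byte integer starting at pos; return (value, next_pos)."""
    n = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        n |= (b & 0x7F) << shift
        shift += 7
        if b & 0x80:  # final byte of this integer
            return n, pos


def encode_posting(doc_gap: int, tf: int, extra: int, w: int, x: int) -> bytes:
    """Pack doc_gap, a w-bit occurrence field, and x bits of extra information
    into one integer, then variable byte encode it (assumed layout)."""
    escape = (1 << w) - 1  # assumed special value: all w bits set
    if not 0 <= extra < (1 << x):
        raise ValueError("extra does not fit in x bits")
    if tf < 0 or doc_gap < 0:
        raise ValueError("tf and doc_gap must be non-negative")
    tf_field = tf if tf < escape else escape
    packed = (doc_gap << (w + x)) | (tf_field << x) | extra
    out = vbyte_encode(packed)
    if tf_field == escape:
        # The count did not fit in w bits: append it as its own variable byte integer.
        out += vbyte_encode(tf)
    return out


def decode_posting(buf: bytes, pos: int, w: int, x: int) -> Tuple[Tuple[int, int, int], int]:
    """Inverse of encode_posting; return ((doc_gap, tf, extra), next_pos)."""
    escape = (1 << w) - 1
    packed, pos = vbyte_decode(buf, pos)
    extra = packed & ((1 << x) - 1)
    tf_field = (packed >> x) & escape
    doc_gap = packed >> (w + x)
    if tf_field == escape:
        tf, pos = vbyte_decode(buf, pos)
    else:
        tf = tf_field
    return (doc_gap, tf, extra), pos
```

Under these assumptions, with w = 2 and x = 1, a posting with gap 5, two occurrences, and additional information 1 fits in a single byte, while ten occurrences trigger the special value and cost one extra byte. Small w keeps common low counts inside the document number's own bytes; larger w reduces how often the appended count is needed.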