Abstract |
<p>A document retrieval apparatus for retrieving, from among a plurality of registered documents, a document that includes a query character string, the apparatus including: a text separating unit which separates the registered documents and the query character string into n-grams and words; an n-gram index which stores information about occurrences of n-grams appearing in the registered documents on an n-gram-specific basis; a word-boundary-position index which stores, in a compressed form, information about occurrences of word boundaries appearing in the registered documents; a character-string-based search unit which identifies one or more registered documents including the query character string by looking up one or more n-grams of the query character string in the n-gram index; and a word-based search unit which checks whether the query character string appears as a word in the one or more identified registered documents by looking up one or more words of the query character string in the word-boundary-position index, thereby identifying a registered document that includes the query character string as a word.</p> |
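The two-stage lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `Retriever`, `ngrams`, and `boundary_gaps` are hypothetical, the n-gram length is fixed at 2, words are assumed to be delimited by single spaces, delta encoding of sorted offsets stands in for the unspecified compressed form, and a hit counts as a word when both its start and end offsets are word boundaries.

```python
from collections import defaultdict
from itertools import accumulate

N = 2  # n-gram length; an assumption made for this sketch


def ngrams(text, n=N):
    """Return (offset, n-gram) pairs for every n-gram in text."""
    return [(i, text[i:i + n]) for i in range(len(text) - n + 1)]


def boundary_gaps(text):
    """Return word-boundary offsets of space-delimited words, delta-encoded.

    Delta encoding is only a stand-in for the compressed form; the source
    does not say which compression scheme is used.
    """
    positions = set()
    pos = 0
    for token in text.split(" "):
        if token:
            positions.add(pos)               # offset where the word starts
            positions.add(pos + len(token))  # offset just past the word
        pos += len(token) + 1
    ordered = sorted(positions)
    return [b - a for a, b in zip([0] + ordered, ordered)]


class Retriever:
    """Hypothetical container for the two indexes and the two search units."""

    def __init__(self):
        # n-gram -> document id -> set of offsets where the n-gram occurs
        self.ngram_index = defaultdict(lambda: defaultdict(set))
        # document id -> delta-encoded word-boundary offsets
        self.boundary_index = {}

    def register(self, doc_id, text):
        for pos, gram in ngrams(text):
            self.ngram_index[gram][doc_id].add(pos)
        self.boundary_index[doc_id] = boundary_gaps(text)

    def string_search(self, query):
        """Character-string-based search: doc id -> offsets of the query."""
        grams = ngrams(query)
        if not grams:  # query shorter than N cannot be looked up
            return {}
        hits = {}
        first_gram = grams[0][1]
        for doc_id, starts in self.ngram_index.get(first_gram, {}).items():
            # Keep a start offset only if every n-gram of the query occurs
            # at the matching relative offset in the same document.
            found = sorted(
                s for s in starts
                if all(s + off in self.ngram_index.get(g, {}).get(doc_id, ())
                       for off, g in grams)
            )
            if found:
                hits[doc_id] = found
        return hits

    def word_search(self, query):
        """Word-based search: keep hits that start and end on boundaries."""
        result = {}
        for doc_id, starts in self.string_search(query).items():
            boundaries = set(accumulate(self.boundary_index[doc_id]))
            found = [s for s in starts
                     if s in boundaries and s + len(query) in boundaries]
            if found:
                result[doc_id] = found
        return result
```

With documents 1 (`"data base database"`) and 2 (`"databases"`) registered, `string_search("base")` reports offsets 5 and 14 in document 1 and offset 4 in document 2, while `word_search("base")` keeps only offset 5 in document 1, since the other occurrences lie inside longer words.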