Abstract |
PROBLEM TO BE SOLVED: To retrieve documents containing a designated character string from a group of registered documents easily and quickly. SOLUTION: This document retrieval method comprises: a text dividing means for breaking text, whether a registered document or a retrieval character string, into n-grams (sequences of n characters) and into words; an n-gram index that holds, for each n-gram, information on where that n-gram appears in the registered documents; a word boundary index that holds information on where word boundaries appear in the registered documents; a character string unit retrieving means that divides the retrieval character string into n-grams and refers to the n-gram index to find the documents containing the retrieval character string, or its positions of appearance within those documents; and a word unit retrieving means that, for the results of the character string unit retrieving means, divides the retrieval character string into words and refers to the word boundary index to decide whether the retrieval character string appears as a word, thereby retrieving the documents that contain the retrieval character string as a word.
|
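The two-stage lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not state the value of n, how words are divided, or how either index is laid out. The sketch assumes n = 2, uses a regular expression over runs of word characters as a stand-in for the text dividing means, and keeps both indexes in memory. The names `Index`, `register`, `search_string`, and `search_word` are hypothetical.

```python
import re
from collections import defaultdict

N = 2  # assumed n-gram length; the abstract does not fix a value


def ngrams(text, n=N):
    """Return (offset, n-gram) pairs for every n-character window."""
    return [(i, text[i:i + n]) for i in range(len(text) - n + 1)]


def word_spans(text):
    """Stand-in word divider: (start, end) offsets of runs of word characters."""
    return [m.span() for m in re.finditer(r"\w+", text)]


class Index:
    def __init__(self):
        # n-gram index: n-gram -> {(doc_id, offset)}
        self.ngram_index = defaultdict(set)
        # word boundary index: doc_id -> {offsets where a word starts or ends}
        self.boundary_index = defaultdict(set)

    def register(self, doc_id, text):
        """Add one document to both indexes."""
        for offset, gram in ngrams(text):
            self.ngram_index[gram].add((doc_id, offset))
        for start, end in word_spans(text):
            self.boundary_index[doc_id].update((start, end))

    def search_string(self, query):
        """Character-string search: (doc_id, offset) of every occurrence."""
        grams = ngrams(query)
        if not grams:
            raise ValueError(f"query must have at least {N} characters")
        hits = None
        for rel, gram in grams:
            # Shift each posting back to the offset where the query would start.
            starts = {(d, p - rel) for d, p in self.ngram_index.get(gram, ())}
            hits = starts if hits is None else hits & starts
        return sorted(hits)

    def search_word(self, query):
        """Word search: keep string hits whose word edges are indexed boundaries."""
        spans = word_spans(query)
        result = []
        for doc_id, pos in self.search_string(query):
            bounds = self.boundary_index.get(doc_id, set())
            if all(pos + s in bounds and pos + e in bounds for s, e in spans):
                result.append((doc_id, pos))
        return result


idx = Index()
idx.register(1, "concatenate cat")
idx.register(2, "the cat sat")
idx.register(3, "category")

print(idx.search_string("cat"))  # [(1, 3), (1, 12), (2, 4), (3, 0)]
print(idx.search_word("cat"))    # [(1, 12), (2, 4)]
```

In this sketch the string search also reports "cat" inside "concatenate" and "category", while the word search discards those hits because their start or end offset is not recorded in the word boundary index. The word search never re-reads the document text: it filters the string search results using only the boundary index, which mirrors the division of labor between the two retrieving means in the abstract.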