Abstract
PROBLEM TO BE SOLVED: To increase the accuracy of natural-language document retrieval. SOLUTION: A document retrieval apparatus retrieves, from a predetermined corpus, document files whose content is related to a search text. The apparatus holds index information that records, for each gram, its position within a document and its position within a morpheme. It accepts a search text input by a user and extracts morphemes and grams from it. For each extracted morpheme, it quantifies the morpheme's rarity in the corpus as an estimated value, detects the document files that contain the morpheme, and counts how many times the morpheme appears in each such file as its frequency of appearance. From the estimated values and frequencies of appearance, it computes association scores that express how closely the content of the search text relates to each document file. COPYRIGHT: (C)2008,JPO&INPIT
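The abstract does not give concrete formulas, so the following is only a minimal sketch of the described pipeline under stated assumptions. It assumes that a lowercase whitespace split stands in for real morphological analysis, that "grams" are overlapping character bigrams of each morpheme, that a document position is the morpheme's ordinal slot, that the rarity estimate is an IDF-style value log(N / df), and that the association score sums frequency × rarity over the query's morphemes. All names (`morphemes`, `grams`, `build_index`, `frequencies`, `score`, `N_GRAM`) are hypothetical. The point it illustrates is that whole-morpheme frequencies can be recovered from the gram index alone by using both stored positions.

```python
import math
from collections import Counter, defaultdict

N_GRAM = 2  # hypothetical gram size; the abstract does not specify n

# (doc_id, morpheme slot in document, gram offset within morpheme, morpheme length)
Posting = tuple[str, int, int, int]


def morphemes(text: str) -> list[str]:
    # Assumption: stand-in for a morphological analyzer (lowercase whitespace split).
    return text.lower().split()


def grams(morpheme: str) -> list[str]:
    # Overlapping character n-grams; a morpheme shorter than n becomes a single gram.
    if len(morpheme) < N_GRAM:
        return [morpheme]
    return [morpheme[i:i + N_GRAM] for i in range(len(morpheme) - N_GRAM + 1)]


def build_index(corpus: dict[str, str]) -> dict[str, set[Posting]]:
    # Record each gram's position in the document and its offset within its morpheme.
    index: dict[str, set[Posting]] = defaultdict(set)
    for doc_id, text in corpus.items():
        for slot, m in enumerate(morphemes(text)):
            for offset, g in enumerate(grams(m)):
                index[g].add((doc_id, slot, offset, len(m)))
    return index


def frequencies(index: dict[str, set[Posting]], morpheme: str) -> Counter:
    # Count whole-morpheme occurrences per document using only the gram index:
    # every gram must sit at consecutive offsets of the same morpheme slot,
    # and that morpheme must have the same length as the query morpheme.
    gs = grams(morpheme)
    counts: Counter = Counter()
    for doc_id, slot, offset, length in index.get(gs[0], set()):
        if offset != 0 or length != len(morpheme):
            continue
        if all((doc_id, slot, k, length) in index.get(g, set()) for k, g in enumerate(gs)):
            counts[doc_id] += 1
    return counts


def score(corpus: dict[str, str], index: dict[str, set[Posting]], query: str) -> dict[str, float]:
    # Association score = sum over distinct query morphemes of frequency * rarity.
    n_docs = len(corpus)
    scores: dict[str, float] = defaultdict(float)
    for m in set(morphemes(query)):
        tf = frequencies(index, m)
        if not tf:
            continue  # morpheme absent from the corpus contributes nothing
        rarity = math.log(n_docs / len(tf))  # assumed IDF-style estimate
        for doc_id, count in tf.items():
            scores[doc_id] += count * rarity
    return dict(scores)


corpus = {
    "d1": "the cat sat on the mat",
    "d2": "the dog chased the cat",
    "d3": "a dog barked",
}
index = build_index(corpus)
print(score(corpus, index, "sat cat"))
```

In this toy corpus, "sat" occurs only in d1 and so receives a higher rarity weight than "cat", which d1 and d2 share; d1 therefore outranks d2, and d3 gets no score. The gram "at" also occurs inside "cat", "sat", and "mat", but a query for the morpheme "at" finds nothing, because the offset and length checks reject partial matches.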