Abstract |
PROBLEM TO BE SOLVED: To provide a keyword extraction method which guarantees that extracted keywords contain errors only within a permissible range, even if the text converted by OCR (optical character recognition) contains errors. SOLUTION: A plaintext and accuracy file generation part 6a takes, for each character in an OCR result file 5 generated by OCR character recognition, the character code of the first candidate among the recognition candidates together with its accuracy information, and generates a plaintext 6c and an accuracy file 6b. A keyword extraction unit 6e generates a keyword list 6g by performing morphological analysis and keyword extraction on the obtained plaintext. A keyword verification part 6f examines the keywords in the obtained keyword list 6g in descending order of rank, judges character by character whether each character is misrecognized according to a previously set threshold value, and excludes from the keyword list 6g any keyword whose ratio of misrecognized characters exceeds a specified condition.
|
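The verification flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `Keyword`, `split_ocr_result`, and `verify_keywords` are hypothetical, and so are the parameters `char_threshold` and `max_error_ratio`. The sketch further assumes that the OCR result is a list of candidate lists holding `(character, accuracy)` pairs, that a character counts as misrecognized when its accuracy falls below the threshold, and that each keyword's offset in the plaintext is known.

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    text: str
    start: int  # assumed: offset of the keyword within the plaintext


def split_ocr_result(ocr_result):
    """Build the plaintext and the per-character accuracy list.

    Assumed input format: one list per character position, holding
    (character, accuracy) candidates with the first candidate first.
    """
    chars, accuracies = [], []
    for candidates in ocr_result:
        char, accuracy = candidates[0]  # keep only the first candidate
        chars.append(char)
        accuracies.append(accuracy)
    return "".join(chars), accuracies


def verify_keywords(keywords, accuracies, char_threshold, max_error_ratio):
    """Keep keywords whose share of misrecognized characters is acceptable.

    `keywords` is expected in descending order of rank. A character is
    treated as misrecognized when its accuracy is below `char_threshold`
    (an assumption of this sketch).
    """
    kept = []
    for kw in keywords:
        scores = accuracies[kw.start:kw.start + len(kw.text)]
        if not scores:
            continue  # keyword lies outside the accuracy data; skip it
        misrecognized = sum(1 for s in scores if s < char_threshold)
        if misrecognized / len(scores) <= max_error_ratio:
            kept.append(kw)
    return kept


if __name__ == "__main__":
    ocr_result = [
        [("a", 0.9), ("o", 0.1)],
        [("b", 0.9)],
        [("c", 0.4), ("e", 0.3)],
        [("d", 0.9)],
        [("e", 0.3)],
        [("f", 0.2)],
    ]
    plaintext, accuracies = split_ocr_result(ocr_result)
    ranked = [Keyword("abc", 0), Keyword("def", 3)]
    for kw in verify_keywords(ranked, accuracies, 0.5, 0.4):
        print(kw.text)
```

In this example, one of the three characters in "abc" falls below the threshold, which is within the permitted ratio, so the keyword is kept. Two of the three characters in "def" fall below it, so that keyword is excluded.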