Title of invention KEY WORD EXTRACTING METHOD FOR IMAGE DOCUMENT
Abstract PROBLEM TO BE SOLVED: To provide a key word extracting method which guarantees that an extracted key word contains errors only within a permissible range, even if the text converted by OCR (optical character recognition) contains errors. SOLUTION: A plaintext and accuracy file generation part 6a reads the candidate character information in an OCR result file 5 generated by OCR character recognition, extracts the character code of the first candidate together with its accuracy information, and generates a plaintext 6c and an accuracy file 6b. A key word extraction unit 6e generates a key word list 6g by performing morphological analysis and key word extraction on the obtained plaintext. A key word verification part 6f processes the key words in the key word list 6g in descending order of rank, judges character by character whether each character was misrecognized according to a preset threshold value, and excludes from the key word list 6g any key word whose proportion of misrecognized characters exceeds a specified condition.
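The verification step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that the accuracy file can be read as one score per plaintext character on a 0 to 1 scale, that a character counts as misrecognized when its score falls below the threshold, and that each key word carries its offset in the plaintext. The names `Keyword`, `misrecognized_ratio`, and `verify_keywords`, and the default values 0.8 and 0.25, are hypothetical and do not come from the source.

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    text: str   # key word as it appears in the plaintext
    start: int  # offset of its first character in the plaintext (assumed available)


def misrecognized_ratio(kw, accuracy, char_threshold):
    """Return the fraction of the key word's characters scored below the threshold."""
    scores = accuracy[kw.start:kw.start + len(kw.text)]
    if not scores:
        # No accuracy data for this span: treat the key word as fully unreliable.
        return 1.0
    bad = sum(1 for s in scores if s < char_threshold)
    return bad / len(scores)


def verify_keywords(keywords, accuracy, char_threshold=0.8, max_bad_ratio=0.25):
    """Keep, in ranked order, the key words whose misrecognized ratio is within the limit."""
    return [
        kw for kw in keywords
        if misrecognized_ratio(kw, accuracy, char_threshold) <= max_bad_ratio
    ]


# Hypothetical data: plaintext "image document retrieval" with one score per character.
accuracy = (
    [0.95] * 5                                            # "image"
    + [0.9]                                               # space
    + [0.95, 0.4, 0.3, 0.5, 0.95, 0.95, 0.95, 0.95]       # "document"
    + [0.9]                                               # space
    + [0.95] * 8 + [0.6]                                  # "retrieval"
)
ranked = [Keyword("image", 0), Keyword("document", 6), Keyword("retrieval", 15)]
kept = verify_keywords(ranked, accuracy)
```

With this sample data, "document" has three of eight characters below the threshold and is excluded, while "image" and "retrieval" remain in the list.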
Publication number JP2001022773(A) Publication date 2001.01.26
Application number JP19990194211 Filing date 1999.07.08
Applicant RICOH CO LTD Inventor GOTO ATSUYUKI
Classification G06F17/30;(IPC1-7):G06F17/30 Main classification G06F17/30
Agency Agent
Main claim
Address