Title of invention: DOCUMENT IMAGE PROCESSOR
Abstract: PROBLEM TO BE SOLVED: To obtain correct retrieval results in a document image processor by eliminating OCR misrecognition. SOLUTION: A rectangular information character sequence effective for retrieval is extracted from a document image. Using a correspondence table between rectangular information character sequences and text character sequences, together with a frequency table, the rectangular information character sequence is converted into a text character sequence. Morphological analysis of the text character sequence divides it into words, and the result is converted back into a rectangular information character sequence in which each word is tagged as a function word or a content word (noun or adjective). The words carrying the content word tag form the group of retrieval word candidates. A retrieval word is selected from these candidates on the basis of its frequency within the document and its document frequency, and the document image is retrieved with it. Related document images can thus be retrieved more reliably. COPYRIGHT: (C)2008,JPO&INPIT
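The final selection step of the abstract (content word candidates ranked by frequency within the document and by document frequency) can be illustrated with a minimal sketch. The abstract does not state the scoring formula, so the smoothed TF-IDF weighting below is an assumption, and the function name `select_retrieval_words`, the tag strings, and the sample data are all hypothetical. Plain text words stand in for the rectangular information character sequences.

```python
import math
from collections import Counter


def select_retrieval_words(tagged_words, document_frequency, num_documents, top_k=3):
    """Pick retrieval words from (word, tag) pairs.

    tagged_words: list of (word, tag), tag is "content" or "function" (hypothetical labels).
    document_frequency: dict mapping word -> number of documents containing it.
    num_documents: total number of documents in the collection.
    """
    # Only words tagged as content words are retrieval word candidates.
    candidates = [word for word, tag in tagged_words if tag == "content"]
    term_frequency = Counter(candidates)

    scores = {}
    for word, tf in term_frequency.items():
        df = document_frequency.get(word, 0)
        # Assumed weighting: smoothed inverse document frequency.
        idf = math.log((num_documents + 1) / (df + 1)) + 1.0
        scores[word] = tf * idf

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


# Hypothetical sample data for illustration only.
sample_words = [
    ("image", "content"),
    ("the", "function"),
    ("retrieval", "content"),
    ("image", "content"),
    ("document", "content"),
]
sample_df = {"image": 2, "retrieval": 5, "document": 90}
selected = select_retrieval_words(sample_words, sample_df, num_documents=100, top_k=2)
```

Under this assumed weighting, a word that occurs often in the document but rarely across the collection ranks highest, while function words never become candidates.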
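Publication number: JP2008217546(A) Publication date: 2008.09.18
Application number: JP20070055542 Filing date: 2007.03.06
Applicant: RICOH CO LTD Inventor: GOTO ATSUYUKI
Classification: G06F17/30;G06K9/00;G06T1/00 Main classification: G06F17/30
Agency: Agent:
Main claim:
Address: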