DOCUMENT IMAGE PROCESSOR, Application No. JP20070055542

Title of invention	DOCUMENT IMAGE PROCESSOR
Abstract	PROBLEM TO BE SOLVED: To obtain correct retrieval results in a document image processor by avoiding OCR misrecognition. SOLUTION: A rectangular information character sequence that is effective for retrieval is extracted from a document image. Using a frequency table and a correspondence table between rectangular information character sequences and text character sequences, the rectangular information character sequence is converted into a text character sequence. Morphological analysis of the text character sequence divides it into words, and the result is converted into a rectangular information character sequence in which each word is tagged as a function word or a content word (nouns and adjectives). The words carrying the content word tag form the group of retrieval word candidates. Retrieval words are selected from these candidates based on each word's frequency within the document and its document frequency, and the document image is retrieved with them. Thus, related document images can be retrieved more reliably. COPYRIGHT: (C)2008,JPO&INPIT
Publication No.	JP2008217546(A)	Publication date	2008.09.18
Application No.	JP20070055542	Filing date	2007.03.06
Applicant	RICOH CO LTD	Inventor	GOTO ATSUYUKI
Classification	G06F17/30;G06K9/00;G06T1/00	Main classification	G06F17/30
Agency		Agent
Principal claim
Address
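
The retrieval word selection step described in the abstract can be sketched roughly as follows. The abstract only says that selection uses frequency within the document and document frequency; the smoothed TF-IDF score, the function name `select_retrieval_words`, the `"content"`/`"function"` tag strings, and the `top_k` parameter are all assumptions made for illustration, and plain strings stand in for the rectangular information character sequences.

```python
import math
from collections import Counter


def select_retrieval_words(tagged_words, document_frequency, num_documents, top_k=3):
    """Pick retrieval words from tagged words (hypothetical helper).

    tagged_words: list of (word, tag) pairs, tag being "content" or "function".
    document_frequency: dict mapping a word to the number of documents containing it.
    num_documents: total number of documents in the collection.
    """
    # Only content words (nouns and adjectives) are retrieval word candidates.
    candidates = [word for word, tag in tagged_words if tag == "content"]

    # Frequency of each candidate within this document.
    term_frequency = Counter(candidates)

    scores = {}
    for word, freq in term_frequency.items():
        df = document_frequency.get(word, 0)
        # Assumed scoring: smoothed inverse document frequency, so words that
        # are frequent here but rare across the collection score highest.
        idf = math.log((num_documents + 1) / (df + 1)) + 1.0
        scores[word] = freq * idf

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


tagged = [
    ("image", "content"), ("the", "function"), ("image", "content"),
    ("retrieval", "content"), ("of", "function"), ("document", "content"),
]
df_table = {"image": 2, "retrieval": 50, "document": 90}
selected = select_retrieval_words(tagged, df_table, num_documents=100, top_k=2)
```

In this sketch, function words never become candidates, and a word that appears often in the document but in few other documents is ranked ahead of a word that is common across the whole collection.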