Title of Invention: OCR-GUIDED TEXT TOKENIZATION OF DIGITAL IMAGES
Abstract: An image processing method in which OCR is used to guide text tokenization. More particularly, OCR is first performed on each symbol in the scanned image, where a symbol may be a number, letter, or other character. During tokenization, the OCR results are used to select appropriate matching criteria for each symbol. Symbols that are recognized as different characters are not allowed to be clustered into the same group. Symbols with the same OCR result are clustered according to their recognition confidence levels.
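The clustering rule in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Symbol`, `Token`, `mismatch_ratio`, `allowed_mismatch`, and `tokenize`, the pixel-mismatch measure, and the threshold values are all hypothetical. In particular, the abstract does not say how confidence maps to matching criteria; the sketch assumes that a looser pixel-mismatch limit is used when OCR is confident about both symbols and a stricter one otherwise.

```python
from dataclasses import dataclass, field


@dataclass
class Symbol:
    label: str         # character reported by OCR
    confidence: float  # OCR confidence in [0, 1]
    bitmap: tuple      # flattened binary pixels (assumed same size for all symbols)


@dataclass
class Token:
    exemplar: Symbol                 # first symbol assigned to this group
    members: list = field(default_factory=list)


def mismatch_ratio(a, b):
    """Fraction of pixel positions at which two equal-sized bitmaps differ."""
    if len(a) != len(b):
        return 1.0
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def allowed_mismatch(conf_a, conf_b, strict=0.05, loose=0.20, high=0.9):
    """Assumed rule: tolerate more mismatch when both OCR results are confident."""
    return loose if min(conf_a, conf_b) >= high else strict


def tokenize(symbols):
    """Greedily group symbols; different OCR labels never share a group."""
    tokens = []
    for sym in symbols:
        for tok in tokens:
            ex = tok.exemplar
            if ex.label != sym.label:
                continue  # recognized as different characters: never cluster
            limit = allowed_mismatch(ex.confidence, sym.confidence)
            if mismatch_ratio(ex.bitmap, sym.bitmap) <= limit:
                tok.members.append(sym)
                break
        else:
            # No compatible group found: start a new one with this symbol.
            tokens.append(Token(exemplar=sym, members=[sym]))
    return tokens


if __name__ == "__main__":
    base = (1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
    near = (1, 1, 1, 1, 0, 0, 0, 0, 0, 0)  # differs from base in one pixel
    demo = [
        Symbol("a", 0.95, base),
        Symbol("a", 0.95, near),  # same label, high confidence: joins first group
        Symbol("a", 0.50, near),  # same label, low confidence: strict limit, new group
        Symbol("o", 0.95, base),  # identical pixels but different label: new group
    ]
    for tok in tokenize(demo):
        print(tok.exemplar.label, len(tok.members))
```

In this sketch the label check acts as a hard constraint, while the confidence only adjusts how closely two bitmaps with the same label must agree before they are merged.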
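Publication No.: US2010150460(A1) Publication Date: 2010.06.17
Application No.: US20080335624 Filing Date: 2008.12.16
Applicant: XEROX CORPORATION Inventors: FAN ZHIGANG;TSE FRANCIS;CAMPANELLI MICHAEL R.;BAI YINGJUN
Classification: G06K9/36;G06K9/20 Main Classification: G06K9/36
Agency: Agent:
Independent Claim:
Address: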