Abstract
An image processing method in which optical character recognition (OCR) is used to guide text tokenization. Specifically, OCR is first performed on each symbol in the scanned image; a symbol may be, for example, a digit, a letter, or another character. During tokenization, the OCR results are used to select appropriate matching criteria for each symbol. Symbols recognized as different characters are never clustered into the same group, and symbols with the same OCR result are clustered according to their recognition confidence levels.
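The clustering rule can be illustrated with a minimal Python sketch. It rests on several hypothetical assumptions that do not come from the abstract: each symbol carries its OCR label, a confidence in [0, 1], and a flattened binary bitmap; the "matching criteria" are a pixel-mismatch threshold; and confident OCR results receive a looser threshold than uncertain ones. The names `Symbol`, `mismatch`, `threshold_for`, and `tokenize`, along with the threshold values, are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    label: str          # OCR result, e.g. "e" or "7"
    confidence: float   # OCR recognition confidence in [0, 1]
    bitmap: tuple       # hypothetical: flattened binary image of the symbol


def mismatch(a, b):
    """Fraction of differing pixels; bitmaps of different sizes never match."""
    if len(a) != len(b):
        return 1.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def threshold_for(confidence, loose=0.25, strict=0.10, cutoff=0.9):
    # Hypothetical policy: trust confident OCR and match loosely,
    # otherwise require a much closer visual match.
    return loose if confidence >= cutoff else strict


def tokenize(symbols):
    """Greedy first-fit clustering; returns clusters as lists of symbol indices."""
    clusters = []  # each entry: (label, representative bitmap, member indices)
    for i, s in enumerate(symbols):
        limit = threshold_for(s.confidence)
        for label, rep, members in clusters:
            # Symbols with different OCR labels may never share a cluster.
            if label == s.label and mismatch(rep, s.bitmap) <= limit:
                members.append(i)
                break
        else:
            # No compatible cluster: this symbol starts a new one.
            clusters.append((s.label, s.bitmap, [i]))
    return [members for _, _, members in clusters]


if __name__ == "__main__":
    syms = [
        Symbol("e", 0.95, (1, 1, 0, 1, 0, 0, 1, 1)),
        Symbol("e", 0.95, (1, 1, 0, 1, 0, 0, 1, 0)),  # 1/8 differs, loose limit: joins
        Symbol("c", 0.95, (1, 1, 0, 1, 0, 0, 1, 1)),  # same pixels, different label
        Symbol("e", 0.50, (1, 1, 0, 1, 0, 0, 0, 0)),  # 2/8 differs, strict limit
    ]
    print(tokenize(syms))  # [[0, 1], [2], [3]]
```

Comparing each symbol against the first member of a cluster keeps the sketch short; a practical tokenizer might instead compare against a running prototype or use a different shape distance.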