Abstract |
PROBLEM TO BE SOLVED: To provide a method, a system, and a computer-readable recording medium for recognizing characters in a document that contains both an image area and a text area. SOLUTION: The character string recognition method includes: (a) analyzing the structure of a document and classifying it into a text area and an image/noise area; (b) recognizing the character strings in the text area using a first OCR (Optical Character Recognition); (c) using a language model to detect character strings that belong to a specific area falsely classified as text area and, referring to the position information for that specific area obtained by the first OCR, reclassifying the specific area as image/noise area; and (d) recognizing the character strings in the image/noise area, as classified in steps (a) to (c), using a second OCR. COPYRIGHT: (C)2010,JPO&INPIT |
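
Steps (a) to (d) can be read as a two-pass pipeline. The following is a minimal sketch of that control flow only, not the patented implementation. The abstract names no OCR engine, language model, or decision rule, so everything here is a hypothetical stand-in: the `Region` type, the `first_ocr`, `second_ocr`, and `lm_score` callables, and the fixed `threshold`. Step (a) is assumed to have already produced the labelled regions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Bounding box as (left, top, right, bottom); an assumed convention.
BBox = Tuple[int, int, int, int]


@dataclass
class Region:
    """Output of step (a): an area labelled "text" or "image_noise"."""
    bbox: BBox
    kind: str


def recognize_document(
    regions: List[Region],
    first_ocr: Callable[[BBox], List[Tuple[BBox, str]]],
    second_ocr: Callable[[BBox], str],
    lm_score: Callable[[str], float],
    threshold: float = 0.5,
) -> List[Tuple[BBox, str]]:
    """Run steps (b) to (d) over regions already classified by step (a)."""
    recognized: List[Tuple[BBox, str]] = []
    # Areas that step (a) already labelled as image/noise.
    image_noise: List[BBox] = [r.bbox for r in regions if r.kind == "image_noise"]

    for region in regions:
        if region.kind != "text":
            continue
        # Step (b): the first OCR returns each string with its position.
        for bbox, string in first_ocr(region.bbox):
            if lm_score(string) >= threshold:
                recognized.append((bbox, string))
            else:
                # Step (c): the language model finds the string implausible,
                # so the area at that position is reclassified as image/noise.
                image_noise.append(bbox)

    # Step (d): the second OCR handles every image/noise area.
    for bbox in image_noise:
        recognized.append((bbox, second_ocr(bbox)))
    return recognized
```

In this sketch a string is reclassified as soon as its language-model score falls below the threshold; a real system would likely use a more careful criterion, which the abstract does not specify.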