Abstract |
PROBLEM TO BE SOLVED: To extract character areas from a document image correctly even when characters belonging to different character areas lie close to each other. SOLUTION: A document image processing method extracts character areas from a document image for character recognition and layout information acquisition, and corrects any extracted character areas that have been excessively integrated. The method includes a step S201 of inputting the document image; a step S202 of reducing the input document image and extracting, as basic elements, the circumscribed rectangles of the connected components of black pixels that constitute characters; a step S203 of classifying the basic elements into characters, tables, figures, and others, generating lines by integrating the character elements, and extracting the character areas by integrating the lines; a step S204 of extracting column layout information from the character areas; and a step S205 of correcting over-integrated character areas by referring to the positions of the extracted columns.
|
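The abstract does not say how step S205 uses the column positions, so the following is only a minimal sketch under stated assumptions: character areas and columns are axis-aligned rectangles, a character area counts as over-integrated when it horizontally overlaps more than one extracted column, and the correction simply clips the area to each overlapping column. The names `Rect`, `Column`, `split_over_integrated`, and `min_overlap` are hypothetical and do not come from the source.

```python
from typing import List, Tuple

# Assumed representations (not specified in the abstract):
Rect = Tuple[int, int, int, int]   # (left, top, right, bottom) of a character area
Column = Tuple[int, int]           # (left, right) horizontal extent of a column


def split_over_integrated(area: Rect, columns: List[Column],
                          min_overlap: int = 1) -> List[Rect]:
    """Split a character area that spans several columns (assumed S205 rule).

    The area is clipped to every column it overlaps by at least
    `min_overlap` pixels. If it overlaps at most one column, it is
    treated as correctly integrated and returned unchanged.
    """
    left, top, right, bottom = area
    pieces: List[Rect] = []
    for col_left, col_right in sorted(columns):
        lo = max(left, col_left)      # left edge of the overlap
        hi = min(right, col_right)    # right edge of the overlap
        if hi - lo >= min_overlap:
            pieces.append((lo, top, hi, bottom))
    return pieces if len(pieces) > 1 else [area]


if __name__ == "__main__":
    # Two hypothetical columns separated by a 20-pixel gutter.
    columns = [(0, 100), (120, 220)]
    # An area that wrongly merges text from both columns.
    print(split_over_integrated((10, 50, 210, 90), columns))
    # prints [(10, 50, 100, 90), (120, 50, 210, 90)]
```

An area lying inside a single column is returned as-is, so the function can be applied to every character area produced by step S203 without disturbing those that were integrated correctly. A fuller implementation would presumably reassign the individual lines of the area to columns rather than clip the rectangle, but the abstract gives no detail on that point.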