Abstract
Text components that have been misclassified as non-text are identified and corrected by comparing non-text components with their neighboring text components. If a non-text component being examined is found to be substantially aligned with its neighboring text components, and is further found to have an average color and size similar to those of its neighboring text components, then it is reclassified as a text component. Misclassification of non-text components as text is reduced by restricting text labeling to areas of the document image defined by an edge map. The edge map is made by smoothing the document image and applying edge detection to the smoothed image.
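The two steps above can be illustrated with a minimal sketch in plain Python. It assumes the following. Components are given as axis-aligned bounding boxes with a precomputed mean RGB color. "Neighboring" means text components on the same line within a small horizontal gap. Alignment is measured by vertical overlap, and size is compared by height. The smoothing and edge detection are a 3x3 mean filter followed by a thresholded Sobel gradient. These choices, the names `Component`, `reclassify`, `smooth`, `edge_map` and `may_be_text`, and all threshold values are hypothetical illustrations, not the claimed method.

```python
"""Hypothetical sketch: reclassify non-text components and build an edge map."""
from dataclasses import dataclass


@dataclass
class Component:
    # Bounding box: x0/y0 inclusive, x1/y1 exclusive; color is the mean (R, G, B).
    x0: int
    y0: int
    x1: int
    y1: int
    color: tuple
    is_text: bool

    @property
    def height(self):
        return self.y1 - self.y0


def vertical_overlap(a, b):
    # Shared height as a fraction of the shorter box (1.0 means fully aligned on a line).
    shared = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(0, shared) / min(a.height, b.height)


def reclassify(components, max_gap=20, min_overlap=0.7, size_tol=0.3, color_tol=40.0):
    # Single pass: neighbors are taken from the text labels present before the pass.
    texts = [c for c in components if c.is_text]
    for c in components:
        if c.is_text:
            continue
        # Assumed neighbor rule: horizontally close text components on the same line.
        nbrs = [t for t in texts
                if max(t.x0 - c.x1, c.x0 - t.x1) <= max_gap
                and vertical_overlap(c, t) >= min_overlap]
        if not nbrs:
            continue
        # Compare height and mean color with the neighbors' averages.
        mean_h = sum(t.height for t in nbrs) / len(nbrs)
        mean_rgb = [sum(t.color[k] for t in nbrs) / len(nbrs) for k in range(3)]
        dist = sum((c.color[k] - mean_rgb[k]) ** 2 for k in range(3)) ** 0.5
        if abs(c.height - mean_h) <= size_tol * mean_h and dist <= color_tol:
            c.is_text = True
    return components


def smooth(img):
    # 3x3 mean filter over a 2-D list of grey levels (partial window at the borders).
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            win = [img[j][i]
                   for j in range(max(0, y - 1), min(h, y + 2))
                   for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(win) / len(win)
    return out


def edge_map(img, threshold=100.0):
    # Edge detection on the smoothed image: Sobel gradient magnitude >= threshold.
    # Border pixels are left as non-edge.
    s = smooth(img)
    h, w = len(s), len(s[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (s[y - 1][x + 1] + 2 * s[y][x + 1] + s[y + 1][x + 1]
                  - s[y - 1][x - 1] - 2 * s[y][x - 1] - s[y + 1][x - 1])
            gy = (s[y + 1][x - 1] + 2 * s[y + 1][x] + s[y + 1][x + 1]
                  - s[y - 1][x - 1] - 2 * s[y - 1][x] - s[y - 1][x + 1])
            edges[y][x] = (gx * gx + gy * gy) ** 0.5 >= threshold
    return edges


def may_be_text(comp, edges):
    # Text labels are allowed only where the component's box contains edge pixels.
    return any(edges[y][x]
               for y in range(comp.y0, comp.y1)
               for x in range(comp.x0, comp.x1))
```

For example, a dark non-text component on the same line as dark text of similar height is relabeled as text. A red component on that line, or a dark one on a different line, keeps its non-text label. Smoothing before edge detection is intended to damp isolated noise pixels. Flat regions such as plain background then produce no edge pixels and, under this sketch, can never receive a text label.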