摘要 |
A system or method for identifying text in a document. A group of connected components is created. A plurality of characteristics of different types is calculated for each connected component. Statistics are computed which describe the group of characteristics. Outlier components are identified as connected components whose computed characteristics are outside a statistical range. The outlier components are removed from the group of connected components. Text pixels are identified by segmenting pixels in the group of connected components into a group of text pixels and a group of background pixels.
|