Abstract |
Disclosed is a system that converts a scanned image of a complex document, wherein each pixel is represented by a gray scale level, into a bi-level image where text has been preserved and separated from the background. The system subdivides the scanned image into cells, and then creates histograms of the gray scale levels of the pixels in the cells. It creates matrices of the runs of dark pixels within the cells, and examines the runs to determine the extent of connected components. It computes the percentage of runs of each length, and computes the average gray scale level of runs of each length for the document image. It determines peaks in each of the histograms, and determines the width of the first peak within each histogram. The system uses this information to set a gray scale level threshold used to create the bi-level image. |
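The measurement steps named in the abstract (subdividing into cells, per-cell gray scale histograms, runs of dark pixels, and per-length run statistics) can be illustrated with a minimal Python sketch. It is not the disclosed implementation. All function names are hypothetical, the image is assumed to be a list of rows of integer gray levels with 0 as black, and the provisional cutoff `dark_below` used to identify dark pixels is an assumption, since the abstract does not say how dark runs are found before the final threshold is set. Peak detection, peak width, connected-component analysis, and the threshold selection itself are left out because the abstract does not specify them.

```python
from collections import defaultdict

def split_into_cells(image, cell_h, cell_w):
    """Yield (row, col, cell) for each cell; cells at the edges may be smaller."""
    rows, cols = len(image), len(image[0])
    for r in range(0, rows, cell_h):
        for c in range(0, cols, cell_w):
            yield r, c, [row[c:c + cell_w] for row in image[r:r + cell_h]]

def gray_histogram(cell, levels=256):
    """Count how many pixels in the cell have each gray level."""
    hist = [0] * levels
    for row in cell:
        for g in row:
            hist[g] += 1
    return hist

def dark_runs(cell, dark_below):
    """Return (length, mean_gray) for each horizontal run of dark pixels.

    A pixel counts as dark when its level is below dark_below
    (an assumed provisional cutoff, not taken from the abstract).
    """
    runs = []
    for row in cell:
        current = []
        for g in row:
            if g < dark_below:
                current.append(g)
            elif current:
                runs.append((len(current), sum(current) / len(current)))
                current = []
        if current:  # run that reaches the end of the row
            runs.append((len(current), sum(current) / len(current)))
    return runs

def run_length_stats(runs):
    """Map run length -> (percentage of all runs, average gray level)."""
    if not runs:
        return {}
    counts = defaultdict(int)
    gray_sums = defaultdict(float)
    for length, mean_gray in runs:
        counts[length] += 1
        gray_sums[length] += mean_gray
    total = len(runs)
    return {
        length: (100.0 * n / total, gray_sums[length] / n)
        for length, n in counts.items()
    }
```

For a whole document image, the runs from every cell would be pooled before calling `run_length_stats`, since the abstract describes the per-length statistics as computed for the document image rather than per cell.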