Abstract |
A method is described for binarizing a gray scale document image, in particular a document image that contains both text and non-text content. Phase congruency maps are calculated from the gray scale image and used to segment it into text and non-text areas. The phase congruency maps are also used to extract long lines, such as table lines, which can optionally be removed from the image. The text and non-text areas are divided into image patches; in the text areas, the patches are generated from connected components of the phase congruency map, so that each patch contains a text character. Each image patch is binarized with its own threshold value, and the binarized patches are then combined to produce a binary image of the gray scale image. The method can also be used for OCR or document authentication. |
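
The final step described above (binarizing each patch with its own threshold and combining the results) can be illustrated with a minimal sketch. The abstract does not say how the individual threshold values are chosen, so Otsu's method is used here purely as an assumption. The patch bounding boxes are taken as given; in the described method they would come from connected components of the phase congruency map, which this sketch does not compute. The names `otsu_threshold` and `binarize_patches` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]          # rows of 8-bit gray values (0..255)
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def otsu_threshold(pixels: List[int]) -> int:
    """Return an Otsu threshold t; pixels <= t are treated as dark.

    Assumption: the source does not name a thresholding rule; Otsu's
    method stands in for the per-patch threshold selection.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of gray values of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (m0 - m1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize_patches(gray: Image, boxes: List[Box]) -> Image:
    """Binarize each patch with its own threshold and combine the results.

    Pixels outside every box stay white (255) in this sketch.
    """
    height, width = len(gray), len(gray[0])
    out = [[255] * width for _ in range(height)]
    for top, left, bottom, right in boxes:
        pixels = [gray[r][c] for r in range(top, bottom) for c in range(left, right)]
        t = otsu_threshold(pixels)
        for r in range(top, bottom):
            for c in range(left, right):
                out[r][c] = 0 if gray[r][c] <= t else 255
    return out


# Two side-by-side patches with different contrast levels.
gray = [[10, 200, 100, 180],
        [12, 210, 110, 190]]
boxes = [(0, 0, 2, 2), (0, 2, 2, 4)]
binary = binarize_patches(gray, boxes)
```

In this toy input the two patches have different dark levels, and each one receives a threshold fitted to its own pixels. That is the motivation for local rather than global thresholding.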