Abstract |
A method is claimed for processing a vector-raster image file that contains a text image. The method comprises the steps of: fragmenting the image to obtain regions containing non-separable, logically connected fragments of text of the maximum possible size; processing text, vector, and raster objects; discarding excess information; and analyzing each object using all available information. The step of processing text objects includes the steps of: dividing the text into separate characters and character groups according to the supposed locations of blank spaces or other non-indicated symbols; analyzing and assembling the character groups into words; and verifying and correcting the character encoding based on recognition of the assembled words as raster objects. The step of processing vector objects includes the step of identifying separators, backgrounds, and substrates of blocks. The step of processing raster objects includes the steps of: analyzing non-text objects in order to detect text images within them, and/or detecting vector objects other than separators.
|
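The text-object step above, dividing characters into groups at the supposed locations of blank spaces, can be illustrated with a minimal sketch. It is not the claimed implementation. The `Glyph` record, the `group_glyphs` function, and the `gap_factor` threshold are all hypothetical names and heuristics introduced here for illustration; the abstract does not specify how space locations are inferred.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Glyph:
    """Hypothetical record for one character taken from the text layer."""
    char: str     # character code as stored in the file
    x: float      # left edge of the glyph box
    width: float  # width of the glyph box


def group_glyphs(glyphs, gap_factor=0.3):
    """Split one line of glyphs into character groups at supposed spaces.

    Assumption (not from the source): a horizontal gap wider than
    gap_factor times the median glyph width marks an unrecorded space.
    """
    if not glyphs:
        return []
    ordered = sorted(glyphs, key=lambda g: g.x)
    threshold = gap_factor * median(g.width for g in ordered)
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        # Distance between the right edge of prev and the left edge of cur
        gap = cur.x - (prev.x + prev.width)
        if gap > threshold:
            groups.append([cur])    # start a new character group
        else:
            groups[-1].append(cur)  # extend the current group
    return ["".join(g.char for g in grp) for grp in groups]


line = [
    Glyph("a", 0, 10), Glyph("b", 11, 10),
    Glyph("c", 30, 10), Glyph("d", 41, 10),
]
words = group_glyphs(line)
```

In this example the gap between `b` and `c` is 9 units against a threshold of 3, so the line splits into two groups. In the method as described, such groups would then be assembled into words and checked against recognition of the same words as raster objects; that verification is outside the scope of this sketch.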