Abstract |
Apparatus and methods for producing segmentations of images that contain text. The general approach is to locate a first set of components that contain text characters, locate a second set of components that do not contain text characters, sort the second set according to a characteristic shape, construct a cover set from the sorted second set, and use the cover set to locate the portions of the image that contain text. An embodiment is disclosed in which the method is applied to text set in the Manhattan layout in order to locate columns of text. In that embodiment, the parts of the image that do not contain text are located by constructing maximum empty rectangles, i.e., rectangles that contain no characters and cannot be enlarged without covering one. Based on the observation that columns in Manhattan layouts are separated by rectangles with a high aspect ratio, the maximum empty rectangles are sorted in a manner that favors such rectangles, and the cover set is produced from the sorted rectangles. Also disclosed are methods for locating maximum empty rectangles defined by points and for locating maximum empty rectangles defined by rectangles containing characters.
|
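
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the brute-force enumeration, the `shape_score` weighting (area times the square root of the aspect ratio), the "top `k`" cover rule, and the helper `split_into_columns` are all assumptions chosen for brevity, and rectangles are taken to be axis-aligned `(x0, y0, x1, y1)` tuples lying inside the page.

```python
import bisect
import math
from itertools import combinations


def overlaps(a, b):
    """True if the open interiors of rectangles a and b intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def contains(outer, inner):
    """True if inner lies entirely within outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def maximal_empty_rectangles(page, boxes):
    """Brute force: each side of a maximal empty rectangle lies on a box
    edge or on the page border, so only those coordinates are tried."""
    xs = sorted({page[0], page[2]} | {b[0] for b in boxes} | {b[2] for b in boxes})
    ys = sorted({page[1], page[3]} | {b[1] for b in boxes} | {b[3] for b in boxes})
    empty = [
        (x0, y0, x1, y1)
        for x0, x1 in combinations(xs, 2)
        for y0, y1 in combinations(ys, 2)
        if not any(overlaps((x0, y0, x1, y1), b) for b in boxes)
    ]
    # Keep only rectangles that are not contained in a different empty one.
    return [r for r in empty if not any(o != r and contains(o, r) for o in empty)]


def shape_score(r):
    """Hypothetical ranking: large area, boosted by a high aspect ratio."""
    w, h = r[2] - r[0], r[3] - r[1]
    return w * h * math.sqrt(max(w, h) / min(w, h))


def cover_set(rects, k):
    """Assumed cover rule: the k best-scoring empty rectangles."""
    return sorted(rects, key=shape_score, reverse=True)[:k]


def split_into_columns(boxes, cover):
    """Group character boxes by the tall cover rectangles between them."""
    cuts = sorted((r[0] + r[2]) / 2 for r in cover if r[3] - r[1] > r[2] - r[0])
    columns = {}
    for b in boxes:
        idx = bisect.bisect(cuts, (b[0] + b[2]) / 2)
        columns.setdefault(idx, []).append(b)
    return [columns[i] for i in sorted(columns)]


# Toy two-column page: three text blocks per column, 10-unit gutter at x = 45..55.
page = (0, 0, 100, 100)
boxes = [(x0, y0, x1, y1)
         for x0, x1 in [(5, 45), (55, 95)]
         for y0, y1 in [(5, 30), (32, 60), (62, 95)]]

rects = maximal_empty_rectangles(page, boxes)
ranked = sorted(rects, key=shape_score, reverse=True)
cover = cover_set(rects, k=1)      # in this toy layout the gutter ranks first
columns = split_into_columns(boxes, cover)
```

The enumeration grows quickly with the number of boxes, so it is suitable only for small illustrations. The dedicated methods for locating maximum empty rectangles mentioned in the abstract address that step and are not reproduced here.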