Abstract |
A method for determining the boundaries of text or character strings represented in an array of image data by shape, without requiring that the character or characters making up the strings be individually detected and/or identified. The method relies on the detection of connected components within words, first to determine text line boundaries and to isolate the connected components into text rows. Subsequently, the structural relationships between the components within and defining the rows (i.e., overlap, inter-character spacing, and inter-word spacing) are used to further combine adjacent sets of connected components into words or similar units of semantic understanding within text rows.
|
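
The pipeline summarized in the abstract (connected components, then text rows, then words) can be illustrated with a minimal Python sketch. It is not the claimed method. The function names, the 8-connectivity choice, the vertical-overlap rule for rows, and the fixed `max_gap` threshold for telling inter-character from inter-word spacing are all assumptions made for illustration; the abstract does not specify how these decisions are made.

```python
from collections import deque


def connected_components(img):
    """Return bounding boxes (top, left, bottom, right) of 8-connected blobs of 1s."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def group_into_rows(boxes):
    """Assumed rule: boxes whose vertical extents overlap share a text row."""
    rows = []  # each entry: [row_top, row_bottom, member_boxes]
    for box in sorted(boxes):
        top, _, bottom, _ = box
        for row in rows:
            if top <= row[1] and bottom >= row[0]:
                row[0], row[1] = min(row[0], top), max(row[1], bottom)
                row[2].append(box)
                break
        else:
            rows.append([top, bottom, [box]])
    # Order each row's components left to right.
    return [sorted(row[2], key=lambda b: b[1]) for row in rows]


def group_into_words(row, max_gap):
    """Assumed rule: merge neighbours separated by at most max_gap blank columns."""
    words = []
    for top, left, bottom, right in row:
        if words and left - words[-1][3] - 1 <= max_gap:
            t, l, b, r = words[-1]
            words[-1] = (min(t, top), l, max(b, bottom), max(r, right))
        else:
            words.append((top, left, bottom, right))
    return words


def find_words(img, max_gap):
    """Word bounding boxes per text row, without recognizing any character."""
    rows = group_into_rows(connected_components(img))
    return [group_into_words(row, max_gap) for row in rows]


def parse(lines):
    """Turn '#'/'.' strings into a binary pixel grid."""
    return [[1 if ch == "#" else 0 for ch in line] for line in lines]


PAGE = parse([
    "##.##...##",
    "##.##...##",
    "..........",
    "#.#..#.#.#",
])

for row_words in find_words(PAGE, max_gap=1):
    print(row_words)
```

On this toy page, gaps of one blank column are treated as inter-character spacing and wider gaps as inter-word spacing, so each of the two rows yields two word boxes. A practical system would more plausibly derive the threshold from the distribution of gaps observed in each row rather than fix it in advance.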