Abstract |
<p>A method and apparatus for selecting text and/or non-text blocks in a stored document include functions and structure for identifying connected pixel components in the stored document, separating the identified pixel components into text and non-text components, searching the document for visible and invisible lines along the edges of the non-text components, forming irregularly shaped text and non-text blocks using the identified text components and the visible and invisible lines, detecting the text orientation of each formed text block, extracting text lines from each text block based on the detected orientation, detecting the skew angle of the stored document based on the extracted text lines, and modifying the formed text and non-text blocks based on the detected skew angle. The text blocks thus formed are preferably subjected to character recognition routines.</p> |
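The first two steps named in the abstract, identifying connected pixel components and separating them into text and non-text components, can be illustrated with a minimal Python sketch. It is not the patented procedure: the abstract does not state how components are classified, so the bounding-box size test and the names `connected_components`, `split_text_non_text`, and `max_text_size` are assumptions made purely for illustration. The page is assumed to be a binary grid in which `1` marks a black pixel.

```python
from collections import deque


def connected_components(image):
    """Return one bounding box (top, left, bottom, right) per
    8-connected group of black (1) pixels, in raster-scan order."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited black pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def split_text_non_text(boxes, max_text_size=3):
    """Hypothetical classifier: a component counts as text when both
    sides of its bounding box are at most max_text_size pixels.
    The threshold is an illustrative assumption, not from the source."""
    text, non_text = [], []
    for box in boxes:
        top, left, bottom, right = box
        height = bottom - top + 1
        width = right - left + 1
        if height <= max_text_size and width <= max_text_size:
            text.append(box)
        else:
            non_text.append(box)
    return text, non_text
```

In this sketch a long horizontal rule ends up among the non-text components because its width exceeds the assumed threshold. That is the kind of component along whose edges the later line-search step would operate.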