摘要 |
<p>A method of automatically identifying sentence boundaries in a document image without performing OCR. The identification process begins by selecting a connected component from the multiplicity of connected components of a text line. Next, it is determined whether the selected connected component might represent a period based upon its shape. If the selected connected component is dot shaped, then it is determined whether the selected connected component might represent a colon. Finally, if the selected connected component is dot shaped and not part of a colon, the selected connected component is labeled as a sentence boundary. <IMAGE></p> |