Abstract |
<p>Method and apparatus for separating touching characters within an optical character recognition (OCR) computer (1). An input document (20) is scanned by a scanner (2), forming a set of scan lines (3). A segmentation process (4) is performed on the scan lines (3) to create a set of segmented image boxes (5). Candidate characters within the image boxes (5) are classified by a classification module (6), based upon a library of stored models (7). Candidate characters that are recognized with a high degree of confidence are classified and coded into a binary form (8), such as ASCII. Those candidate characters that are not classified are processed by a touching character decision module (9) to determine whether a series of separation modules (10-14) is to be invoked. The execution of modules (10-13), followed by the re-execution of modules (4) and (6), may or may not cause all of the touching characters to be separated. Any touching characters that remain are subjected to one or more reprocessing cycles. The reprocessing can entail examination (14) of adjacent scan lines (3), shifting of the separation threshold T by the separation threshold determination module (10), or re-execution of the deconvolution step (12) with changed parameters or structure.</p> |
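The classify, separate, and reprocess cycle described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: an image box is reduced to a list of per-column ink counts, the stored models are exact column profiles in a hypothetical `LIBRARY` dictionary, separation is a simple cut at the weakest interior column (a stand-in for modules 10-13, with no deconvolution), and the threshold T is raised by one on each reprocessing cycle.

```python
from typing import List, Optional, Tuple

# Hypothetical stored models: column-ink profile -> character code.
LIBRARY = {(3, 5, 3): "a", (6, 4): "b"}


def classify(box: List[int]) -> Optional[str]:
    """Return the character whose stored profile matches exactly, else None."""
    return LIBRARY.get(tuple(box))


def split_at_valley(box: List[int], t: int) -> Optional[Tuple[List[int], List[int]]]:
    """Cut at the weakest interior column if its ink count is <= t.

    The bridge column itself is discarded. Returns None when no cut qualifies.
    """
    if len(box) < 3:
        return None
    i = min(range(1, len(box) - 1), key=lambda k: box[k])
    if box[i] > t:
        return None
    return box[:i], box[i + 1:]


def recognize(box: List[int], threshold: int = 0, max_cycles: int = 3) -> List[Optional[str]]:
    """Classify a box; if that fails, try separating it, shifting T each cycle."""
    label = classify(box)
    if label is not None:
        return [label]
    for cycle in range(max_cycles):
        parts = split_at_valley(box, threshold + cycle)  # shifted threshold T
        if parts is None:
            continue
        left, right = parts
        out = recognize(left, threshold, max_cycles) + recognize(right, threshold, max_cycles)
        if None not in out:
            return out
    return [None]  # still unclassified after all reprocessing cycles


# "a" and "b" joined by a thin bridge column of ink count 1.
print(recognize([3, 5, 3, 1, 6, 4]))
```

In this toy setup the first pass with T = 0 finds no cut, and the second cycle with T = 1 separates the box at the bridge column so that both halves match a stored model. A box that never matches after all cycles is returned as `[None]`, mirroring the abstract's point that separation may or may not succeed.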