POST-OCR IMAGE SEGMENTATION INTO SPATIALLY SEPARATED TEXT ZONES
Abstract
<p>This invention describes a post-recognition procedure that groups the text recognized by an Optical Character Reader (OCR) in a document image into zones. After receiving the recognized text and the bounding box of each word, the procedure dilates (expands) the word bounding boxes by a factor and records which pairs of dilated boxes cross. Two word bounding boxes cross upon dilation if the corresponding words are very close to each other in the original document. The text is then grouped into zones using the rule that two words belong to the same zone if their word bounding boxes cross upon dilation. The text zones thus identified are sorted and returned.</p>
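The procedure in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made only for illustration: boxes are `(left, top, right, bottom)` tuples, dilation scales each box about its center in proportion to its own size, the "same zone" rule is applied transitively (words chained through crossing boxes share a zone), and zones are sorted top-to-bottom, then left-to-right. The names `dilate`, `intersects`, and `group_into_zones` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed box layout: (left, top, right, bottom), with y growing downward.
Box = Tuple[float, float, float, float]


def dilate(box: Box, factor: float) -> Box:
    """Expand a box about its center; a factor of 1.0 leaves it unchanged."""
    left, top, right, bottom = box
    dx = (right - left) * (factor - 1.0) / 2.0
    dy = (bottom - top) * (factor - 1.0) / 2.0
    return (left - dx, top - dy, right + dx, bottom + dy)


def intersects(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def group_into_zones(
    words: Sequence[str], boxes: Sequence[Box], factor: float = 1.5
) -> List[List[str]]:
    """Group words whose dilated boxes cross, then sort the zones."""
    n = len(words)
    parent = list(range(n))  # union-find forest, one node per word

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    dilated = [dilate(b, factor) for b in boxes]
    # Record every crossing pair by merging the two words' sets.
    for i in range(n):
        for j in range(i + 1, n):
            if intersects(dilated[i], dilated[j]):
                parent[find(i)] = find(j)

    zones: Dict[int, List[int]] = {}
    for i in range(n):
        zones.setdefault(find(i), []).append(i)

    # Assumed ordering: by top edge, then left edge of the original boxes.
    def position(i: int) -> Tuple[float, float]:
        return (boxes[i][1], boxes[i][0])

    ordered = [sorted(members, key=position) for members in zones.values()]
    ordered.sort(key=lambda members: position(members[0]))
    return [[words[i] for i in members] for members in ordered]


words = ["Invoice", "No.", "Total", "Due"]
boxes = [(10, 10, 60, 20), (65, 10, 85, 20), (10, 200, 50, 210), (55, 200, 80, 210)]
zones = group_into_zones(words, boxes, factor=1.5)
```

With these sample boxes, the two words on the upper line fall into one zone and the two on the lower line into another, because only boxes that are close together cross after dilation. The pairwise comparison is quadratic in the number of words; a spatial index would be a natural substitution for large pages.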
Publication number
WO2007022460(A2)
Publication date
2007.02.22
Application number
WO2006US32483
Filing date
2006.08.18
Applicant
DIGITAL BUSINESS PROCESSES, INC.;ROMANOFF, HARRIS;SPERO, LESLIE;SINGH, SARABJIT