Title of invention Merging three optical character recognition outputs for improved precision using a minimum edit distance function
Abstract Three OCR systems are employed for text conversion, and the results generated by the three are merged using an edit distance algorithm to estimate a correct common text ancestor. To make the process computationally feasible for large strings, such as pages of documentation with 3,000 characters, the method is executed in two stages. In the first stage, each page is treated as a string of lines, and the edit distance between the lines on a page is used to find the optimal alignment of the lines. Where differences exist and a choice must be made among three non-null lines, the procedure is then invoked on those three lines, using the edit distance between the characters on a line to find the optimal alignment of the characters. The number of computations required by the procedure is further reduced by corner-cutting, which heuristically determines an upper bound on the edit distance and limits calculations to those that do not exceed the upper bound.
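The corner-cutting idea in the abstract can be illustrated with a minimal sketch. This is not the patented procedure: it handles only two sequences rather than three, and it assumes the upper bound is supplied by the caller, whereas the abstract says the bound is determined heuristically. The function name `bounded_edit_distance` and its `bound` parameter are hypothetical, chosen here for illustration. The sketch computes a plain Levenshtein distance but fills in only the cells of the dynamic-programming table that lie within `bound` of the main diagonal, since any cell outside that band already implies a distance above the bound.

```python
from typing import Hashable, Optional, Sequence


def bounded_edit_distance(
    a: Sequence[Hashable], b: Sequence[Hashable], bound: int
) -> Optional[int]:
    """Return the Levenshtein distance between a and b, or None if it exceeds bound.

    Hypothetical helper: only table cells with |i - j| <= bound are computed,
    so the "corners" of the table are never visited.
    """
    n, m = len(a), len(b)
    if abs(n - m) > bound:
        return None  # the length difference alone already exceeds the bound
    inf = bound + 1  # sentinel for "greater than bound"
    # Row 0: distance from the empty prefix of a to b[:j], capped at inf.
    prev = [j if j <= bound else inf for j in range(m + 1)]
    for i in range(1, n + 1):
        lo = max(1, i - bound)  # leftmost column inside the band
        hi = min(m, i + bound)  # rightmost column inside the band
        cur = [inf] * (m + 1)   # cells outside the band stay at inf
        if i <= bound:
            cur[0] = i          # deleting the first i items of a
        for j in range(lo, hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(
                prev[j - 1] + cost,  # match or substitution
                prev[j] + 1,         # deletion from a
                cur[j - 1] + 1,      # insertion into a
                inf,                 # cap so values never grow past bound + 1
            )
        prev = cur
    return prev[m] if prev[m] <= bound else None
```

Because the function accepts any sequence of comparable items, the same sketch applies at both stages described in the abstract: called with two lists of lines it compares a page line by line, and called with two strings it compares a line character by character.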
Publication number US5459739(A) Publication date 1995.10.17
Application number US19920853550 Filing date 1992.03.18
Applicant OCLC ONLINE COMPUTER LIBRARY CENTER, INCORPORATED Inventors HANDLEY, JOHN C.;HICKEY, THOMAS B.
Classification G06K9/62;G06K9/68;(IPC1-7):G06F11/18;G06F17/16 Main classification G06K9/62
Agency Agent
Principal claim
Address