Title of invention: RE-DIGITIZATION AND ERROR CORRECTION OF ELECTRONIC DOCUMENTS
Abstract: A system and method to error-correct extant electronic documents are disclosed. An electronic document may be rasterized to obtain a pixel representation of the electronic document (e.g., a raster image). One or more optical character recognition (OCR) tasks may be performed on the raster image of the electronic document. Errors discovered by the OCR tasks may be corrected, and a customized, error-corrected version of the electronic document may be created and stored. If the author of the electronic document is known, the raster image may be compared to a personalized tf*idf error dictionary associated with the author to determine known OCR errors specific to the author. The raster image may also be compared to a personalized electronic error dictionary associated with the author to determine known typographical errors specific to the author.
Publication number: EP2845147(A1)  Publication date: 2015.03.11
Application number: EP20120875858  Filing date: 2012.04.29
Applicant: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.  Inventors: SIMSKE, STEVEN J.;LIU, SAMSON J.
Classification: G06K9/00;G06K9/03;G06K9/18  Main classification: G06K9/00
Agency:  Agent:
Principal claim:
Address:
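
The correction step described in the abstract (looking up OCR output against author-specific dictionaries of known OCR errors and known typographical errors) can be illustrated with a minimal sketch. This is not the patented method: rasterization and OCR are assumed to have already produced a text string, the tf*idf weighting mentioned in the abstract is omitted, and the function name `correct_text` and both dictionaries are hypothetical, with made-up entries chosen only for illustration.

```python
import re

# Hypothetical author-specific dictionaries (illustrative entries only,
# not taken from the patent). Keys are lowercase erroneous forms.
OCR_ERRORS = {"rnodel": "model", "c1ass": "class"}      # known OCR misreads
TYPO_ERRORS = {"teh": "the", "recieve": "receive"}      # known author typos


def correct_text(ocr_text, ocr_errors, typo_errors):
    """Replace known erroneous tokens and report what was changed.

    Returns a tuple (corrected_text, corrections), where corrections is a
    list of (original_token, replacement) pairs in order of appearance.
    """
    corrections = []

    def fix(match):
        word = match.group(0)
        key = word.lower()
        # Check the OCR-error dictionary first, then the typo dictionary.
        replacement = ocr_errors.get(key) or typo_errors.get(key)
        if replacement is None:
            return word  # token is not a known error; leave it unchanged
        if word[0].isupper():
            # Preserve a leading capital from the original token.
            replacement = replacement.capitalize()
        corrections.append((word, replacement))
        return replacement

    # Tokens are runs of ASCII letters and digits, so "c1ass" is one token.
    corrected = re.sub(r"[A-Za-z0-9]+", fix, ocr_text)
    return corrected, corrections


if __name__ == "__main__":
    text = "Teh rnodel will recieve a c1ass label."
    fixed, changes = correct_text(text, OCR_ERRORS, TYPO_ERRORS)
    print(fixed)
    print(changes)
```

Keeping the two dictionaries separate mirrors the abstract's distinction between OCR errors and typographical errors; the list of corrections could be stored alongside the error-corrected version of the document.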