Title of invention: Efficient identification and correction of optical character recognition errors through learning in a multi-engine environment
Abstract: OCR errors are identified and corrected through learning. An error probability estimator is trained using ground truths to learn error probability estimation. Multiple OCR engines process a text image, and convert it into texts. The error probability estimator compares the outcomes of the multiple OCR engines for mismatches, and determines an error probability for each of the mismatches. If the error probability of a mismatch exceeds an error probability threshold, a suspect is generated and grouped together with similar suspects in a cluster. A question for the cluster is generated and rendered to a human operator for answering. The answer from the human operator is then applied to all suspects in the cluster to correct OCR errors in the resulting text. The answer is also used to further train the error probability estimator.
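The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: every function name, the two-engine setup, the count-based estimator (a hypothetical stand-in for the trained error probability estimator), and the choice to cluster suspects by their (primary, secondary) text pair are assumptions made only for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def find_mismatches(primary, secondary):
    """Return (start, end, primary_text, secondary_text) for each span where
    the two engine outputs differ. Offsets index into `primary`."""
    matcher = SequenceMatcher(None, primary, secondary, autojunk=False)
    return [
        (i1, i2, primary[i1:i2], secondary[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def estimate_error_probability(primary_text, secondary_text, confusion_counts):
    """Hypothetical estimator: the fraction of previously verified cases in
    which the primary engine was wrong for this mismatch pair.
    Unseen pairs default to 1 wrong out of 2, i.e. 0.5."""
    wrong, total = confusion_counts.get((primary_text, secondary_text), (1, 2))
    return wrong / total


def cluster_suspects(primary, secondary, confusion_counts, threshold=0.5):
    """Keep mismatches whose error probability exceeds the threshold and
    group them by (primary_text, secondary_text), an assumed similarity rule."""
    clusters = defaultdict(list)
    for start, end, p_text, s_text in find_mismatches(primary, secondary):
        prob = estimate_error_probability(p_text, s_text, confusion_counts)
        if prob > threshold:
            clusters[(p_text, s_text)].append((start, end))
    return dict(clusters)


def apply_answer(text, spans, answer):
    """Apply one operator answer to every suspect span in a cluster.
    Spans are replaced right to left so earlier offsets stay valid."""
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + answer + text[end:]
    return text


def record_answer(confusion_counts, key, primary_was_wrong):
    """Feed the operator's answer back into the estimator's counts."""
    wrong, total = confusion_counts.get(key, (0, 0))
    confusion_counts[key] = (wrong + int(primary_was_wrong), total + 1)
```

In this sketch, one operator answer per cluster is passed to `apply_answer` together with that cluster's spans, and `record_answer` then updates the counts so that later estimates reflect the answer.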
Publication number: US8331739(B1) Publication date: 2012.12.11
Application number: US20090357367 Filing date: 2009.01.21
Applicants: ABDULKADER AHMAD;CASEY MATTHEW R.;GOOGLE INC. Inventors: ABDULKADER AHMAD;CASEY MATTHEW R.
Classification: G06K9/00;G06F7/00;G06F15/00;G06F17/00;G06F17/20;G06F17/21;G06F17/22;G06F17/24;G06F17/25;G06F17/26;G06F17/27;G06F17/28;G06F17/30;G06K1/00;G06K7/10;G06K9/03;G06K9/34;G06K9/62;G06K15/02;H04N1/40 Main classification: G06K9/00
Agency: Agent:
Main claim:
Address: