Title METHOD AND SYSTEM FOR OPTICAL CHARACTER RECOGNITION USING IMAGE CLUSTERING
Abstract The present disclosure provides a computer-implemented method of translating an image-based electronic document into a text-based electronic document. The method includes electronically scanning an image-based document to determine positions of word images in the image-based document. The method also includes extracting the word images from the image-based document and storing the word images to an electronic storage device. The method also includes grouping a subset of the word images into a word cluster based on a similarity of the word images, wherein the word images in the word cluster correspond to a same actual word. The method also includes generating a character-encoded transcription for the word cluster based on the word images in the word cluster. The method also includes adding the character-encoded transcription to a text-based electronic document at locations corresponding to the positions of the word images in the image-based document.
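The steps in the abstract (group similar word images, transcribe each group once, write the result back at every member's position) can be illustrated with a minimal Python sketch. It is not the patented method. It assumes word images are tiny binary bitmaps, uses pixel-wise Hamming distance with a greedy threshold as a stand-in similarity measure, and derives each cluster's transcription by majority vote over a caller-supplied `recognize` function. All names here (`hamming`, `cluster_words`, `transcribe`, `recognize`, `max_distance`) are hypothetical and do not come from the disclosure.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Image = Tuple[Tuple[int, ...], ...]  # tiny binary bitmap standing in for a word image
Position = Tuple[int, int]           # (line, column) of the word on the page


def hamming(a: Image, b: Image) -> int:
    """Count differing pixels; differently shaped images count as fully different."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return max(sum(len(r) for r in a), sum(len(r) for r in b))
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def cluster_words(images: Sequence[Image], max_distance: int) -> List[List[int]]:
    """Greedy clustering (an assumption): join the first cluster whose
    representative (its first member) is within max_distance, else start a new one."""
    clusters: List[List[int]] = []
    for i, img in enumerate(images):
        for members in clusters:
            if hamming(images[members[0]], img) <= max_distance:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def transcribe(
    words: Sequence[Tuple[Position, Image]],
    recognize: Callable[[Image], str],
    max_distance: int = 1,
) -> Dict[Position, str]:
    """Return one transcription per word position, shared across each cluster."""
    images = [img for _, img in words]
    text_at: Dict[Position, str] = {}
    for members in cluster_words(images, max_distance):
        # Majority vote over per-image guesses is one possible way to base the
        # transcription on all images in the cluster; it is assumed here.
        votes = Counter(recognize(images[i]) for i in members)
        label = votes.most_common(1)[0][0]
        for i in members:
            text_at[words[i][0]] = label
    return text_at
```

In this sketch a single misread image is outvoted by cleaner copies of the same word, which is one plausible benefit of transcribing per cluster instead of per image.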
Publication No. US2012020561(A1) Publication Date 2012.01.26
Application No. US20100841839 Filing Date 2010.07.22
Applicant ESHGHI KAVE;FORMAN GEORGE;REDDY PRAKASH Inventor ESHGHI KAVE;FORMAN GEORGE;REDDY PRAKASH
Classification G06K9/34 Main Classification G06K9/34
Agency Agent
Main Claim
Address