Title of Invention |
METHOD AND SYSTEM FOR OPTICAL CHARACTER RECOGNITION USING IMAGE CLUSTERING |
Abstract |
The present disclosure provides a computer-implemented method of translating an image-based electronic document into a text-based electronic document. The method includes electronically scanning an image-based document to determine the positions of word images in the image-based document. The method also includes extracting the word images from the image-based document and storing the word images to an electronic storage device. The method also includes grouping a subset of the word images into a word cluster based on the similarity of the word images, wherein the word images in the word cluster correspond to the same actual word. The method also includes generating a character-encoded transcription for the word cluster based on the word images in the word cluster. The method also includes adding the character-encoded transcription to a text-based electronic document at locations corresponding to the positions of the word images in the image-based document. |
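The pipeline in the abstract (extract positioned word images, cluster similar ones, transcribe each cluster once, write the transcription back at every member's position) can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the clustering procedure, or how a cluster's transcription is produced, so everything below is an assumption: word images are equal-size binary bitmaps, similarity is the fraction of agreeing pixels, clustering is a greedy single pass against each cluster's first member, and the transcription is a majority vote over hypothetical per-image OCR guesses. The names `WordImage`, `similarity`, `cluster_words`, `transcribe_cluster`, and `build_text` are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class WordImage:
    """Hypothetical record for one extracted word image."""
    pixels: tuple   # flattened binary bitmap; equal size for all words (assumption)
    line: int       # line index of the word's position in the page
    x: int          # horizontal offset of the word within its line
    ocr_guess: str  # hypothetical per-image OCR output (assumption)


def similarity(a, b):
    """Fraction of pixels that agree between two equal-size binary bitmaps."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(p == q for p, q in zip(a, b)) / len(a)


def cluster_words(words, threshold=0.9):
    """Greedy single pass: add each word to the first cluster whose first
    member is at least `threshold` similar, otherwise start a new cluster."""
    clusters = []  # each cluster is a list of indices into `words`
    for i, w in enumerate(words):
        for c in clusters:
            if similarity(words[c[0]].pixels, w.pixels) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def transcribe_cluster(words, cluster):
    """One transcription per cluster: majority vote over members' OCR guesses."""
    votes = Counter(words[i].ocr_guess for i in cluster)
    return votes.most_common(1)[0][0]


def build_text(words, threshold=0.9):
    """Assign each word its cluster's transcription, then lay the words out
    by position: one output line per page line, ordered left to right."""
    labels = [None] * len(words)
    for c in cluster_words(words, threshold):
        text = transcribe_cluster(words, c)
        for i in c:
            labels[i] = text
    lines = {}
    for w, label in sorted(zip(words, labels), key=lambda p: (p[0].line, p[0].x)):
        lines.setdefault(w.line, []).append(label)
    return "\n".join(" ".join(lines[k]) for k in sorted(lines))
```

One consequence of transcribing per cluster rather than per image is visible here: a single noisy image whose own OCR guess is wrong (say "tbe") still receives the cluster's majority transcription ("the") if it is similar enough to the other images of that word. The threshold trades this error correction against the risk of merging images of different words.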
Publication Number
US2012020561(A1) |
Publication Date
2012.01.26 |
Application Number
US20100841839 |
Filing Date
2010.07.22 |
Applicants
ESHGHI KAVE;FORMAN GEORGE;REDDY PRAKASH |
Inventors
ESHGHI KAVE;FORMAN GEORGE;REDDY PRAKASH |
Classification
G06K9/34 |
Main Classification
G06K9/34 |
Agency
|
Agent
|
Principal Claim
|
Address
|