Title: Method and apparatus for formatting OCR text
Abstract: After a document image has been scanned and processed by optical character recognition (OCR), the OCR output text is processed to determine a text format (typeface and font size) that matches the OCR text to the originally scanned image. The text format is identified by matching word sizes rather than individual character sizes. In particular, for each word and for each of a plurality of candidate typefaces, a scaling factor is calculated that matches a rendering of the word in that typeface to the width of the word in the originally scanned image. Once all of the scaling factors have been calculated, a cluster analysis identifies close clusters of scaling factors for a typeface; a close cluster indicates a good typeface fit at a constant scaling factor (font size).
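The word-width matching described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from the record: the per-character advance widths, the typeface names `FaceA`/`FaceB`, the 5% tolerance, and the sliding-window grouping that stands in for the unspecified cluster analysis are all hypothetical. Kerning and letter spacing are ignored. Widths are expressed in em units (unit font size), so a word's scaling factor, scanned width divided by unit-size rendered width, reads directly as a font size in the units of the scanned measurement.

```python
from statistics import median

# Hypothetical per-character advance widths in em units (font size 1).
# Real values would come from font metrics; only the letters a-e are covered.
TYPEFACES = {
    "FaceA": {"a": 0.50, "b": 0.30, "c": 0.30, "d": 0.50, "e": 0.30},
    "FaceB": {"a": 0.60, "b": 0.65, "c": 0.55, "d": 0.65, "e": 0.60},
}


def rendered_width(word, widths):
    """Width of `word` in a typeface at unit size (no kerning)."""
    return sum(widths[ch] for ch in word)


def scaling_factors(words, widths):
    """One scaling factor per word: scanned width / unit-size rendered width."""
    return [scanned / rendered_width(text, widths) for text, scanned in words]


def largest_cluster(factors, tolerance=0.05):
    """Largest group of sorted factors whose spread stays within `tolerance`
    (relative to the group's smallest value). A simple 1-D stand-in for the
    cluster analysis, whose exact method the abstract does not specify."""
    values = sorted(factors)
    best, start = [], 0
    for end in range(len(values)):
        # Shrink the window from the left until it fits the tolerance.
        while values[end] > values[start] * (1 + tolerance):
            start += 1
        if end - start + 1 > len(best):
            best = values[start:end + 1]
    return best


def best_format(words, typefaces=TYPEFACES, tolerance=0.05):
    """Return (typeface, estimated font size, cluster size) for the typeface
    whose scaling factors form the largest close cluster."""
    best = None
    for name, widths in typefaces.items():
        cluster = largest_cluster(scaling_factors(words, widths), tolerance)
        candidate = (name, median(cluster), len(cluster))
        if best is None or candidate[2] > best[2]:
            best = candidate
    return best


# Hypothetical OCR words with widths measured in the scanned image (points).
# These widths correspond to FaceB set at 12 pt.
WORDS = [("bad", 22.8), ("cab", 21.6), ("ace", 21.0), ("bead", 30.0)]

if __name__ == "__main__":
    print(best_format(WORDS))
```

With this data every FaceB scaling factor is about 12, so all four words fall into one cluster. FaceA's factors spread from roughly 17.5 to 19.6, and its largest cluster within 5% holds only three of them. The sketch therefore picks FaceB at about 12 pt. Scoring typefaces by cluster size is one simple choice. A tighter tolerance or a spread-based score would be equally plausible readings of "close clusters".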
Publication number: US2002076111 (A1)    Publication date: 2002.06.20
Application number: US20000738320    Filing date: 2000.12.18
Applicant: XEROX CORPORATION    Inventors: DANCE CHRISTOPHER R.; SEEGER MAURITIUS
Classification: G06K9/68; (IPC1-7): G06K9/72    Main classification: G06K9/68