Title of invention: METHOD, SYSTEM, AND COMPUTER-READABLE RECORDING MEDIUM FOR RECOGNIZING CHARACTERS INCLUDED IN A DOCUMENT BY USING LANGUAGE MODEL AND OCR
Abstract: PURPOSE: A method, a system, and a computer-readable recording medium for recognizing characters included in a document by using a language model and OCR are provided to identify an image/noise region misclassified as a text region by referring to the location information of the characters input to an OCR device. CONSTITUTION: A first OCR (Optical Character Recognition) unit (130) recognizes a text string included in a text section by using a first OCR, and a second OCR unit (140) recognizes a text string that includes an image/noise section. A document structure analysis unit (150) analyzes the document structure and uses a language model to find a text string that includes a misclassified region. Based on the location information for that region obtained from the first OCR, the region is reclassified as an image/noise section.
Publication number: KR20100044668(A) Publication date: 2010.04.30
Application number: KR20080103890 Filing date: 2008.10.22
Applicant: NHN CORPORATION Inventors: YANG, BYOUNG SEOK; SEO, HEE CHEOL; YOON, BYOUNG HOON; SUNG PAUL KIJOON; LEE, DO GIL
Classification: G06F17/26; G06F17/21; G06F17/27 Main classification: G06F17/26
Agency: Agent:
Principal claim:
Address:
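The reclassification step described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the `Region` type, the `lm_score` function (a toy vocabulary lookup standing in for a real language model), the `reclassify` helper, and the 0.5 threshold are all hypothetical names and values chosen for illustration. The sketch assumes the OCR stage has already returned each recognized string together with its bounding box.

```python
from dataclasses import dataclass


@dataclass
class Region:
    text: str             # string the OCR engine recognized in this region
    box: tuple            # (x, y, width, height) reported by the OCR engine
    label: str = "text"   # initial classification from layout analysis


def lm_score(text, vocabulary):
    """Toy stand-in for a language model: fraction of tokens in a vocabulary."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(t in vocabulary for t in tokens) / len(tokens)


def reclassify(regions, vocabulary, threshold=0.5):
    """Relabel text regions whose recognized string looks implausible."""
    result = []
    for r in regions:
        label = r.label
        if r.label == "text" and lm_score(r.text, vocabulary) < threshold:
            # The string is unlikely to be real text, so the area at r.box
            # is treated as a misclassified image/noise region.
            label = "image/noise"
        result.append(Region(r.text, r.box, label))
    return result


# Hypothetical sample data: one plausible line and one garbage string.
VOCAB = {"the", "quick", "brown", "fox", "jumps"}
regions = [
    Region("The quick brown fox", (10, 10, 200, 20)),
    Region("l|I1 ~~ ;:_ #", (10, 40, 200, 80)),
]
relabeled = reclassify(regions, VOCAB)
noise_boxes = [r.box for r in relabeled if r.label == "image/noise"]
```

The bounding boxes collected in `noise_boxes` play the role of the location information that the abstract says is used to reclassify a region as an image/noise section. A real system would replace `lm_score` with an actual language model score.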