Title of invention |
METHOD, SYSTEM, AND COMPUTER-READABLE RECORDING MEDIUM FOR RECOGNIZING CHARACTERS INCLUDED IN A DOCUMENT BY USING LANGUAGE MODEL AND OCR |
Abstract |
PURPOSE: A method, a system, and a computer-readable recording medium for recognizing characters included in a document by using a language model and OCR are provided to identify an image/noise region mis-classified as a text region by referring to the location information of the characters input to an OCR device. CONSTITUTION: A first OCR (Optical Character Recognition) unit (130) recognizes a text string included in a text section by using a first OCR, and a second OCR (140) recognizes the text string including an image/noise section. A document structure analysis unit (150) analyzes the document structure and uses a language model to find a text string that includes a mis-classified region. Based on the location information for that region obtained from the first OCR, the region is re-classified as an image/noise section.
|
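The re-classification idea in the abstract can be illustrated with a minimal sketch. It is not the patented method: the `OcrString` type, the `plausibility` score (a toy vocabulary lookup standing in for a language model), the `reclassify` function, and the 0.5 threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); assumed layout


@dataclass
class OcrString:
    """One recognized string plus the location reported by the OCR step."""
    text: str
    box: Box


def plausibility(text: str, vocabulary: Set[str]) -> float:
    """Toy stand-in for a language model: share of tokens found in a vocabulary."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(token in vocabulary for token in tokens) / len(tokens)


def reclassify(strings: List[OcrString], vocabulary: Set[str],
               threshold: float = 0.5) -> Tuple[List[OcrString], List[Box]]:
    """Keep plausible strings as text; return the boxes of the rest as image/noise."""
    text_regions: List[OcrString] = []
    noise_boxes: List[Box] = []
    for item in strings:
        if plausibility(item.text, vocabulary) >= threshold:
            text_regions.append(item)
        else:
            # The location from the OCR step marks the region to re-classify.
            noise_boxes.append(item.box)
    return text_regions, noise_boxes
```

Under these assumptions, a string such as `"l|I ~~ ;:"` scores 0.0 against an English vocabulary, so its bounding box would be returned as an image/noise region, while an ordinary sentence stays in the text list.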
Application publication number |
KR20100044668(A) |
Application publication date |
2010.04.30 |
Application number |
KR20080103890 |
Filing date |
2008.10.22 |
Applicant |
NHN CORPORATION |
Inventors |
YANG, BYOUNG SEOK;SEO, HEE CHEOL;YOON, BYOUNG HOON;SUNG PAUL KIJOON;LEE, DO GIL |
Classification numbers |
G06F17/26;G06F17/21;G06F17/27 |
Main classification number |
G06F17/26 |
Agency |
|
Agent |
|
Principal claim |
|
Address |
|