Title of invention: CHINESE, JAPANESE, OR KOREAN LANGUAGE DETECTION
Abstract: Disclosed are systems, computer-readable mediums, and methods for determining whether a text contains Chinese, Japanese, or Korean characters. A document image is received and binarized. The binarized document image is searched for connected components. A plurality of fragments is identified based on the connected components. A language hypothesis for each fragment of the plurality of fragments is determined. The language hypothesis has a probability rating. A subset of fragments from the plurality of fragments having the highest probability ratings is selected. The language hypothesis of each fragment in the subset of fragments is verified. A determination of the presence of Chinese, Japanese, or Korean characters is made based at least on the verification of the language hypothesis of the subset of fragments.
Application publication number: US2015178559(A1) Application publication date: 2015.06.25
Application number: US201414561851 Filing date: 2014.12.05
Applicant: ABBYY Development LLC Inventors: Yurievich Atroshchenko Mikhail;Deryagin Dmitry Georgievich;Chulinin Yuri Georgievich
Classification: G06K9/00;G06K9/32;G06F17/27 Main classification: G06K9/00
Agency: Agent:
Principal claim: 1. A method for determining a text contains Chinese, Japanese, or Korean characters, the method comprising: receiving a document image; binarizing the document image; searching for connected components in the binarized document image; identifying a plurality of fragments based on the connected components; determining a language hypothesis for each fragment of the plurality of fragments, wherein the language hypothesis has a probability rating; selecting a subset of fragments from the plurality of fragments having highest probability ratings; verifying, using a processor, the language hypothesis of each fragment in the subset of fragments; and determining, using the processor, that Chinese, Japanese, or Korean (CJK) characters are present in the received document image based at least on the verification of the language hypothesis of the subset of fragments.
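The sequence of steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: this record does not say how fragments are formed, how the probability rating is computed, or how verification is performed. The fixed threshold, the gap-based grouping, the "roughly square components" rating in `cjk_rating`, the caller-supplied `verify` callable, and the `top_n` and `min_verified` parameters are all assumptions made for illustration only.

```python
from collections import deque

def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def connected_components(binary):
    """Return bounding boxes (top, left, bottom, right) of 4-connected ink regions."""
    h = len(binary)
    w = len(binary[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:  # breadth-first flood fill of one component
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def group_into_fragments(boxes, max_gap=2):
    """Assumed grouping rule: chain boxes that overlap vertically and sit close horizontally."""
    fragments = []
    for box in sorted(boxes, key=lambda b: (b[1], b[0])):
        for frag in fragments:
            last = frag[-1]
            overlaps = box[0] <= last[2] and last[0] <= box[2]
            if overlaps and box[1] - last[3] - 1 <= max_gap:
                frag.append(box)
                break
        else:
            fragments.append([box])
    return fragments

def cjk_rating(fragment):
    """Assumed heuristic rating: share of components with a roughly square bounding box."""
    def is_squarish(box):
        height = box[2] - box[0] + 1
        width = box[3] - box[1] + 1
        return 0.8 <= width / height <= 1.25
    return sum(1 for box in fragment if is_squarish(box)) / len(fragment)

def detect_cjk(gray, verify, top_n=3, min_verified=0.5):
    """Run the claimed steps; `verify` is a placeholder for the verification stage."""
    fragments = group_into_fragments(connected_components(binarize(gray)))
    best = sorted(fragments, key=cjk_rating, reverse=True)[:top_n]
    if not best:
        return False
    verified = sum(1 for frag in best if verify(frag))
    return verified / len(best) >= min_verified
```

In practice `verify` would wrap a real check, such as running a character classifier on the fragment; here it is left to the caller so that the sketch only fixes the order of the steps, not their internals.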
地址 Moscow RU