Automatic language identification system for multilingual optical character recognition, Application No. US19970929788 - 传众 Patent Search

Title of invention	Automatic language identification system for multilingual optical character recognition
Abstract	The disclosed invention utilizes a dictionary-based approach to identify languages within different zones in a multi-lingual document. As a first step, a document image is segmented into various zones, regions and word tokens, using suitable geometric properties. Within each zone, the word tokens are compared to dictionaries associated with various candidate languages, and the language that exhibits the highest confidence factor is initially identified as the language of the zone. Subsequently, each zone is further split into regions. The language for each region is then identified, using the confidence factors for the words of that region. For any language determination having a low confidence value, the previously determined language of the zone is employed to assist the identification process.
Publication No.	US6047251(A)	Publication date	2000.04.04
Application No.	US19970929788	Filing date	1997.09.15
Applicant	CAERE CORPORATION	Inventors	PON, LEONARD K.;KANUNGO, TAPAS;YANG, JUN;CHOY, KENNETH CHAN;BOKSER, MINDY R.
Classification	G06K9/68;(IPC1-7):G06F17/28;G06K9/72	Main classification	G06K9/68
Agency		Agent
Main claim
Address
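
The two-pass procedure described in the abstract (pick a language per zone, then per region, falling back to the zone's language when a region's confidence is low) can be illustrated with a minimal Python sketch. The abstract does not say how the confidence factor is computed, so this sketch assumes it is the fraction of word tokens found in a language's dictionary. The function names, the 0.5 threshold, and the toy dictionaries are all hypothetical, and segmentation of the page image into zones, regions and word tokens is assumed to have been done already.

```python
def score_tokens(tokens, dictionaries):
    """Assumed confidence: fraction of tokens present in each language's dictionary."""
    scores = {lang: 0.0 for lang in dictionaries}
    if not tokens:
        return scores
    for token in tokens:
        word = token.lower()
        for lang, vocab in dictionaries.items():
            if word in vocab:
                scores[lang] += 1.0
    return {lang: hits / len(tokens) for lang, hits in scores.items()}


def best_language(scores):
    """Return (language, confidence) for the highest-scoring language."""
    if not scores:
        return None, 0.0
    lang = max(scores, key=scores.get)
    return lang, scores[lang]


def identify_zone(regions, dictionaries, threshold=0.5):
    """Identify the zone language first, then each region's language.

    `regions` is a list of token lists belonging to one zone. A region whose
    best confidence is below `threshold` (a hypothetical cut-off) inherits
    the language already chosen for the zone.
    """
    zone_tokens = [token for region in regions for token in region]
    zone_lang, _ = best_language(score_tokens(zone_tokens, dictionaries))

    region_langs = []
    for region in regions:
        lang, confidence = best_language(score_tokens(region, dictionaries))
        region_langs.append(lang if confidence >= threshold else zone_lang)
    return zone_lang, region_langs


# Toy dictionaries and tokens, invented for illustration only.
dictionaries = {
    "en": {"the", "cat", "is", "on", "mat", "and", "dog"},
    "fr": {"le", "chat", "est", "sur", "la", "table"},
}
regions = [
    ["The", "cat", "is", "on", "the", "mat"],      # clearly English
    ["Le", "chat", "est", "sur", "la", "table"],   # clearly French
    ["qwrt", "zzz"],                               # no dictionary hits
]
zone_lang, region_langs = identify_zone(regions, dictionaries)
print(zone_lang, region_langs)
```

In this toy input the third region matches no dictionary, so its confidence is zero and it takes the zone's language rather than an arbitrary guess. A real system would need a confidence measure that tolerates OCR errors, which this sketch does not attempt.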