Title: Text language identification
Abstract: A device automatically identifies the language of a text from among a plurality of predetermined languages. For each language, it prestores first character strings that occur frequently in words of that language and second character strings that are atypical of it. The device extracts words from the text and constructs all of the character strings contained in each extracted word. Each string of an extracted word is compared with the first and second strings of a particular language. If the word contains a first string, the score of that language is increased by a coefficient that depends in particular on the position of the first string in the word. If the word contains a second string, the score is decreased by a coefficient associated with that second string. The highest of the scores obtained for the predetermined languages identifies the language of the text.
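The scoring scheme described in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not specify the string tables, the coefficient values, how word positions are classified, or how words are extracted, so the `LanguageProfile` tables, the four position classes (`whole`, `begin`, `end`, `inside`), the toy English/French coefficients, and the letter-run word splitter are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass
class LanguageProfile:
    """Hypothetical per-language tables (the abstract does not define their layout)."""
    name: str
    # "First" strings: frequent in the language; one coefficient per assumed position class.
    frequent: Dict[str, Dict[str, float]]
    # "Second" strings: atypical of the language; one penalty coefficient each.
    atypical: Dict[str, float]


def position_class(start: int, end: int, word_len: int) -> str:
    """Assumed position classes for the substring word[start:end]."""
    if start == 0 and end == word_len:
        return "whole"
    if start == 0:
        return "begin"
    if end == word_len:
        return "end"
    return "inside"


def substrings(word: str) -> Iterator[Tuple[int, int, str]]:
    """Yield every contiguous substring of word together with its span."""
    n = len(word)
    for i in range(n):
        for j in range(i + 1, n + 1):
            yield i, j, word[i:j]


def extract_words(text: str) -> List[str]:
    """Assumed word splitter: lowercase runs of Unicode letters (no digits or '_')."""
    return re.findall(r"[^\W\d_]+", text.lower())


def score(text: str, lang: LanguageProfile) -> float:
    """Sum position-dependent bonuses for frequent strings minus atypical-string penalties."""
    total = 0.0
    for word in extract_words(text):
        for i, j, s in substrings(word):
            coeffs = lang.frequent.get(s)
            if coeffs is not None:
                # Frequent string: add the coefficient for its position in the word.
                total += coeffs.get(position_class(i, j, len(word)), 0.0)
            if s in lang.atypical:
                # Atypical string: subtract its associated coefficient.
                total -= lang.atypical[s]
    return total


def identify(text: str, langs: List[LanguageProfile]) -> str:
    """Return the name of the language with the highest score (first one wins ties)."""
    return max(langs, key=lambda lang: score(text, lang)).name


# Toy profiles with made-up coefficients, for illustration only.
ENGLISH = LanguageProfile(
    "English",
    frequent={"the": {"whole": 3.0}, "ing": {"end": 2.0},
              "th": {"begin": 1.0, "inside": 0.5}},
    atypical={"eau": 2.0},
)
FRENCH = LanguageProfile(
    "French",
    frequent={"le": {"whole": 3.0}, "eau": {"end": 2.0}, "ou": {"inside": 1.0}},
    atypical={"th": 1.0},
)

if __name__ == "__main__":
    print(identify("the cat is eating", [ENGLISH, FRENCH]))  # English
    print(identify("le bateau", [ENGLISH, FRENCH]))          # French
```

With these toy tables, "the cat is eating" scores 6.0 for English (whole-word "the", word-initial "th", word-final "ing") and -1.0 for French (atypical "th"). Because `max` returns the first maximal element, ties go to the earlier language in the list; the abstract does not say how ties are resolved.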
Publication number: US7689409(B2)    Publication date: 2010.03.30
Application number: US20030732809    Filing date: 2003.12.11
Applicant: FRANCE TELECOM    Inventor: HEINECKE JOHANNES
Classification: G06F17/20; G06F17/27; G06F17/28; G06K9/72    Main classification: G06F17/20