Title: System and method for handling multiple languages in text
Abstract: A system and method for processing text are disclosed. The method includes receiving text to be processed. A main language of the text is identified. At least one unknown sequence in the text is identified, each unknown sequence comprising at least one word that is unknown in the main language. For a secondary language, for each of the at least one unknown sequence, the method includes determining whether the unknown sequence includes a first word recognized in the secondary language and, if so, identifying a sequence of words in the secondary language which includes at least the first word. The identifying of the sequence of words in the secondary language includes applying an algorithm for determining whether the sequence of words in the secondary language is expandable beyond the first word to include adjacent words. The text is labeled based on the identified sequences of words in the secondary language.
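The following minimal sketch illustrates the steps listed in the abstract. It is not the patented implementation. Several parts are assumptions made only for illustration: the toy lexicons in the hypothetical `LEXICONS` dictionary, the lexicon-count rule for choosing the main language, and the greedy rule in the hypothetical `expand` function, which grows a secondary-language sequence while neighbouring words are in the secondary lexicon. The abstract does not say how the patent's own expandability algorithm works. All function names (`tokenize`, `main_language`, `unknown_sequences`, `expand`, `label_text`) are hypothetical as well.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy lexicons (an assumption); a real system would use full
# lexical resources or morphological analyzers for each language.
LEXICONS: Dict[str, Set[str]] = {
    "en": {"the", "meeting", "was", "held", "at", "a", "nice", "place", "he", "said"},
    "fr": {"la", "vie", "est", "belle", "je", "ne", "sais", "quoi"},
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def main_language(tokens: List[str], lexicons: Dict[str, Set[str]]) -> str:
    """Assumed rule: the main language is the one whose lexicon knows the most tokens."""
    return max(lexicons, key=lambda lang: sum(t in lexicons[lang] for t in tokens))


def unknown_sequences(tokens: List[str], lexicon: Set[str]) -> List[Tuple[int, int]]:
    """Return maximal runs of tokens unknown in the main language as [start, end) spans."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, tok in enumerate(tokens):
        if tok not in lexicon:
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tokens)))
    return spans


def expand(tokens: List[str], first: int, lexicon: Set[str]) -> Tuple[int, int]:
    """Naive expansion (an assumption): grow left and right from the first
    recognized word while adjacent tokens are in the secondary lexicon."""
    start, end = first, first + 1
    while start > 0 and tokens[start - 1] in lexicon:
        start -= 1
    while end < len(tokens) and tokens[end] in lexicon:
        end += 1
    return start, end


def label_text(text: str, lexicons: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Label each token with the main language, a secondary language, or 'unknown'."""
    tokens = tokenize(text)
    main = main_language(tokens, lexicons)
    main_lex = lexicons[main]
    labels = [main if t in main_lex else "unknown" for t in tokens]
    for lang, lexicon in lexicons.items():
        if lang == main:
            continue  # only secondary languages are tried on unknown sequences
        for s, e in unknown_sequences(tokens, main_lex):
            # First word of the unknown sequence recognized in the secondary language.
            first = next((i for i in range(s, e) if tokens[i] in lexicon), None)
            if first is None:
                continue
            a, b = expand(tokens, first, lexicon)
            for i in range(a, b):
                labels[i] = lang
    return list(zip(tokens, labels))


if __name__ == "__main__":
    print(label_text("He said je ne sais quoi at the meeting", LEXICONS))
```

On the sample sentence, English is selected as the main language. "je ne sais quoi" forms the unknown sequence, and because every word in it is in the French toy lexicon, the whole run is labelled `fr`. The greedy expansion rule was chosen only because it is easy to follow. A practical system would need something stronger to handle words that are valid in both languages.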
Publication number: US8285541 (B2)
Publication date: 2012.10.09
Application number: US20100854543
Application date: 2010.08.11
Applicant: BRUN CAROLINE; XEROX CORPORATION
Inventor: BRUN CAROLINE
Classification: G06F17/20; G06F17/27; G06F17/28; G10L13/00; G10L15/00
Main classification: G06F17/20