LANGUAGE IDENTIFICATION IN MULTILINGUAL TEXT, Application No. WO2011US52133

Title of Invention	LANGUAGE IDENTIFICATION IN MULTILINGUAL TEXT
Abstract	Methods, systems, and media are provided for identifying languages in multilingual text. A document is decoded into a universal representative coding for easier tag manipulation, then broken into plain-text content sections. Each section is identified and assigned a weight: more informative sections receive a higher weight and less informative sections a lower one. A language likelihood score is determined for each word, phrase, or character n-gram in a section, and the scores within a section are combined for each language. The combined section scores are then summed to obtain a total document score for each language, and these document scores are ranked to determine the primary language of the document.
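
The scoring pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the document has already been decoded and split into labeled plain-text sections, and every name in it (PROFILES, SECTION_WEIGHTS, FLOOR, rank_languages) is hypothetical. The toy trigram probabilities, the section weights, and the choice to combine scores as an average log-likelihood ratio are all assumptions made for illustration.

```python
import math

# Hypothetical toy profiles: character-trigram probabilities per language.
PROFILES = {
    "en": {"the": 0.05, "he ": 0.03, "ing": 0.03, " th": 0.04},
    "de": {"der": 0.04, "sch": 0.04, "ein": 0.04, "ich": 0.04},
}
FLOOR = 1e-4  # assumed probability for trigrams missing from a profile

# Hypothetical weights: more informative sections count for more.
SECTION_WEIGHTS = {"title": 3.0, "body": 1.0, "footer": 0.2}


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def section_scores(text):
    """Combine per-trigram likelihood scores into one score per language.

    Each trigram contributes log(p / FLOOR), so unseen trigrams add 0 and
    known trigrams add a positive amount. The section score is the mean.
    """
    grams = char_ngrams(text)
    scores = {}
    for lang, profile in PROFILES.items():
        total = sum(math.log(profile.get(g, FLOOR) / FLOOR) for g in grams)
        scores[lang] = total / max(len(grams), 1)
    return scores


def rank_languages(sections):
    """Sum weighted section scores per language and rank, best first.

    `sections` is a list of (section_kind, text) pairs.
    """
    totals = {lang: 0.0 for lang in PROFILES}
    for kind, text in sections:
        weight = SECTION_WEIGHTS.get(kind, 1.0)
        for lang, score in section_scores(text).items():
            totals[lang] += weight * score
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    doc = [
        ("title", "The thing"),
        ("body", "the king is singing in the hall"),
        ("footer", "ein Schiff"),
    ]
    for lang, score in rank_languages(doc):
        print(lang, round(score, 3))
```

In this sketch the German footer still contributes to the German total, but its low weight keeps it from outranking the English title and body, which mirrors the idea of weighting sections by how informative they are.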
Publication No.	WO2012050743(A3)	Publication Date	2012.06.21
Application No.	WO2011US52133	Filing Date	2011.09.19
Applicant	MICROSOFT CORPORATION	Inventors	LI, KANG;KLODER, STEPHEN ALLEN;JOHNSON, IAN GEORGE;ALONICHAU, SIARHEI
Classification	G06F17/21;G06F9/44;G06F17/28	Main Classification	G06F17/21
Agency		Agent
Main Claim
Address