MINING NEW WORDS FROM A QUERY LOG FOR INPUT METHOD EDITORS, Application No. WO2009IB08016

Title of invention	MINING NEW WORDS FROM A QUERY LOG FOR INPUT METHOD EDITORS
Abstract	Described is a technology in which new words (including a phrase or set of Chinese characters) are mined from a query log. The new words may be added to (or otherwise supplement) an IME dictionary. A set of candidate queries may be selected from the log based upon market (e.g., the Chinese market) and/or by language. From this set, various filtering steps are performed to locate only new words that are frequently in use. For example, only frequent queries are kept for further processing, which may include filtering out queries based on length (e.g., fewer than two or more than eight Chinese characters), and/or filtering out queries that contain too many stop-words. Processing may also include filtering out a query that is a substring of a larger query, or vice-versa. Also described is Pinyin-based clustering and filtering, and filtering out queries already handled in the dictionary.
Publication No.	WO2010043984(A2)	Publication date	2010.04.22
Application No.	WO2009IB08016	Filing date	2009.10.04
Applicant	MICROSOFT CORPORATION	Inventors	CHEN, WEIZHU;LI, QIAN XUN;JU, LI;CHEN, ZHENG;LI, DONG;FAN, ZHIKAI
Classification No.		Main classification No.
Agency		Agent
Independent claim
Address
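
The filtering steps named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the method claimed in the patent: the function name mine_new_words, the frequency threshold, the stop-word set, and the stop-word ratio are all assumptions made for the example; only the two-to-eight character length bounds come from the abstract. Pinyin-based clustering is omitted, and the substring step is simplified to dropping any query contained in a longer surviving query.

```python
from collections import Counter

# Illustrative assumptions (not values from the patent), except the
# 2..8 character length bounds, which the abstract gives as an example.
MIN_FREQ = 3                      # hypothetical "frequent query" threshold
MIN_LEN, MAX_LEN = 2, 8           # length bounds in Chinese characters
MAX_STOP_RATIO = 0.5              # hypothetical "too many stop-words" cutoff
STOP_WORDS = {"的", "了", "吗"}    # tiny hypothetical stop-word set


def stop_ratio(query):
    """Fraction of characters in the query that are stop-words."""
    return sum(ch in STOP_WORDS for ch in query) / len(query)


def mine_new_words(queries, dictionary):
    """Return candidate new words from a list of logged query strings.

    Assumes each query is already restricted to the target market or
    language and consists of Chinese characters only.
    """
    counts = Counter(queries)

    # 1. Keep only frequent queries.
    candidates = [q for q, c in counts.items() if c >= MIN_FREQ]

    # 2. Drop queries that are too short or too long.
    candidates = [q for q in candidates if MIN_LEN <= len(q) <= MAX_LEN]

    # 3. Drop queries dominated by stop-words.
    candidates = [q for q in candidates if stop_ratio(q) <= MAX_STOP_RATIO]

    # 4. Simplified substring filter: drop a query contained in a longer
    #    surviving query (the abstract also allows the reverse choice).
    candidates = [
        q for q in candidates
        if not any(q != other and q in other for other in candidates)
    ]

    # 5. Drop queries the IME dictionary already handles.
    return sorted(q for q in candidates if q not in dictionary)


if __name__ == "__main__":
    log = ["微博"] * 5 + ["新浪微博"] * 4 + ["的了"] * 6 + ["你好"] * 5
    print(mine_new_words(log, dictionary={"你好"}))
```

Each step narrows the candidate list independently, so steps can be reordered or swapped out (for example, to keep the shorter string in step 4 when it is much more frequent than the longer one).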