Title of invention LANGUAGE MODELING OF COMPLETE LANGUAGE SEQUENCES
Abstract Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language modeling of complete language sequences. Training data indicating language sequences is accessed, and counts for a number of times each language sequence occurs in the training data are determined. A proper subset of the language sequences is selected, and a first component of a language model is trained. The first component includes first probability data for assigning scores to the selected language sequences. A second component of the language model is trained based on the training data, where the second component includes second probability data for assigning scores to language sequences that are not included in the selected language sequences. Adjustment data that normalizes the second probability data with respect to the first probability data is generated, and the first component, the second component, and the adjustment data are stored.
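The steps in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not specify the selection rule, the form of either component, or the normalization formula. The sketch assumes that the selected subset is the `top_k` most frequent sequences, that the first component stores relative frequencies of whole sequences, that the second component is an add-one smoothed bigram model, and that the adjustment data is a single scaling factor that spreads the probability mass left over by the first component across sequences outside the selected set. All function and variable names (`train_model`, `second_prob`, `score`, `top_k`) are hypothetical.

```python
from collections import Counter
import math

BOS, EOS = "<s>", "</s>"  # sentence boundary markers (illustrative choice)


def second_prob(second, seq):
    """Probability of a whole sequence under the add-one smoothed bigram model."""
    tokens = [BOS] + seq.split() + [EOS]
    logp = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        num = second["bigrams"][(prev, cur)] + 1
        den = second["contexts"][prev] + second["vocab_size"]
        logp += math.log(num / den)
    return math.exp(logp)


def train_model(training_data, top_k):
    # Count how many times each complete sequence occurs in the training data.
    counts = Counter(training_data)
    if top_k >= len(counts):
        raise ValueError("top_k must leave at least one sequence unselected")
    total = sum(counts.values())

    # Select a proper subset (assumption: the top_k most frequent sequences).
    selected = [seq for seq, _ in counts.most_common(top_k)]

    # First component: relative frequency of each selected whole sequence.
    first = {seq: counts[seq] / total for seq in selected}

    # Second component: bigram statistics gathered from all training data.
    bigrams, contexts, vocab = Counter(), Counter(), {EOS}
    for seq, n in counts.items():
        tokens = [BOS] + seq.split() + [EOS]
        vocab.update(tokens[1:])
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[(prev, cur)] += n
            contexts[prev] += n
    second = {"bigrams": bigrams, "contexts": contexts, "vocab_size": len(vocab)}

    # Adjustment data (assumed form): scale the second component so that the
    # scores of all non-selected sequences sum to the mass the first leaves over.
    leftover = 1.0 - sum(first.values())
    overlap = sum(second_prob(second, seq) for seq in selected)
    adjustment = leftover / (1.0 - overlap)
    return first, second, adjustment


def score(model, seq):
    """Score a sequence with the first component if selected, else the second."""
    first, second, adjustment = model
    if seq in first:
        return first[seq]
    return adjustment * second_prob(second, seq)


if __name__ == "__main__":
    data = (["weather today"] * 5 + ["weather tomorrow"] * 3
            + ["news today", "cat videos"])
    model = train_model(data, top_k=2)
    for query in ("weather today", "news today", "cat news"):
        print(query, score(model, query))
```

Under these assumptions, a selected sequence receives its empirical relative frequency, while any other sequence receives a rescaled bigram probability, so the two components together behave like one distribution over complete sequences.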
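Application publication number WO2014158239(A1) Publication date 2014.10.02
Application number WO2013US70732 Filing date 2013.11.19
Applicant GOOGLE INC. Inventors CHELBA, CIPRIAN, I.;SAK, HASIM;SCHALKWYK, JOHAN
Classification G10L15/06;G10L15/197 Main classification G10L15/06
Agency Agent
Main claim
Address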