Title of invention: DATA SHREDDING FOR SPEECH RECOGNITION LANGUAGE MODEL TRAINING UNDER DATA RETENTION RESTRICTIONS
Abstract: Training speech recognizers, e.g., their language or acoustic models, using actual user data is useful, but retaining personally identifiable information may be restricted in certain environments due to regulations. Accordingly, a method or system is provided for enabling training of a language model which includes producing segments of text in a text corpus and counts corresponding to the segments of text, the text corpus being in a depersonalized state. The method further includes enabling a system to train a language model using the segments of text in the depersonalized state and the counts. Because the data is depersonalized, actual data may be used, enabling speech recognizers to keep up-to-date with user trends in speech and usage, among other benefits.
Publication number: US2014278425(A1) Publication date: 2014.09.18
Application number: US201313800738 Filing date: 2013.03.13
Applicant: NUANCE COMMUNICATIONS, INC. Inventors: Jost Uwe Helmut;Woodland Philip Charles;Katz Marcel;Shahid Syed Raza;Vozila Paul J.;Ganong, III William F.
Classification: G10L15/06 Main classification: G10L15/06
Agency: (not listed) Agent: (not listed)
Main claim: 1. A method of enabling training of a language model, the method comprising: producing segments of text in a text corpus and counts corresponding to the segments of text, the text corpus being in a depersonalized state; and enabling a system to train a language model using the segments of text in the depersonalized state and the counts.
Address: Burlington, MA, US
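
Illustrative sketch (not part of the patent record): the main claim describes producing segments of text with corresponding counts from a depersonalized corpus, then training a language model from those segments and counts. The record does not state the segment length, the depersonalization procedure, or the model type, so the Python below makes several assumptions: segments are word bigrams, the input has already been depersonalized upstream (here by a hypothetical `<name>` placeholder token), and the "model" is a plain maximum-likelihood bigram estimate. The function names `shred` and `train_bigram_model` are invented for this example.

```python
from collections import Counter


def shred(utterances, n=2):
    """Break each utterance into n-word segments and count them.

    Only the segments and their counts are kept; the original utterances,
    their order, and their boundaries are discarded.
    Assumes the utterances were depersonalized before this step.
    """
    counts = Counter()
    for text in utterances:
        words = text.lower().split()
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


def train_bigram_model(counts):
    """Estimate P(w2 | w1) from bigram segment counts (maximum likelihood).

    This is a stand-in for "training a language model using the segments
    and the counts"; a production system would likely add smoothing.
    """
    history_totals = Counter()
    for (w1, _w2), c in counts.items():
        history_totals[w1] += c
    return {
        (w1, w2): c / history_totals[w1]
        for (w1, w2), c in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical, already-depersonalized input.
    corpus = ["call <name> now", "call <name> later"]
    segment_counts = shred(corpus, n=2)
    model = train_bigram_model(segment_counts)
    for segment, prob in sorted(model.items()):
        print(segment, round(prob, 3))
```

The design point the sketch tries to show is that the training step only ever sees segments and counts, never the full utterances, which is one way to read the "data shredding" in the title.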