Title of Invention: DYNAMIC SELECTION AMONG ACOUSTIC TRANSFORMS
Abstract: Aspects of this disclosure are directed to accurately transforming speech data into one or more word strings that represent the speech data. A speech recognition device may receive the speech data from a user device, along with an indication of the user device. The speech recognition device may execute a speech recognition algorithm using one or more user and acoustic condition specific transforms, i.e., transforms that are specific both to the user device and to an acoustic condition of the speech data. Execution of the speech recognition algorithm may transform the speech data into one or more word strings that represent the speech data. The speech recognition device may then estimate which of the one or more word strings more accurately represents the received speech data.
Publication Number: US2015149167(A1)    Publication Date: 2015.05.28
Application Number: US201113249509    Filing Date: 2011.09.30
Applicants: Beaufays Françoise; Schalkwyk Johan; Vanhoucke Vincent Olivier; Aleksic Petar Stanisa    Inventors: Beaufays Françoise; Schalkwyk Johan; Vanhoucke Vincent Olivier; Aleksic Petar Stanisa
IPC Classes: G10L15/26; G10L25/54; G10L15/197; G10L25/27; G10L15/20    Main Class: G10L15/26
Agency:    Agent:
Principal Claim: 1. A method comprising:
receiving speech data from a user device;
receiving an indication of the user device;
executing a speech recognition algorithm that selectively retrieves, from one or more storage devices, a plurality of pre-stored user and acoustic condition specific transforms based on the received indication of the user device, and that utilizes the received speech data as an input into pre-stored mathematical models of the retrieved plurality of pre-stored user and acoustic condition specific transforms to convert the received speech data into one or more word strings that each represent at least a portion of the received speech data,
wherein each one of the plurality of pre-stored user and acoustic condition specific transforms is a transform that is both specific to the user device and specific to one acoustic condition from among a plurality of different acoustic conditions,
wherein each of the different acoustic conditions comprises a context in which the speech data could have been provided, and
wherein each of the plurality of pre-stored user and acoustic condition specific transforms and each of the pre-stored mathematical models that are utilized to convert the received speech data into the one or more word strings were generated and stored in the one or more storage devices prior to receipt of the speech data from the user device and prior to receipt of the indication of the user device;
estimating which word string of the one or more word strings more accurately represents the received speech data;
selecting, based on the estimation and from the plurality of user and acoustic condition specific transforms, an appropriate user and acoustic condition specific transform for conversion of the speech data into the word string estimated to more accurately represent the received speech data; and
transmitting the word string to at least one of the user device or one or more servers.
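The claim above can be illustrated with a minimal sketch. All names here (`Transform`, `TRANSFORM_STORE`, `apply_transform`, `recognize`) and the scoring logic are hypothetical stand-ins, not taken from the patent; real transforms would parameterize acoustic models rather than a single gain value.

```python
# Hypothetical sketch of dynamic selection among pre-stored
# user and acoustic condition specific transforms.
from dataclasses import dataclass

@dataclass
class Transform:
    """A pre-stored transform, specific both to one user device
    and to one acoustic condition (per the claim)."""
    device_id: str
    acoustic_condition: str  # e.g. "quiet-room", "in-car", "street"
    gain: float              # placeholder for real transform parameters

# Transforms generated and stored before any speech data is received,
# keyed by the indication of the user device.
TRANSFORM_STORE = {
    "device-42": [
        Transform("device-42", "quiet-room", 1.0),
        Transform("device-42", "in-car", 1.4),
        Transform("device-42", "street", 1.8),
    ],
}

def apply_transform(speech_data, transform):
    """Stand-in for running pre-stored mathematical models under one
    transform; returns (word_string, confidence_score)."""
    score = sum(speech_data) * transform.gain % 1.0  # placeholder scoring
    return f"hypothesis-under-{transform.acoustic_condition}", score

def recognize(speech_data, device_id):
    """Retrieve the device's transforms, convert the speech data under
    each one, then select the word string estimated most accurate."""
    transforms = TRANSFORM_STORE[device_id]   # retrieval by device indication
    hypotheses = [apply_transform(speech_data, t) for t in transforms]
    best_string, _best_score = max(hypotheses, key=lambda h: h[1])
    return best_string                        # transmitted to device/servers
```

The key structural point of the claim is that the store is populated ahead of time and the estimation step both picks the best word string and identifies which transform produced it.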
Address: Mountain View, CA, US