Title of invention: Reference signal suppression in speech recognition
Abstract: The technology described herein can be embodied in a method that includes receiving a first signal representing an output of a speaker device, and a second signal comprising both the output of the speaker device and an audio signal corresponding to an utterance of a speaker. The method includes aligning one or more segments of the first signal with one or more segments of the second signal. Acoustic features of the one or more segments of the first and second signals are classified to obtain a first set of vectors and a second set of vectors, respectively, the vectors being associated with speech units. The second set is modified using the first set, such that the modified second set represents a suppression of the output of the speaker device in the second signal. A transcription of the utterance of the speaker can be generated from the modified second set of vectors.
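The abstract does not say how the segments of the two signals are aligned. As an illustration only, the sketch below assumes a simple cross-correlation search: it tries each candidate delay and keeps the one at which the reference (first) signal best matches the mixed (second) signal. The function name `best_lag`, the `max_lag` parameter, and the sample values are hypothetical and do not come from the patent.

```python
def best_lag(reference, mixed, max_lag):
    """Return the delay (in samples) at which `reference` best matches `mixed`.

    Assumption: alignment is done by an unnormalized cross-correlation search.
    The patent text does not specify the alignment technique.
    """
    best, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Number of samples that overlap when `mixed` is shifted by `lag`.
        n = min(len(reference), len(mixed) - lag)
        if n <= 0:
            break
        # Dot product of the reference with the shifted mixed signal.
        score = sum(reference[i] * mixed[i + lag] for i in range(n))
        if score > best_score:
            best, best_score = lag, score
    return best


# Hypothetical data: the loudspeaker output reaches the microphone 3 samples late.
reference = [0.0, 1.0, -1.0, 0.5, 0.0]
mixed = [0.1, 0.0, 0.0] + reference
lag = best_lag(reference, mixed, max_lag=3)
print(lag)
```

With the lag known, a segment of the first signal starting at sample `i` would be paired with the segment of the second signal starting at sample `i + lag`.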
Publication number: US9240183(B2) Publication date: 2016.01.19
Application number: US201414181374 Filing date: 2014.02.14
Applicant: Google Inc. Inventors: Sharifi Matthew;Roblek Dominik
Classification: G10L15/20;G10L15/00;G10L15/06;G10L15/22;G10L15/26;G10L21/0208;G10L13/00;G10L15/02;G10L25/24 Main classification: G10L15/20
Agency: Fish & Richardson P.C. Agent: Fish & Richardson P.C.
Main claim: 1. A computer implemented method comprising: receiving, at a processing system, a first signal representing an output of a speaker device; receiving, at the processing system, a second signal comprising (i) the output of the speaker device and (ii) an audio signal corresponding to an utterance of a speaker; aligning, by the processing system, one or more segments of the first signal with one or more segments of the second signal; classifying acoustic features of the one or more segments of the first signal to obtain a first set of vectors associated with speech units, wherein each vector in the first set of vectors comprises a phoneme and an associated weight; classifying acoustic features of the one or more segments of the second signal to obtain a second set of vectors associated with speech units, wherein each vector in the second set of vectors comprises a phoneme and an associated weight; modifying the second set of vectors using the first set of vectors to obtain a modified second set of vectors, wherein modifying the second set comprises: identifying one or more speech units that are present in the first set of vectors and the second set of vectors, adjusting the weights associated with the identified speech units in the second set of vectors to indicate that the identified speech units are candidates for being suppressed from the modified second set of vectors, and suppressing the identified speech units from the modified second set of vectors based at least in part on the adjusted weights, wherein the modified second set of vectors represents a suppression of the output of the speaker device in the second signal; and providing the modified second set of vectors to generate a transcription of the utterance of the speaker.
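The modifying step of the claim (identify shared speech units, adjust their weights, suppress based on the adjusted weights) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each vector is modeled as a `(phoneme, weight)` pair, the adjustment is assumed to be a scaled subtraction of the reference weight, and suppression is assumed to be a threshold test. The function name `suppress_reference`, the `scale` and `threshold` parameters, and the example values are all hypothetical.

```python
def suppress_reference(first, second, scale=1.0, threshold=0.2):
    """Return a modified copy of `second` with reference speech units suppressed.

    `first` and `second` are lists of (phoneme, weight) pairs for one aligned
    segment of the first and second signals. The subtraction rule and the
    threshold are assumptions; the claim only requires that weights be
    adjusted and that suppression depend on the adjusted weights.
    """
    reference_weights = dict(first)
    modified = []
    for phoneme, weight in second:
        if phoneme in reference_weights:
            # Speech unit present in both sets: lower its weight to mark it
            # as a candidate for suppression.
            weight = weight - scale * reference_weights[phoneme]
            if weight < threshold:
                # Adjusted weight is too low: suppress the speech unit.
                continue
        modified.append((phoneme, weight))
    return modified


# Hypothetical segment: the loudspeaker plays "AH" and "T" while the user says "S" and "T".
first_set = [("AH", 0.9), ("T", 0.25)]
second_set = [("AH", 0.8), ("S", 0.7), ("T", 0.75)]
modified_second_set = suppress_reference(first_set, second_set)
print(modified_second_set)
```

In this example "AH" is removed because its adjusted weight falls below the threshold, "S" is untouched because it appears only in the second set, and "T" is kept with a reduced weight because the second signal carries more evidence for it than the reference alone explains. The modified set would then be passed to a decoder to produce the transcription.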
Address: Mountain View, CA, US