Title of invention: Augmenting speech segmentation and recognition using head-mounted vibration and/or motion sensors
Abstract: Example methods and systems use multiple sensors to determine whether a speaker is speaking. Audio data in an audio-channel speech band detected by a microphone can be received. Vibration data in a vibration-channel speech band representative of vibrations detected by a sensor other than the microphone can be received. The microphone and the sensor can be associated with a head-mountable device (HMD). It is determined whether the audio data is causally related to the vibration data. If the audio data and the vibration data are causally related, an indication can be generated that the audio data contains HMD-wearer speech. Causally related audio and vibration data can be used to increase accuracy of text transcription of the HMD-wearer speech. If the audio data and the vibration data are not causally related, an indication can be generated that the audio data does not contain HMD-wearer speech.
Publication number: US9135915(B1) Publication date: 2015.09.15
Application number: US201213559544 Filing date: 2012.07.26
Applicant: Google Inc. Inventors: Johnson Michael Patrick; Dong Jianchun; Balez Mat
Classification: G10L19/00; G10L21/00; G10L21/02; G10L15/00; G10L15/20; G10L25/00; H04R3/00; H04R25/00; G10L15/26 Main classification: G10L19/00
Agency: McDonnell Boehnen Hulbert & Berghoff LLP Agent: McDonnell Boehnen Hulbert & Berghoff LLP
Main claim: 1. A method, comprising:
receiving audio data representative of audio detected by a microphone, wherein the microphone is positioned on a head-mountable device (HMD);
determining whether the received audio data comprises audio speech data in an audio-channel speech band or audio non-speech data outside the audio-channel speech band;
receiving vibration data representative of vibrations detected by a sensor other than the microphone, wherein the sensor is positioned on the HMD;
determining a degree of spectral coherency, with respect to a threshold, between the audio data and the vibration data;
determining whether or not the audio data is causally related to the vibration data based on the determined degree of spectral coherency; and
if the received audio data both: (a) comprises audio speech data in an audio-channel speech band and (b) is determined to be causally related to the vibration data based on the degree of spectral coherency, then generating an indication that the audio data contains HMD-wearer speech and conditioning at least one of the audio data and the vibration data as speech data, wherein the conditioning comprises amplifying at least one of the audio data and the vibration data;
if the received audio data both: (a) comprises audio non-speech data outside the audio-channel speech band and (b) is determined to be causally related to the vibration data based on the degree of spectral coherency, then conditioning at least one of the audio data and the vibration data as coherent non-speech data, wherein the conditioning comprises removing or replacing non-speech data from at least one of the audio data and the vibration data; and
otherwise, determining that the received audio data and the vibration data are non-coherent and conditioning at least one of the audio data and the vibration data as non-speech data, wherein the conditioning comprises removing or replacing non-speech data from at least one of the audio data and the vibration data.
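The three-way decision recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every specific below is an assumption made for illustration: the 300 to 3400 Hz speech band, both 0.5 thresholds, the use of a power-weighted magnitude-squared coherence as the "degree of spectral coherency", the band-power fraction used to decide speech versus non-speech, the gain of 2.0, and all function names (`analyze`, `classify`, `condition`, `tone`, `noise`). The claim itself does not specify any of these.

```python
import cmath
import math
import random


def _spectra(x, nperseg):
    """Hann-windowed one-sided DFTs of consecutive, non-overlapping segments."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / nperseg) for n in range(nperseg)]
    twiddle = [cmath.exp(-2j * math.pi * m / nperseg) for m in range(nperseg)]
    out = []
    for start in range(0, len(x) - nperseg + 1, nperseg):
        seg = [x[start + n] * window[n] for n in range(nperseg)]
        out.append([sum(seg[n] * twiddle[(k * n) % nperseg] for n in range(nperseg))
                    for k in range(nperseg // 2 + 1)])
    return out


def analyze(audio, vibration, fs, band, nperseg=128):
    """Return (fraction of audio power inside `band`, power-weighted coherence)."""
    xs, ys = _spectra(audio, nperseg), _spectra(vibration, nperseg)
    bins = range(nperseg // 2 + 1)
    sxx = [sum(abs(X[k]) ** 2 for X in xs) for k in bins]   # audio power per bin
    syy = [sum(abs(Y[k]) ** 2 for Y in ys) for k in bins]   # vibration power per bin
    sxy = [sum(X[k] * Y[k].conjugate() for X, Y in zip(xs, ys)) for k in bins]
    total = sum(sxx)
    if total == 0:
        return 0.0, 0.0
    # Magnitude-squared coherence per bin, averaged with audio power as weight.
    coh = [abs(sxy[k]) ** 2 / (sxx[k] * syy[k]) if sxx[k] * syy[k] > 0 else 0.0
           for k in bins]
    weighted = sum(c * p for c, p in zip(coh, sxx)) / total
    in_band = sum(sxx[k] for k in bins if band[0] <= k * fs / nperseg <= band[1])
    return in_band / total, weighted


def classify(audio, vibration, fs, band=(300.0, 3400.0),
             coherence_threshold=0.5, band_fraction_threshold=0.5):
    """Assumed thresholds; returns one of three labels mirroring the claim."""
    fraction, coherence = analyze(audio, vibration, fs, band)
    if coherence < coherence_threshold:
        return "non_coherent"
    return "wearer_speech" if fraction >= band_fraction_threshold else "coherent_non_speech"


def condition(audio, label, gain=2.0):
    """Amplify wearer speech; otherwise replace the samples with silence."""
    if label == "wearer_speech":
        return [gain * s for s in audio]
    return [0.0] * len(audio)


def tone(freq, fs, n, amp=1.0, delay=0):
    return [amp * math.sin(2 * math.pi * freq * (i - delay) / fs) for i in range(n)]


rng = random.Random(0)  # fixed seed so the synthetic demo is repeatable


def noise(n, sigma):
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def mix(a, b):
    return [p + q for p, q in zip(a, b)]


fs, n = 8000, 4096
# Wearer speech stand-in: a 500 Hz tone seen by both sensors (vibration delayed, weaker).
speech_audio = mix(tone(500.0, fs, n), noise(n, 0.05))
speech_vib = mix(tone(500.0, fs, n, amp=0.5, delay=3), noise(n, 0.05))
# Coherent non-speech stand-in: a 62.5 Hz rumble seen by both sensors.
rumble_audio = mix(tone(62.5, fs, n), noise(n, 0.05))
rumble_vib = mix(tone(62.5, fs, n, amp=0.5, delay=3), noise(n, 0.05))
# Non-coherent stand-in: the microphone hears a tone, the vibration sensor only noise.
bystander_vib = noise(n, 1.0)

for name, a, v in [("speech", speech_audio, speech_vib),
                   ("rumble", rumble_audio, rumble_vib),
                   ("bystander", speech_audio, bystander_vib)]:
    print(name, classify(a, v, fs))
```

Weighting the coherence by audio power keeps the score focused on the frequencies where the microphone actually picked something up, so broadband sensor noise in otherwise empty bins does not drag the score down. A production system would more likely use an FFT library and overlapping segments; the naive DFT here only keeps the sketch dependency-free.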
Address: Mountain View, CA, US