Title of invention: Unified recognition of speech and music
Abstract: Methods, systems, and computer programs are presented for unified recognition of speech and music. One method includes an operation for starting an audio recognition mode by a computing device while receiving an audio stream. Segments of the audio stream are analyzed as the audio stream is received, where the analysis includes simultaneous checking for speech and music. Further, the method includes an operation for determining a first confidence score for speech and a second confidence score for music. As the audio stream is received, additional segments are analyzed until the end of the audio stream or until the first and second confidence scores indicate that the audio stream has been identified as speech or music. Further, results are presented on a display based on the identification of the audio stream, including text entered if the audio stream was speech or song information if the audio stream was music.
Publication number: US9224385(B1) Publication date: 2015.12.29
Application number: US201313919170 Filing date: 2013.06.17
Applicant: GOOGLE INC. Inventors: Sharifi Matthew; Shahshahani Ben; Roblek Dominik
Classification: G10L15/00; G10L15/04 Main classification: G10L15/00
Agency: Morris & Kamlay LLP Agent: Morris & Kamlay LLP
Main claim: 1. A method for providing information to a user, the method comprising: detecting entry in an audio recognition mode by a computing device, the detecting including receiving an audio stream; analyzing, by a processor of the computing device, one or more segments of the audio stream received by the computing device before a complete audio stream is received, wherein analyzing includes: first checking the one or more segments to determine if the audio stream includes speech; and second checking the one or more segments to determine if the audio stream is from a song, wherein at least part of the first checking is performed while the second checking is being performed; determining a first confidence score from the first checking and determining a second confidence score from the second checking; displaying a possible candidate on a display based on a partial identification of the audio stream using the first and second confidence scores while continuing checking additional segments as the audio stream is received until an end of the audio stream or until the first and second confidence scores determine that the audio stream has been identified as speech or music; and presenting results on the display based on the completed identification of the audio stream.
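The control flow recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `identify_stream`, the two scorer callables, the `threshold` value, and the "unknown" label are all hypothetical and introduced only for illustration. The sketch assumes each scorer takes the segments received so far and returns a confidence in [0, 1], and that identification is considered complete once either confidence reaches the threshold.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple

# Hypothetical scorer type: segments received so far -> confidence in [0, 1].
Scorer = Callable[[List[bytes]], float]


def identify_stream(
    segments: Iterable[bytes],
    check_speech: Scorer,
    check_music: Scorer,
    threshold: float = 0.8,  # assumed cutoff; the record does not state one
) -> Tuple[str, List[Tuple[str, float, float]]]:
    """Return (final_label, partial_candidates) for an incoming audio stream."""
    seen: List[bytes] = []
    partials: List[Tuple[str, float, float]] = []
    label = "unknown"
    with ThreadPoolExecutor(max_workers=2) as pool:
        for segment in segments:
            seen.append(segment)
            # Submit both checks before waiting on either, so the speech
            # check and the music check overlap in time.
            speech_future = pool.submit(check_speech, list(seen))
            music_future = pool.submit(check_music, list(seen))
            speech_conf = speech_future.result()
            music_conf = music_future.result()
            # Partial identification: the currently more likely candidate,
            # which a caller could show on a display while analysis continues.
            candidate = "speech" if speech_conf >= music_conf else "music"
            partials.append((candidate, speech_conf, music_conf))
            # Stop early once either score is decisive.
            if max(speech_conf, music_conf) >= threshold:
                label = candidate
                break
    return label, partials
```

If the stream ends before either score reaches the threshold, this sketch returns "unknown" along with the partial candidates collected so far; a real system would presumably choose its own fallback.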
Address: Mountain View CA US