Title: Classification of audio as speech or non-speech using multiple threshold values
Abstract: A portion of an audio signal is separated into multiple frames from which one or more different features are extracted. These different features are used, in combination with a set of rules, to classify the portion of the audio signal into one of multiple different classifications (for example, speech, non-speech, music, environment sound, silence, etc.). In one embodiment, these different features include one or more of line spectrum pairs (LSPs), a noise frame ratio, periodicity of particular bands, spectrum flux features, and energy distribution in one or more of the bands. The line spectrum pairs are also optionally used to segment the audio signal, identifying audio classification changes as well as speaker changes when the audio signal is speech.
Publication number: US7249015(B2)  Publication date: 2007.07.24
Application number: US20060276419  Filing date: 2006.02.28
Applicant: MICROSOFT CORPORATION  Inventors: JIANG HAO; ZHANG HONG-JIANG
Classification: G10L19/12; G10L11/00  Main classification: G10L19/12
Agency:  Agent:
Main claim:
Address:
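
The frame-then-classify flow described in the abstract can be illustrated with a minimal sketch. This is not the patented method: it uses only two stand-in features (mean frame energy and a simple spectrum flux), omits LSPs, noise frame ratio, and band periodicity entirely, and every name, the flux definition, the rule order, and the threshold values (`silence_threshold`, `flux_threshold`) are assumptions chosen for illustration.

```python
import cmath
import math


def split_frames(samples, frame_len):
    """Split samples into consecutive, non-overlapping frames (trailing remainder dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2, computed with a naive O(N^2) DFT."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def spectrum_flux(frames, eps=1e-3):
    """Average absolute change in log-magnitude spectrum between adjacent frames.

    This definition is an assumption for illustration; the record does not
    state the formula used in the patent.
    """
    spectra = [[math.log(m + eps) for m in magnitude_spectrum(f)] for f in frames]
    if len(spectra) < 2:
        return 0.0
    diffs = [sum(abs(a - b) for a, b in zip(s1, s2)) / len(s1)
             for s1, s2 in zip(spectra, spectra[1:])]
    return sum(diffs) / len(diffs)


def classify(samples, frame_len=32, silence_threshold=1e-4, flux_threshold=0.5):
    """Apply two hypothetical threshold rules in sequence to label a signal portion."""
    frames = split_frames(samples, frame_len)
    if not frames:
        raise ValueError("signal is shorter than one frame")
    mean_energy = sum(frame_energy(f) for f in frames) / len(frames)
    if mean_energy < silence_threshold:      # rule 1: very low energy
        return "silence"
    if spectrum_flux(frames) > flux_threshold:  # rule 2: rapidly changing spectrum
        return "speech"
    return "non-speech"
```

Under these assumptions, an all-zero signal falls under the first rule, a steady tone (unchanging spectrum) falls through to "non-speech", and a signal whose spectrum changes from frame to frame trips the flux rule. Real thresholds would have to be tuned on labeled audio.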