Title of invention: HYBRID WAVEFORM-CODED AND PARAMETRIC-CODED SPEECH ENHANCEMENT
Abstract: A method for hybrid speech enhancement that employs parametric-coded enhancement (or a blend of parametric-coded and waveform-coded enhancement) under some signal conditions and waveform-coded enhancement (or a different blend of parametric-coded and waveform-coded enhancement) under other signal conditions. Other aspects are methods for generating a bitstream indicative of an audio program including speech and other content, such that hybrid speech enhancement can be performed on the program; a decoder including a buffer which stores at least one segment of an encoded audio bitstream generated by any embodiment of the inventive method; and a system or device (e.g., an encoder or decoder) configured (e.g., programmed) to perform any embodiment of the inventive method. At least some of the speech enhancement operations are performed by a recipient audio decoder with Mid/Side speech enhancement metadata generated by an upstream audio encoder.
Publication number: US2016225387(A1) Publication date: 2016.08.04
Application number: US201414914572 Filing date: 2014.08.27
Applicant: DOLBY LABORATORIES LICENSING CORPORATION; DOLBY INTERNATIONAL AB Inventor: KOPPENS Jeroen; MUESCH Hannes
Classification: G10L21/0364; H04S3/00; G10L19/22; G10L21/0324; G10L19/008; G10L19/20 Main classification: G10L21/0364
Agency: Agent:
Principal claim: 1. A method, comprising: receiving mixed audio content, in a reference audio channel representation, that are distributed over a plurality of audio channels of the reference audio channel representation, the mixed audio content having a mix of speech content and non-speech audio content; transforming one or more portions of the mixed audio content that are distributed over two or more non-Mid/Side (non-M/S) channels in the plurality of audio channels of the reference audio channel representation into one or more portions of transformed mixed audio content in an M/S audio channel representation that are distributed over one or more channels of the M/S audio channel representation, wherein the M/S audio channel representation comprises at least a mid-channel and a side-channel, wherein the mid-channel represents a weighted or non-weighted sum of two channels of the reference audio channel representation, and wherein the side-channel represents a weighted or non-weighted difference of two channels of the reference audio channel representation; determining metadata for speech enhancement of the one or more portions of transformed mixed audio content in the M/S audio channel representation; and generating an audio signal that comprises the mixed audio content and the metadata for speech enhancement of the one or more portions of transformed mixed audio content in the M/S audio channel representation; wherein the method is performed by one or more computing devices.
Address: San Francisco, CA, US
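
Illustrative note on the claimed M/S transform: claim 1 defines the mid-channel as a weighted or non-weighted sum of two reference channels and the side-channel as the corresponding difference. The following is a minimal sketch of that step only, assuming a left/right channel pair and a single shared weight of 0.5. The function names `to_mid_side` and `from_mid_side`, the `weight` parameter, and the sample values are hypothetical and are not taken from the patent. The metadata determination and bitstream generation steps are not covered here.

```python
from typing import List, Sequence, Tuple


def to_mid_side(
    left: Sequence[float], right: Sequence[float], weight: float = 0.5
) -> Tuple[List[float], List[float]]:
    """Map a left/right pair to (mid, side).

    mid  = weight * (left + right)   # weighted sum
    side = weight * (left - right)   # weighted difference
    The shared weight of 0.5 is an assumption made for this sketch.
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    mid = [weight * (l + r) for l, r in zip(left, right)]
    side = [weight * (l - r) for l, r in zip(left, right)]
    return mid, side


def from_mid_side(
    mid: Sequence[float], side: Sequence[float], weight: float = 0.5
) -> Tuple[List[float], List[float]]:
    """Invert to_mid_side: left = (mid + side) / (2w), right = (mid - side) / (2w)."""
    if len(mid) != len(side):
        raise ValueError("channels must have the same number of samples")
    scale = 1.0 / (2.0 * weight)
    left = [scale * (m + s) for m, s in zip(mid, side)]
    right = [scale * (m - s) for m, s in zip(mid, side)]
    return left, right


if __name__ == "__main__":
    # Hypothetical samples: content identical in both channels lands in mid only.
    left = [0.5, 0.25, -0.5]
    right = [0.5, -0.25, 0.0]
    mid, side = to_mid_side(left, right)
    print(mid)   # [0.5, 0.0, -0.25]
    print(side)  # [0.0, 0.25, -0.25]
    print(from_mid_side(mid, side))
```

With this choice of weight the transform is exactly invertible, and the first sample, which is identical in both channels, appears only in the mid-channel. This is one reason a center-panned component such as dialogue may be convenient to address in an M/S representation.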