Title of invention |
HYBRID WAVEFORM-CODED AND PARAMETRIC-CODED SPEECH ENHANCEMENT |
Abstract |
A method for hybrid speech enhancement which employs parametric-coded enhancement (or a blend of parametric-coded and waveform-coded enhancement) under some signal conditions and waveform-coded enhancement (or a different blend of parametric-coded and waveform-coded enhancement) under other signal conditions. Other aspects are methods for generating a bitstream indicative of an audio program including speech and other content, such that hybrid speech enhancement can be performed on the program, a decoder including a buffer which stores at least one segment of an encoded audio bitstream generated by any embodiment of the inventive method, and a system or device (e.g., an encoder or decoder) configured (e.g., programmed) to perform any embodiment of the inventive method. At least some of the speech enhancement operations are performed by a recipient audio decoder with Mid/Side speech enhancement metadata generated by an upstream audio encoder. |
Publication number |
US2016225387(A1) |
Publication date |
2016.08.04 |
Application number |
US201414914572 |
Filing date |
2014.08.27 |
Applicant |
DOLBY LABORATORIES LICENSING CORPORATION;DOLBY INTERNATIONAL AB |
Inventor |
KOPPENS Jeroen;MUESCH Hannes |
Classification |
G10L21/0364;H04S3/00;G10L19/22;G10L21/0324;G10L19/008;G10L19/20 |
Main classification |
G10L21/0364 |
Agency |
|
Agent |
|
Principal claim |
1. A method, comprising:
receiving mixed audio content, in a reference audio channel representation, that are distributed over a plurality of audio channels of the reference audio channel representation, the mixed audio content having a mix of speech content and non-speech audio content;
transforming one or more portions of the mixed audio content that are distributed over two or more non-Mid/Side (non-M/S) channels in the plurality of audio channels of the reference audio channel representation into one or more portions of transformed mixed audio content in an M/S audio channel representation that are distributed over one or more channels of the M/S audio channel representation, wherein the M/S audio channel representation comprises at least a mid-channel and a side-channel, wherein the mid-channel represents a weighted or non-weighted sum of two channels of the reference audio channel representation, and wherein the side-channel represents a weighted or non-weighted difference of two channels of the reference audio channel representation;
determining metadata for speech enhancement of the one or more portions of transformed mixed audio content in the M/S audio channel representation; and
generating an audio signal that comprises the mixed audio content and the metadata for speech enhancement of the one or more portions of transformed mixed audio content in the M/S audio channel representation;
wherein the method is performed by one or more computing devices. |
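The mid-channel and side-channel definitions in the claim (a weighted or non-weighted sum and difference of two reference channels) can be illustrated with a minimal sketch. The function names `to_mid_side` and `from_mid_side`, the single shared `weight` parameter, and its default of 0.5 are assumptions made for this illustration; the claim does not specify particular weights or an inverse transform, and this is not presented as the applicants' implementation.

```python
def to_mid_side(left, right, weight=0.5):
    """Map two reference channels to (mid, side).

    mid  = weight * (left + right)   # weighted sum
    side = weight * (left - right)   # weighted difference
    The shared `weight` is an illustrative assumption; weight=1.0
    gives the non-weighted sum and difference.
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    mid = [weight * (l + r) for l, r in zip(left, right)]
    side = [weight * (l - r) for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side, weight=0.5):
    """Invert to_mid_side for the same non-zero weight."""
    if len(mid) != len(side):
        raise ValueError("channels must have the same length")
    left = [(m + s) / (2.0 * weight) for m, s in zip(mid, side)]
    right = [(m - s) / (2.0 * weight) for m, s in zip(mid, side)]
    return left, right


# Hypothetical sample values: the first sample is identical in both
# channels, the other two are equal in magnitude and opposite in sign.
left = [1.0, 0.5, -0.25]
right = [1.0, -0.5, 0.25]
mid, side = to_mid_side(left, right)
restored_left, restored_right = from_mid_side(mid, side)
```

In this sketch, content that is identical in both reference channels appears only in the mid-channel, while content of opposite sign appears only in the side-channel.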
Address |
San Francisco CA US |