Title: Method and apparatus for performing packet loss or frame erasure concealment
Abstract: A method for performing packet loss or Frame Erasure Concealment (FEC) for a speech coder receives encoded frames of compressed speech information transmitted from an encoder. The method determines whether an encoded frame has been lost, corrupted in transmission, or erased, synthesizes properly received frames, and decides on an overlap-add window to use in combining a portion of the synthesized speech signal with a subsequent speech signal resulting from a received and decoded packet, where the size of the overlap-add window is based on the unavailability of packets. If it is determined that an encoded frame has been lost, corrupted in transmission, or erased, the method performs an overlap-add operation on the portion of the synthesized speech signal and the subsequent speech signal, using the decided-on overlap-add window.
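The overlap-add step described in the abstract can be illustrated with a minimal sketch. The abstract only states that the window size depends on the unavailability of packets; the specific rule below (a window that grows with the number of consecutive erased frames), the triangular window shape, and the names `ola_window_length`, `overlap_add`, `base`, `step`, and `max_len` are assumptions made for illustration and are not taken from the patent.

```python
def ola_window_length(erased_frames, base=10, step=5, max_len=40):
    """Hypothetical rule: a longer erasure gets a longer overlap-add window.

    Returns a window length in samples. The constants are illustrative only.
    """
    extra_frames = max(erased_frames - 1, 0)
    return min(base + step * extra_frames, max_len)


def overlap_add(synth_segment, decoded_segment, length):
    """Cross-fade `length` samples of synthesized speech into decoded speech.

    A triangular window is assumed: the synthesized signal fades out
    while the newly decoded signal fades in, and the two weights sum to 1.
    """
    if length <= 0:
        raise ValueError("window length must be positive")
    if len(synth_segment) < length or len(decoded_segment) < length:
        raise ValueError("both segments must cover the window")
    out = []
    for i in range(length):
        fade_in = (i + 1) / (length + 1)   # weight of the decoded signal
        fade_out = 1.0 - fade_in           # weight of the synthesized signal
        out.append(fade_out * synth_segment[i] + fade_in * decoded_segment[i])
    return out
```

In this sketch a receiver would call `ola_window_length` with the number of frames just concealed, then pass the tail of the synthesized signal and the head of the next decoded frame to `overlap_add`.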
Publication number: US9336783(B2) Publication date: 2016.05.10
Application number: US201314091185 Filing date: 2013.11.26
Applicant: AT&T Intellectual Property II, L.P. Inventor: Kapilow David A.
Classification: G10L19/00;G10L19/005;G10L21/003;G10L19/028 Main classification: G10L19/00
Agency: Agent:
Main claim: 1. A method for processing packets representing encoded speech of a speech signal, comprising: determining, by a receiver, a first packet of the packets is an expected packet, wherein an expected packet comprises a packet that is not lost, corrupted, erased or delayed; decoding, by the receiver, the first packet to create a plurality of speech samples in a buffer; delaying, by the receiver, the plurality of speech samples by a delay period; sending, by the receiver, the delayed plurality of speech samples to an output port; and when the determining further determines that a second packet of the packets is an unexpected packet, wherein an unexpected packet comprises a packet that is lost, corrupted, erased or delayed, computing an estimated pitch period, using a most recent 20 msec of the plurality of speech samples in the buffer, wherein the estimated pitch period is computed using a 2:1 decimated signal of the most recent 20 msec of the plurality of speech samples; and using the estimated pitch period to select a portion of the plurality of speech samples to generate a synthesized speech segment.
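The pitch estimation and synthesis steps of the claim can be sketched as follows. The claim specifies only the 20 msec analysis span and the 2:1 decimation; the 8 kHz sample rate, the 40 to 120 sample lag search range, the use of normalized correlation as the similarity measure, decimation by simply dropping every other sample, and the names `estimate_pitch` and `synthesize` are all assumptions chosen for illustration.

```python
import math


def estimate_pitch(history, sample_rate=8000, min_lag=40, max_lag=120):
    """Estimate the pitch period (in samples) from the most recent 20 msec.

    The search runs on a 2:1 decimated copy of the signal, so each
    candidate lag is tested at half resolution and doubled on return.
    Sample rate and lag range are assumed values, not from the claim.
    """
    span = sample_rate * 20 // 1000        # 20 msec expressed in samples
    recent = history[-span:]
    decimated = recent[::2]                # keep every other sample
    best_lag, best_score = None, -math.inf
    for lag in range(min_lag // 2, max_lag // 2 + 1):
        later = decimated[lag:]
        earlier = decimated[:len(decimated) - lag]
        if len(later) < 2:
            break                          # not enough overlap to compare
        energy = math.sqrt(sum(x * x for x in later) *
                           sum(y * y for y in earlier))
        if energy == 0.0:
            continue                       # silent segment, no usable score
        score = sum(x * y for x, y in zip(later, earlier)) / energy
        if score > best_score:
            best_lag, best_score = lag, score
    return None if best_lag is None else best_lag * 2


def synthesize(history, pitch, count):
    """Repeat the last pitch period of `history` to produce `count` samples."""
    period = history[-pitch:]
    return [period[i % pitch] for i in range(count)]
```

A receiver following this sketch would call `estimate_pitch` on its sample buffer when a packet is missing, then `synthesize` to fill the gap. Searching the decimated signal roughly quarters the correlation work at the cost of a two-sample lag resolution.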
Address: Atlanta GA US