Abstract |
Speech signal classification and encoding systems and methods are disclosed herein. The signal classification is performed in three steps, each of which discriminates a specific signal class. First, a voice activity detector (VAD) discriminates between active and inactive speech frames. If an inactive speech frame is detected (background noise signal), the classification chain ends and the frame is encoded with comfort noise generation (CNG). If an active speech frame is detected, the frame is subjected to a second classifier dedicated to discriminating unvoiced frames. If that classifier classifies the frame as an unvoiced speech signal, the classification chain ends and the frame is encoded using a coding method optimized for unvoiced signals. Otherwise, the speech frame is passed to the "stable voiced" classification module. If the frame is classified as a stable voiced frame, the frame is encoded using a coding method optimized for stable voiced signals. Otherwise, the frame is likely to contain a non-stationary speech segment such as a voiced onset or a rapidly evolving voiced speech signal. In this case, a general-purpose speech coder is used at a high bit rate to sustain good subjective quality.
|
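The three-step classification chain described in the abstract can be illustrated with a minimal sketch. It shows only the decision order and the early exits. The three classifier callables (`is_active`, `is_unvoiced`, `is_stable_voiced`), the `CodingMode` enum, and the function name `select_coding_mode` are hypothetical names introduced for illustration; the abstract does not specify how each classifier reaches its decision.

```python
from enum import Enum
from typing import Callable, Sequence

Frame = Sequence[float]  # assumption: a frame is a sequence of samples


class CodingMode(Enum):
    CNG = "comfort noise generation"
    UNVOICED = "coding optimized for unvoiced signals"
    STABLE_VOICED = "coding optimized for stable voiced signals"
    GENERIC = "general-purpose coding at a high bit rate"


def select_coding_mode(
    frame: Frame,
    is_active: Callable[[Frame], bool],         # hypothetical VAD decision
    is_unvoiced: Callable[[Frame], bool],       # hypothetical unvoiced classifier
    is_stable_voiced: Callable[[Frame], bool],  # hypothetical stable-voiced classifier
) -> CodingMode:
    """Walk the classification chain, stopping at the first matching class."""
    # Step 1: VAD. Inactive frames (background noise) end the chain.
    if not is_active(frame):
        return CodingMode.CNG
    # Step 2: unvoiced classifier. A match ends the chain.
    if is_unvoiced(frame):
        return CodingMode.UNVOICED
    # Step 3: stable voiced classifier.
    if is_stable_voiced(frame):
        return CodingMode.STABLE_VOICED
    # Fallback: likely non-stationary speech (e.g. a voiced onset).
    return CodingMode.GENERIC
```

Because each step returns as soon as its class is matched, later classifiers are never evaluated for a frame that an earlier step has already assigned, which mirrors the "classification chain ends" wording above.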