Title of invention: Method and Apparatus for Processing Speech Signal According to Frequency-Domain Energy
Abstract: A method and an apparatus for processing a speech signal according to frequency-domain energy. The method includes receiving an original speech signal that includes a first speech frame and a second speech frame adjacent to each other, performing a Fourier transform on the first speech frame and the second speech frame, obtaining a frequency-domain energy distribution of each of the two frames, obtaining a frequency-domain energy correlation coefficient between the two frames, and segmenting the original speech signal according to the frequency-domain energy correlation coefficient. This may resolve the problem that fine-grained speech signal segmentation produces low-accuracy results because of the phoneme characteristics of the speech signal or severe noise.
Publication number: US2016351204(A1) Publication date: 2016.12.01
Application number: US201615237095 Filing date: 2016.08.15
Applicant: Huawei Technologies Co., Ltd. Inventor: Xu Lijing
Classification: G10L21/0308;G10L25/18;G10L25/06 Main classification: G10L21/0308
Agency: Agent:
Principal claim: 1. A method for processing a speech signal according to frequency-domain energy, comprising: receiving an original speech signal, wherein the original speech signal comprises a first speech frame and a second speech frame that are adjacent to each other; performing a Fourier transform on the first speech frame to obtain a first frequency-domain signal; performing the Fourier transform on the second speech frame to obtain a second frequency-domain signal; obtaining a frequency-domain energy distribution of the first speech frame according to the first frequency-domain signal; obtaining a frequency-domain energy distribution of the second speech frame according to the second frequency-domain signal, wherein the frequency-domain energy distribution represents an energy distribution characteristic of the speech frame in a frequency domain; obtaining a frequency-domain energy correlation coefficient between the first speech frame and the second speech frame according to the frequency-domain energy distribution of the first speech frame and the frequency-domain energy distribution of the second speech frame, wherein the frequency-domain energy correlation coefficient is used to represent a spectral change from the first speech frame to the second speech frame; and segmenting the original speech signal according to the frequency-domain energy correlation coefficient.
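The steps recited in claim 1 can be illustrated with a minimal Python sketch. The record does not state how the energy distribution or the correlation coefficient is computed, so everything specific here is an assumption made for illustration: the distribution is taken as each frequency bin's share of the frame's total energy, the coefficient is a Pearson correlation between the distributions of adjacent frames, and a boundary is placed wherever that coefficient falls below a threshold. The function names, `frame_len`, and `threshold` are hypothetical, and the naive DFT is used only to keep the example free of external dependencies.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform (O(n^2)); adequate for short demo frames."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def energy_distribution(frame):
    """Assumed definition: share of total energy in each non-negative frequency bin."""
    spectrum = dft(frame)[: len(frame) // 2 + 1]
    power = [abs(c) ** 2 for c in spectrum]
    total = sum(power)
    if total == 0:
        return [0.0] * len(power)  # silent frame: no energy to distribute
    return [p / total for p in power]


def correlation(a, b):
    """Assumed measure: Pearson correlation of two equal-length distributions."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # flat distribution: treat as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def segment_boundaries(signal, frame_len=64, threshold=0.5):
    """Return sample indices where adjacent frames correlate below the threshold."""
    frames = [
        signal[i:i + frame_len]
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]
    dists = [energy_distribution(f) for f in frames]
    boundaries = []
    for i in range(1, len(dists)):
        # A low coefficient signals a large spectral change between frames.
        if correlation(dists[i - 1], dists[i]) < threshold:
            boundaries.append(i * frame_len)
    return boundaries
```

With these assumptions, a signal made of two frames of one pure tone followed by two frames of a different tone produces a single boundary at the sample where the tone changes, while a signal with an unchanging tone produces none. A practical system would likely use overlapping windowed frames and a fast Fourier transform instead of the naive DFT.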
Address: Shenzhen CN