Title of invention: PITCH MARKING IN SPEECH PROCESSING
Abstract: According to some embodiments of the present invention, there is provided a computerized method for selecting and correcting pitch marks in speech processing and modification. The method comprises an action of receiving a continuous speech signal representing audible speech recorded by a microphone, where a sequence of pitch values and two or more pitch mark temporal values are computed from the continuous speech signal. The method comprises an action of computing, for each of the pitch mark temporal values, a lower limit temporal value and an upper limit temporal value by a cross-correlation function of the continuous speech signal around the pitch mark temporal values associated with pairs of elements in the sequence, and replacing one or more of the pitch mark temporal values with one or more new temporal values between the lower limit temporal value and the upper limit temporal value.
Publication number: US2017117001(A1)
Publication date: 2017.04.27
Application number: US201514918601
Filing date: 2015.10.21
Applicant: International Business Machines Corporation
Inventor: Shechtman Slava
Classification: G10L21/01; G10L25/06; G10L25/90; G10L25/09
Main classification: G10L21/01
Agency:
Agent:
Independent claim: 1. A computerized method for receiving and processing continuous speech signals for generating therefrom one or more pitch mark combinations for speech processing, comprising: receiving a continuous speech signal representing audible speech recorded by a microphone, wherein a sequence of pitch values and a plurality of pitch mark temporal values are computed from said continuous speech signal, each of said plurality of pitch mark temporal values associated with one element of said sequence; using at least one hardware processor for executing a code for processing said continuous speech signal and generating at least one pitch mark combination, said processing comprises: computing for each of said plurality of pitch mark temporal values a lower limit temporal value and an upper limit temporal value by a cross-correlation function of said continuous speech signal around said pitch mark temporal values associated with pairs of elements in said sequence; computing at least one new temporal value between said lower limit temporal value and said upper limit temporal value; automatically generating said at least one pitch mark combination by replacing at least one of said plurality of pitch mark temporal values with said at least one new temporal value; outputting said at least one pitch mark combination of said plurality of pitch mark temporal values to a speech processor for at least one of speech processing, modification, and conversion to an audible output sound signal; wherein elements of said at least one combination are between said lower limit temporal value and said upper limit temporal value.
Address: Armonk, NY, US
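
Illustrative sketch (not part of the patent record): the claim describes bounding each pitch mark with a lower and an upper limit derived from a cross-correlation of the signal around pairs of marks, then replacing the mark with a value between those limits. The record does not say how the limits are derived from the correlation, so the Python below is only one possible reading under stated assumptions: temporal values are represented as integer sample indices, each mark is paired with an adjacent mark, the limits are the contiguous range of shifts whose normalized cross-correlation stays at or above a hypothetical `threshold`, and the new value is taken as the midpoint of the limits. All function names and parameters are hypothetical.

```python
import math


def normalized_xcorr(signal, a, b, half):
    """Normalized cross-correlation of two windows centered at samples a and b."""
    wa = signal[a - half:a + half + 1]
    wb = signal[b - half:b + half + 1]
    num = sum(x * y for x, y in zip(wa, wb))
    den = math.sqrt(sum(x * x for x in wa) * sum(y * y for y in wb))
    return num / den if den > 0 else 0.0


def mark_limits(signal, mark, neighbor, period, threshold=0.9):
    """Return (lower, upper) sample indices around `mark`.

    Assumption: a shifted position is acceptable while the window around it
    still correlates with the window around the paired `neighbor` mark at or
    above `threshold`. The search is capped at half a pitch period per side.
    """
    half = period // 2

    def acceptable(pos):
        # Reject positions whose analysis windows would leave the signal.
        if min(pos, neighbor) - half < 0:
            return False
        if max(pos, neighbor) + half >= len(signal):
            return False
        return normalized_xcorr(signal, pos, neighbor, half) >= threshold

    lower = upper = mark
    while mark - lower < half and acceptable(lower - 1):
        lower -= 1
    while upper - mark < half and acceptable(upper + 1):
        upper += 1
    return lower, upper


def correct_marks(signal, marks, periods, threshold=0.9):
    """Replace each mark with the midpoint of its limits (an assumed choice)."""
    if len(marks) < 2:
        raise ValueError("at least two pitch marks are needed to form a pair")
    corrected = []
    for i, mark in enumerate(marks):
        # Pair each mark with the next one; the last mark uses the previous one.
        j = i + 1 if i + 1 < len(marks) else i - 1
        lower, upper = mark_limits(signal, mark, marks[j], periods[i], threshold)
        corrected.append((lower + upper) // 2)
    return corrected
```

A caller would pass the sampled signal, the initial pitch marks as sample indices, and the local pitch period in samples for each mark, for example `correct_marks(signal, [100, 140, 180], [40, 40, 40])`. Any other rule for picking a value inside the limits would fit the claim wording equally well; the midpoint is used here only because it is simple.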