Title of Invention Prosody Generation Using Syllable-Centered Polynomial Representation of Pitch Contours
Abstract The present invention discloses a parametric representation of prosody based on polynomial expansion coefficients of the pitch contour near the center of each syllable. These syllable pitch expansion coefficients are generated from a recorded speech database of sentences read by a reference speaker. By correlating the stress level and context information of each syllable in the text with the polynomial expansion coefficients of the corresponding spoken syllable, a correlation database is formed. To generate prosody for an input text, the stress level and context information of each syllable in the text are identified. The prosody is generated by using the correlation database to find the best set of pitch parameters for each syllable. By adding these parameters to the global pitch contours and using interpolation formulas, a complete pitch contour for the input text is generated. Duration and intensity profiles are generated using a similar procedure.
Publication No. US2014195242(A1) Publication Date 2014.07.10
Application No. US201414216611 Filing Date 2014.03.17
Applicant Chen Chengjun Julian Inventor Chen Chengjun Julian
Classification G10L13/02 Main Classification G10L13/02
Agency Agent
Principal Claim 1. A method for building databases for prosody generation in speech synthesis using one or more processors comprising: A) compile a text corpus of sentences containing all the prosody phenomena of interest; B) for each phrase in each said sentence, identify the phrase type; C) segment each sentence into syllables, identify the property and context information of each said syllable; D) read the sentences by a reference speaker to make a recording of voice signals with simultaneous electroglottograph signals if an electroglottograph instrument is available; E) segment the voice signals and electroglottograph signals of each sentence into syllables, each said syllable is aligned with a syllable in the text; F) identify the voiced section in each syllable of the voice recording; G) calculate pitch values in the said voiced section; H) generate a polynomial expansion of the pitch contour of each said voiced section in each syllable by least-squares fitting, comprising the use of Gegenbauer polynomials, which at least have a constant term representing the average pitch of the said syllable; I) for all phrases of a given type, generate a polynomial expansion of the values of said average pitch of all syllables in the said phrases using least-squares fitting, to generate an average global pitch contour of the given phrase type; J) form a set of syllable pitch parameters for each said syllable by subtracting the value of the global pitch profile at that point from the value of the average pitch of the said syllable together with the rest of polynomial expansion coefficients for the said syllable; K) correlate the syllable pitch parameters with the property and context information of the said syllable from an analysis of the text to form a database of syllable pitch parameters; L) correlate the intensity and duration parameters of a syllable to the property and context information of the said syllable from an analysis of the text to form a database of intensity and duration.
Address White Plains NY US
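
Steps H through J of the principal claim can be illustrated with a minimal sketch. It is not the patented implementation: every function name, the polynomial orders, the choice alpha = 0.5 (for which Gegenbauer polynomials reduce to Legendre polynomials), and the mapping of each voiced section onto [-1, 1] are assumptions made for illustration. A single phrase stands in for the pooled phrases of one type in step I. On a uniform sample grid the constant coefficient only approximates the average pitch.

```python
# Illustrative sketch only; names, orders, and alpha are assumptions, not from the patent.

def gegenbauer(n, alpha, x):
    """Evaluate the Gegenbauer polynomial C_n^(alpha)(x) by three-term recurrence."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, 2.0 * alpha * x
    for k in range(2, n + 1):
        prev, cur = cur, (2.0 * x * (k + alpha - 1.0) * cur
                          - (k + 2.0 * alpha - 2.0) * prev) / k
    return cur


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit(xs, ys, order, alpha=0.5):
    """Least-squares coefficients of ys over the basis C_0..C_order (normal equations).

    Needs at least order + 1 distinct points in xs.
    """
    basis = [[gegenbauer(j, alpha, x) for j in range(order + 1)] for x in xs]
    k = order + 1
    ata = [[sum(row[i] * row[j] for row in basis) for j in range(k)] for i in range(k)]
    aty = [sum(row[i] * y for row, y in zip(basis, ys)) for i in range(k)]
    return solve(ata, aty)


def evaluate(coeffs, x, alpha=0.5):
    """Value of the expansion sum_j coeffs[j] * C_j^(alpha)(x)."""
    return sum(c * gegenbauer(j, alpha, x) for j, c in enumerate(coeffs))


def unit_grid(n):
    """n evenly spaced points on [-1, 1]; x = 0 is the syllable (or phrase) center."""
    return [0.0] if n == 1 else [-1.0 + 2.0 * i / (n - 1) for i in range(n)]


def syllable_pitch_parameters(phrase, syll_order=3, global_order=1, alpha=0.5):
    """phrase: list of per-syllable pitch tracks (Hz), one per voiced section."""
    # Step H: expand each syllable's pitch contour; coeffs[k][0] ~ average pitch.
    coeffs = [fit(unit_grid(len(p)), p, syll_order, alpha) for p in phrase]
    # Step I: fit a low-order global contour through the per-syllable averages.
    centers = unit_grid(len(phrase))
    global_coeffs = fit(centers, [c[0] for c in coeffs], global_order, alpha)
    # Step J: subtract the global contour from each average, keep the other terms.
    params = [[c[0] - evaluate(global_coeffs, t, alpha)] + c[1:]
              for c, t in zip(coeffs, centers)]
    return global_coeffs, params


if __name__ == "__main__":
    grid = unit_grid(21)
    # Synthetic four-syllable phrase with a falling average pitch (made-up values).
    phrase = [[base + 10.0 * x - 5.0 * gegenbauer(2, 0.5, x) for x in grid]
              for base in (210.0, 195.0, 190.0, 170.0)]
    global_coeffs, params = syllable_pitch_parameters(phrase)
    print("global contour coefficients:", global_coeffs)
    for p in params:
        print("syllable pitch parameters:", p)
```

In this sketch the per-syllable parameter vectors returned by the hypothetical `syllable_pitch_parameters` are what steps K and L would then associate with each syllable's property and context information. Each pitch track must contain more samples than `syll_order`, and the phrase more syllables than `global_order`, or the normal equations become singular.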