Abstract |
A voice unit concatenation method, used mainly in concatenative voice synthesis, comprising the following steps: searching the context-dependent HMM state sequences that correspond to the phonemes in the concatenated portions of two adjacent groups of voice segments for the most closely matching pair of corresponding states; obtaining the pre-calculated and stored state-level time slices of those phonemes and calculating the duration of each state after concatenation; and, according to the voice synthesis parameter data in a database and the calculated state durations, concatenating the voice synthesis parameters contained in the two groups of voice segments with an interpolated transition. When concatenating voice units, a concatenative voice synthesis system using this method can automatically choose, between two adjacent groups of units, the portions with the smallest acoustic feature difference and a stable change trend for the interpolated transition, thereby effectively improving the intelligibility and distinguishability of the synthesized voice. |
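The three steps of the abstract can be illustrated with a minimal sketch. It is not the claimed implementation: the `State` class, the Euclidean distance, the averaged duration at the join, and the linear cross-fade are all assumptions made for illustration, and each phoneme is assumed to have the same number of HMM states in both segments.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class State:
    """Hypothetical per-state record; the real database layout is not specified."""
    mean: List[float]  # acoustic parameter vector for this HMM state
    duration: int      # pre-calculated state-level duration, in frames


def distance(a: List[float], b: List[float]) -> float:
    # Euclidean distance is an assumed similarity measure.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_state(left: List[State], right: List[State]) -> int:
    """Step 1: index of the corresponding state pair with the smallest distance."""
    if len(left) != len(right):
        raise ValueError("state sequences must have the same length")
    return min(range(len(left)),
               key=lambda i: distance(left[i].mean, right[i].mean))


def joined_durations(left: List[State], right: List[State], k: int) -> List[int]:
    """Step 2: durations after concatenation.

    Assumption: states before k keep the left durations, states after k keep
    the right durations, and state k uses the rounded average of the two.
    """
    durations = []
    for i in range(len(left)):
        if i < k:
            durations.append(left[i].duration)
        elif i > k:
            durations.append(right[i].duration)
        else:
            durations.append(max(1, round((left[i].duration + right[i].duration) / 2)))
    return durations


def concatenate(left: List[State], right: List[State]) -> Tuple[int, List[int], List[List[float]]]:
    """Step 3: build frame-level parameters with a linear cross-fade in state k."""
    k = closest_state(left, right)
    durations = joined_durations(left, right, k)
    frames: List[List[float]] = []
    for i, d in enumerate(durations):
        for t in range(d):
            if i < k:
                frames.append(list(left[i].mean))
            elif i > k:
                frames.append(list(right[i].mean))
            else:
                w = (t + 1) / (d + 1)  # weight moves from the left unit to the right unit
                frames.append([(1 - w) * a + w * b
                               for a, b in zip(left[i].mean, right[i].mean)])
    return k, durations, frames


left = [State([0.0], 2), State([1.0], 2), State([5.0], 2)]
right = [State([3.0], 3), State([1.5], 4), State([9.0], 1)]
k, durations, frames = concatenate(left, right)
print(k, durations, len(frames))
```

In this toy input the middle states differ least, so the transition is placed there, and the frames on either side are taken unchanged from the left and right units respectively.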