Title of invention: Method and apparatus for producing audio-visual synthetic speech
Abstract: A method and apparatus provide a video image of facial features synchronized with synthetic speech. Text input is transformed into a string of phonemes and timing data, which are transmitted to an image generation unit. At the same time, a string of synthetic speech samples is transmitted to an audio server. The audio server produces signals for an audio speaker, so that the speech is played back continuously, and also initializes a timer. The image generation unit reads the current time from the timer and, by consulting the phoneme and timing data, determines the position in the phoneme string of the phoneme currently being spoken. It then calculates the facial configuration corresponding to that position and causes it to be displayed on a video device.
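The synchronization step in the abstract (read the timer, find the phoneme being spoken, derive a facial configuration) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and does not come from the patent: the `TimedPhoneme` type, the `MOUTH_OPENNESS` table and its values, and the single "mouth openness" parameter standing in for a full facial configuration. The elapsed time is passed in as an argument; in the described system it would be read from the timer that the audio server initializes.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedPhoneme:
    """One phoneme with its duration (hypothetical representation)."""
    symbol: str
    duration_ms: int


# Hypothetical mapping from phoneme to a single facial parameter.
# A real system would use a much richer facial configuration.
MOUTH_OPENNESS = {"sil": 0.0, "h": 0.3, "eh": 0.6, "l": 0.4, "ow": 0.8}


def onset_times(phonemes):
    """Return the start time in ms of each phoneme, assuming playback starts at 0."""
    onsets = []
    elapsed = 0
    for phoneme in phonemes:
        onsets.append(elapsed)
        elapsed += phoneme.duration_ms
    return onsets


def phoneme_index_at(phonemes, elapsed_ms):
    """Return the index of the phoneme being spoken at elapsed_ms.

    Times before the start map to the first phoneme and times past
    the end map to the last one.
    """
    if not phonemes:
        raise ValueError("phoneme string is empty")
    onsets = onset_times(phonemes)
    index = bisect_right(onsets, elapsed_ms) - 1
    return max(0, min(index, len(phonemes) - 1))


def facial_configuration(phonemes, elapsed_ms):
    """Look up the (simplified) facial configuration for the current phoneme."""
    phoneme = phonemes[phoneme_index_at(phonemes, elapsed_ms)]
    return {
        "phoneme": phoneme.symbol,
        "mouth_openness": MOUTH_OPENNESS.get(phoneme.symbol, 0.0),
    }
```

A display loop would call `facial_configuration` once per video frame with the latest timer reading, so the image follows the audio clock rather than its own frame counter. That is the design point the abstract describes: the audio playback position, not the renderer, drives the animation.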
Publication number: US5657426(A) Publication date: 1997.08.12
Application number: US19940258145 Filing date: 1994.06.10
Applicant: DIGITAL EQUIPMENT CORPORATION Inventors: WATERS, KEITH; LEVERGOOD, THOMAS M.
Classification: G10L15/24; G10L21/06; (IPC1-7): G10L3/00 Main classification: G10L15/24
Agency: Agent:
Independent claim:
Address: