Title of invention: Method and system for aligning natural and synthetic video to speech synthesis
Abstract: Facial animation in MPEG-4 can be driven by two streams: a text stream and a Facial Animation Parameters (FAP) stream. The text is sent to a text-to-speech (TTS) converter, which drives the mouth shapes of the face. The FAPs are sent from an encoder to the face over the communication channel. Disclosed are codes, known as bookmarks, embedded in the text string transmitted to the TTS converter. Bookmarks are placed between and inside words, and each carries an encoder time stamp. The encoder time stamp does not relate to real-world time. The FAP stream carries the same encoder time stamp as the bookmark in the text. The system reads the bookmark and provides both the encoder time stamp and a real-time time stamp to the facial animation system. The facial animation system then associates the correct facial animation parameter with the real-time time stamp, using the encoder time stamp of the bookmark as a reference.
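The alignment described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the bookmark syntax `<ets:N>`, the `Fap` class, the functions `tts_scan` and `schedule_faps`, and the fixed per-character speaking rate are all assumptions made for illustration. A real TTS converter would report actual audio playback time rather than a character count.

```python
import re
from dataclasses import dataclass

# Hypothetical bookmark syntax "<ets:N>"; the source does not specify one.
BOOKMARK_RE = re.compile(r"<ets:(\d+)>")


@dataclass
class Fap:
    ets: int    # encoder time stamp carried in the FAP stream
    name: str   # illustrative parameter name, not an MPEG-4 FAP identifier


def tts_scan(text, ms_per_char=60):
    """Strip bookmarks and return (spoken_text, [(ets, rts_ms), ...]).

    The real-time time stamp is modelled as the number of characters
    spoken before the bookmark times an assumed constant rate.
    """
    pairs = []
    spoken = []
    spoken_len = 0
    pos = 0
    for match in BOOKMARK_RE.finditer(text):
        chunk = text[pos:match.start()]
        spoken.append(chunk)
        spoken_len += len(chunk)
        pairs.append((int(match.group(1)), spoken_len * ms_per_char))
        pos = match.end()
    spoken.append(text[pos:])
    return "".join(spoken), pairs


def schedule_faps(faps, pairs):
    """Attach a real-time time stamp to each FAP via its encoder time stamp."""
    ets_to_rts = dict(pairs)
    return [(ets_to_rts[f.ets], f.name) for f in faps if f.ets in ets_to_rts]


# One bookmark sits inside a word, the other between words.
spoken, pairs = tts_scan("Hel<ets:10>lo <ets:20>world")
timeline = schedule_faps([Fap(10, "smile"), Fap(20, "raise_eyebrows")], pairs)
print(spoken)    # Hello world
print(timeline)  # [(180, 'smile'), (360, 'raise_eyebrows')]
```

The encoder time stamps act only as lookup keys here, which mirrors the statement that they do not relate to real-world time: the timing comes entirely from where the TTS pass encounters each bookmark.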
Publication number: US7366670(B1) Publication date: 2008.04.29
Application number: US20060464018 Filing date: 2006.08.11
Applicant: AT&T CORP. Inventors: BASSO ANDREA;BEUTNAGEL MARK CHARLES;OSTERMANN JOERN
Classification: G10L13/00;G06T13/00 Main classification: G10L13/00
Agency: Agent:
Main claim:
Address: