System and method for synthetic voice generation and modification, Application No. US201514623183

Title of invention	System and method for synthetic voice generation and modification
Abstract	Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating a synthetic voice. A system configured to practice the method combines a first database of a first text-to-speech voice and a second database of a second text-to-speech voice to generate a combined database, selects from the combined database, based on a policy, voice units of a phonetic category for the synthetic voice to yield selected voice units, and synthesizes speech based on the selected voice units. The system can synthesize speech without parameterizing the first text-to-speech voice and the second text-to-speech voice. A policy can define, for a particular phonetic category, from which text-to-speech voice to select voice units. The combined database can include multiple text-to-speech voices from different speakers. The combined database can include voices of a single speaker speaking in different styles. The combined database can include voices of different languages.
Publication No.	US9269346(B2)	Publication date	2016.02.23
Application No.	US201514623183	Filing date	2015.02.16
Applicant	AT&T Intellectual Property I, L.P.	Inventors	Conkie Alistair D.;Syrdal Ann K.
Classification	G10L13/027;G10L13/047;G10L13/06;H04B7/04;H04B7/06;H04W72/04	Main classification	G10L13/027
Agency		Agent
Main claim	1. A method comprising: storing, in a database, voice data, wherein the voice data is associated with a plurality of voices, wherein the plurality of voices are stored within libraries according to emotions; identifying, using user speech exhibited by a user, a user emotion; identifying, via a processor and according to the user emotion, a first text-to-speech voice of the plurality of voices which are in the database, wherein the first text-to-speech voice has a first emotional content from a first speaker; identifying, via the processor and according to the user emotion, a second text-to-speech voice of the plurality of voices which are in the database, wherein the second text-to-speech voice has a second emotional content from a second speaker, and wherein the second emotional content is distinct from the first emotional content; and synthesizing synthesized speech using the first text-to-speech voice and the second text-to-speech voice, wherein the synthesized speech mimics the user emotion.
Address	Atlanta GA US
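
Illustrative sketch (not part of the patent record). The abstract describes combining two text-to-speech voice databases and then selecting voice units per phonetic category according to a policy. The Python below is a minimal sketch of that idea under stated assumptions: the names `VoiceUnit`, `combine_databases`, and `select_units`, the dictionary form of the policy, and the fallback to any available voice are all hypothetical and are not taken from the patent. Audio data and the synthesis step itself are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceUnit:
    """Hypothetical record for one unit in a voice database."""
    voice: str      # text-to-speech voice the unit was recorded from
    category: str   # phonetic category, e.g. "vowel" or "consonant"
    phone: str      # phone label, e.g. "aa"
    # A real unit would also carry audio samples; omitted in this sketch.


def combine_databases(first_db, second_db):
    """Merge two unit inventories into one combined database (a plain list here)."""
    return list(first_db) + list(second_db)


def select_units(combined_db, policy, phones, phone_categories):
    """Pick one unit per target phone.

    `policy` maps a phonetic category to the voice that units of that
    category should come from. If the preferred voice has no unit for a
    phone, this sketch falls back to any voice (an assumption, not
    something the abstract specifies).
    """
    selected = []
    for phone in phones:
        preferred_voice = policy[phone_categories[phone]]
        candidates = [u for u in combined_db if u.phone == phone]
        if not candidates:
            raise LookupError(f"no unit for phone {phone!r} in combined database")
        preferred = [u for u in candidates if u.voice == preferred_voice]
        selected.append((preferred or candidates)[0])
    return selected
```

A caller would build the two inventories, merge them with `combine_databases`, and pass a policy such as `{"vowel": "B", "consonant": "A"}` to `select_units`; the returned units would then be handed to a concatenative synthesizer, which this sketch does not model.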