Title of invention: SYSTEM AND METHOD FOR GENERATING CUSTOMIZED TEXT-TO-SPEECH VOICES
Abstract: A system and method are disclosed for generating customized text-to-speech voices for a particular application. The method comprises generating a custom text-to-speech voice by selecting a voice for generating a custom text-to-speech voice associated with a domain, collecting text data associated with the domain from a pre-existing text data source and using the collected text data, generating an in-domain inventory of synthesis speech units by selecting speech units appropriate to the domain via a search of a pre-existing inventory of synthesis speech units, or by recording the minimal inventory for a selected level of synthesis quality. The text-to-speech custom voice for the domain is generated utilizing the in-domain inventory of synthesis speech units. Active learning techniques may also be employed to identify problem phrases wherein only a few minutes of recorded data is necessary to deliver a high quality TTS custom voice.
Publication number: US2014188480(A1)  Publication date: 2014.07.03
Application number: US201414196578  Filing date: 2014.03.04
Applicant: AT&T Intellectual Property II, L.P.  Inventors: BANGALORE Srinivas; Feng Junlan; Rahim Mazin G.; Schroeter Juergen; Syrdal Ann K.; Schulz David
Classification: G10L13/02  Main classification: G10L13/02
Agency:  Agent:
Main claim: 1. A method comprising: in response to a user request to generate a custom text-to-speech voice for a domain, without further interaction from the user: collecting text data associated with the domain from a pre-existing text data source, to yield collected text data; selecting synthesis speech units specific to the domain from a pre-existing inventory of synthesis speech units using the collected text data; caching the synthesis speech units specific to the domain as an in-domain inventory of synthesis speech units; and generating, via a processor, the custom text-to-speech voice for a specific task in the domain utilizing the in-domain inventory of synthesis speech units.
Address: Atlanta GA US
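
Illustrative note: the steps recited in the main claim (collect domain text, select matching units from a pre-existing inventory, cache them as an in-domain inventory, then synthesize from that cache) can be sketched roughly as follows. This is a toy illustration only, not the patented implementation. It assumes word-level units and string unit IDs in place of recorded audio, and every name in it (tokenize, select_in_domain_units, synthesize, the sample inventory) is hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens. Assumption: words stand in for real speech units."""
    return re.findall(r"[a-z']+", text.lower())


def select_in_domain_units(domain_texts, full_inventory):
    """Keep only the inventory entries whose keys occur in the collected domain text."""
    counts = Counter(tok for text in domain_texts for tok in tokenize(text))
    return {unit: full_inventory[unit] for unit in counts if unit in full_inventory}


def synthesize(text, in_domain_inventory):
    """Look up each token in the cached in-domain inventory.

    Returns the list of unit IDs found and the list of tokens with no cached unit.
    """
    units, missing = [], []
    for tok in tokenize(text):
        if tok in in_domain_inventory:
            units.append(in_domain_inventory[tok])
        else:
            missing.append(tok)
    return units, missing


# Hypothetical pre-existing inventory: unit key -> recording ID.
full_inventory = {
    "your": "u001", "flight": "u002", "departs": "u003",
    "at": "u004", "noon": "u005", "pizza": "u006",
}

# Hypothetical text collected from a pre-existing source for a travel domain.
domain_texts = ["Your flight departs at noon.", "Flight status: departs at noon"]

cache = select_in_domain_units(domain_texts, full_inventory)
units, missing = synthesize("Your flight departs at noon", cache)
```

Tokens reported in the second return value of synthesize are the ones the cached inventory cannot cover. In the abstract's terms, those would be candidates for additional recording or for the active learning step.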