Title: AGING A TEXT-TO-SPEECH VOICE
Abstract: A voice recipient may request a text-to-speech (TTS) voice that corresponds to an age or age range. An existing TTS voice or existing voice data may be used to create a TTS voice corresponding to the requested age by encoding the voice data to voice parameter values, transforming the voice parameter values using a voice-aging model, synthesizing voice data using the transformed parameter values, and then creating a TTS voice using the transformed voice data. The voice-aging model may model how one or more voice parameters of a voice change with age and may be created from voice data stored in a voice bank.
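As a rough illustration of how a voice-aging model might be created from voice-bank data, the sketch below fits one straight line per voice parameter against speaker age and then shifts a parameter value along that line. The abstract does not specify the form of the model; the linear trend, the parameter name `f0_hz`, and the function names `fit_trend`, `fit_aging_model`, and `age_parameters` are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Params = Dict[str, float]


@dataclass
class LinearTrend:
    slope: float      # assumed change in the parameter per year of age
    intercept: float  # fitted parameter value at age 0


def fit_trend(samples: List[Tuple[float, float]]) -> LinearTrend:
    """Ordinary least-squares line through (age, value) pairs."""
    n = len(samples)
    mean_age = sum(age for age, _ in samples) / n
    mean_val = sum(val for _, val in samples) / n
    sxx = sum((age - mean_age) ** 2 for age, _ in samples)
    sxy = sum((age - mean_age) * (val - mean_val) for age, val in samples)
    slope = sxy / sxx  # requires at least two distinct ages
    return LinearTrend(slope, mean_val - slope * mean_age)


def fit_aging_model(voice_bank: List[Tuple[float, Params]]) -> Dict[str, LinearTrend]:
    """Fit one trend per parameter from (speaker age, parameter values) entries."""
    names = voice_bank[0][1].keys()
    return {
        name: fit_trend([(age, params[name]) for age, params in voice_bank])
        for name in names
    }


def age_parameters(model: Dict[str, LinearTrend], params: Params,
                   input_age: float, output_age: float) -> Params:
    """Shift each parameter along its fitted trend from input_age to output_age."""
    delta = output_age - input_age
    return {name: value + model[name].slope * delta for name, value in params.items()}


# Hypothetical voice bank: mean fundamental frequency by speaker age.
bank = [(10.0, {"f0_hz": 260.0}), (20.0, {"f0_hz": 220.0}), (30.0, {"f0_hz": 180.0})]
model = fit_aging_model(bank)
aged = age_parameters(model, {"f0_hz": 250.0}, input_age=10.0, output_age=15.0)
```

Real vocal tract, vocal source, and prosodic parameters are unlikely to change linearly with age, so a practical model would probably need a richer function; the line here only keeps the example short.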
Publication No.: US2016379622(A1); Publication date: 2016.12.29
Application No.: US201615138614; Filing date: 2016.04.26
Applicant: VocaliD, Inc.; Inventors: Patel Rupal; Meltzner Geoffrey Seth
Classification: G10L13/027; G10L13/033; G10L13/06; G10L13/047; Main classification: G10L13/027
Agency: ; Agent:
Main claim: 1. A computer-implemented method for creating a text-to-speech voice, the method comprising: obtaining voice data of a voice recipient, wherein the text-to-speech voice is being created for the voice recipient; determining a voice characteristic of the voice recipient by processing the voice data of the voice recipient; selecting a voice donor from a plurality of voice donors using the voice characteristic by: determining a voice characteristic for each voice donor of the plurality of voice donors by processing voice data of each voice donor, and comparing the voice characteristic of the voice recipient with the voice characteristic for each voice donor of the plurality of voice donors; obtaining a first age corresponding to the selected voice donor; obtaining a second age corresponding to the voice recipient; obtaining voice data of the selected voice donor; encoding the voice data of the selected voice donor to obtain a plurality of voice parameter values, wherein the plurality of voice parameter values comprises at least one of vocal tract parameter values, vocal source parameter values, or prosodic parameter values; obtaining a voice-aging model, wherein: the voice-aging model receives as input (i) input voice parameter values, (ii) an input age corresponding to the input voice parameter values, and (iii) an output age corresponding to output voice parameter values, and the voice-aging model generates output voice parameter values by transforming the input voice parameter values using the input age and the output age; transforming the plurality of voice parameter values using the voice-aging model, the first age, and the second age to obtain a plurality of transformed voice parameter values; synthesizing transformed voice data using the plurality of transformed parameter values; and creating a text-to-speech voice using the transformed voice data.
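The donor-selection and transformation steps of the claim can be sketched as follows. This is only an illustration under stated assumptions: the claim does not say how voice characteristics are compared, so the sketch assumes a single scalar characteristic and picks the donor with the smallest absolute difference. The `Donor` type, `select_donor`, `transform_for_recipient`, and the toy aging model are hypothetical names, and the encoding, synthesis, and TTS-building steps are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]
# Interface taken from the claim: (input parameters, input age, output age) -> output parameters.
AgingModel = Callable[[Params, float, float], Params]


@dataclass
class Donor:
    name: str
    age: float             # the "first age" in the claim
    characteristic: float  # assumed scalar voice characteristic, e.g. mean pitch
    params: Params         # parameter values already encoded from the donor's voice data


def select_donor(recipient_characteristic: float, donors: List[Donor]) -> Donor:
    """Pick the donor whose characteristic is closest to the recipient's (assumed metric)."""
    return min(donors, key=lambda d: abs(d.characteristic - recipient_characteristic))


def transform_for_recipient(recipient_characteristic: float, recipient_age: float,
                            donors: List[Donor], aging_model: AgingModel) -> Tuple[Donor, Params]:
    """Select a donor, then age the donor's parameters to the recipient's age."""
    donor = select_donor(recipient_characteristic, donors)
    return donor, aging_model(donor.params, donor.age, recipient_age)


def toy_aging_model(params: Params, input_age: float, output_age: float) -> Params:
    """Placeholder model: lowers f0_hz by 2 Hz per year of added age (illustrative only)."""
    out = dict(params)
    out["f0_hz"] = params["f0_hz"] - 2.0 * (output_age - input_age)
    return out


donors = [
    Donor("donor_a", age=25.0, characteristic=200.0, params={"f0_hz": 210.0}),
    Donor("donor_b", age=40.0, characteristic=120.0, params={"f0_hz": 110.0}),
]
chosen, transformed = transform_for_recipient(130.0, 30.0, donors, toy_aging_model)
```

Passing the aging model in as a callable mirrors the claim's description of a model that receives parameter values, an input age, and an output age, so any model with that signature could be substituted for the placeholder.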
Address: Belmont MA US