Title Speech synthesis device, speech synthesis method, and computer program product
Abstract According to an embodiment, a speech synthesis device includes a first storage, a second storage, a first generator, a second generator, a third generator, and a fourth generator. The first storage is configured to store therein first information obtained from a target uttered voice. The second storage is configured to store therein second information obtained from an arbitrary uttered voice. The first generator is configured to generate third information by converting the second information so as to be close to a target voice quality or prosody. The second generator is configured to generate an information set including the first information and the third information. The third generator is configured to generate fourth information used to generate a synthesized speech, based on the information set. The fourth generator is configured to generate the synthesized speech corresponding to input text using the fourth information.
Publication number US9135910(B2) Publication date 2015.09.15
Application number US201313765012 Filing date 2013.02.12
Applicant KABUSHIKI KAISHA TOSHIBA Inventor Tamura Masatsune;Morita Masahiro
Classification G10L13/00;G10L13/08;G10L13/06;G10L13/033;G10L15/00 Main classification G10L13/00
Agency Posz Law Group, PLC Agent Posz Law Group, PLC
Principal claim 1. A speech synthesis device comprising: a first storage configured to store therein first information obtained from a target uttered voice together with attribute information thereof; a second storage configured to store therein second information obtained from an arbitrary uttered voice together with attribute information thereof; a first generator configured to generate third information by converting the second information so as to be close to a target voice quality or prosody; a second generator configured to generate an information set including the first information and the third information; a third generator configured to generate fourth information used to generate a synthesized speech, based on the information set; and a fourth generator configured to generate the synthesized speech corresponding to input text using the fourth information, wherein the second generator generates the information set by adding the first information and a portion of the third information, the portion of the third information being selected so as to improve coverages for each attribute of the information set based on the attribute information.
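The selection step recited for the second generator can be illustrated with a minimal sketch. It is not the patented implementation: the `Unit` record, the single string-valued attribute, the `min_per_attribute` threshold, and the coverage definition (the fraction of attribute values that have at least the threshold number of units) are all assumptions introduced here for illustration. The sketch keeps every unit of first information and adds a unit of third information only while its attribute value is still under-represented in the set.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Hypothetical record for one piece of stored speech information."""
    source: str     # "target" (first information) or "converted" (third information)
    attribute: str  # assumed attribute label, e.g. a phoneme or accent class
    data: str       # stand-in for waveform or feature data


def build_information_set(target_units, converted_units, min_per_attribute=2):
    """Keep all target units, then add converted units only for attribute
    values that still have fewer than min_per_attribute units in the set."""
    info_set = list(target_units)
    counts = Counter(u.attribute for u in info_set)
    for unit in converted_units:
        if counts[unit.attribute] < min_per_attribute:
            info_set.append(unit)
            counts[unit.attribute] += 1
    return info_set


def coverage(units, attributes, min_per_attribute=2):
    """Fraction of attribute values with at least min_per_attribute units
    (an assumed coverage measure, not one defined in the claim)."""
    counts = Counter(u.attribute for u in units)
    covered = sum(1 for a in attributes if counts[a] >= min_per_attribute)
    return covered / len(attributes)


if __name__ == "__main__":
    target = [Unit("target", "a", "t1"), Unit("target", "a", "t2"),
              Unit("target", "b", "t3")]
    converted = [Unit("converted", "a", "c1"), Unit("converted", "b", "c2"),
                 Unit("converted", "c", "c3"), Unit("converted", "c", "c4")]
    merged = build_information_set(target, converted)
    print(coverage(target, ["a", "b", "c"]), coverage(merged, ["a", "b", "c"]))
```

Because converted units are added only where the target data is sparse, attribute values already well represented by the target voice are left untouched in this sketch.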
Address Tokyo JP