Title of invention Speech synthesis apparatus, speech synthesis method, speech synthesis program product, and learning apparatus
Abstract According to one embodiment, a speech synthesis apparatus includes a language analyzer, a statistical model storage, a model selector, a parameter generator, a basis model storage, and a filter processor. The language analyzer analyzes text data and outputs language information data that represents linguistic information of the text data. The statistical model storage stores statistical models prepared by statistically modeling acoustic information included in speech. The model selector selects a statistical model from the stored statistical models based on the language information data. The parameter generator generates speech parameter sequences using the statistical model selected by the model selector. The basis model storage stores a basis model including basis vectors, each of which expresses speech information for a limited frequency range. The filter processor outputs synthetic speech by executing filter processing of the speech parameter sequences and the basis model.
Publication number US9110887(B2) Publication date 2015.08.18
Application number US201213727011 Filing date 2012.12.26
Applicant KABUSHIKI KAISHA TOSHIBA Inventors Ohtani Yamato; Tamura Masatsune; Morita Masahiro
Classification G06F17/27; G06F17/28; G10L13/02 Main classification G06F17/27
Agency Posz Law Group, PLC Agent Posz Law Group, PLC
Main claim 1. A speech synthesis apparatus comprising: a statistical model storage configured to store a plurality of statistical models prepared by statistically modeling acoustic information included in speech; a basis model storage configured to store a basis model including a plurality of basis vectors, each of which expresses speech information for each limited frequency range; and a computer programmed to, based on instructions stored in a memory: analyze, by a language analyzer, text data and output language information data that represents linguistic information of the text data; select, by a model selector, a statistical model from the plurality of statistical models stored in the statistical model storage, based on the language information data output from the language analyzer; generate, by a parameter generator, a plurality of speech parameter sequences using the statistical model selected by the model selector; output, by a filter processor, synthetic speech by executing filter processing of the plurality of speech parameter sequences generated by the parameter generator and the basis model stored in the basis model storage, wherein any of the plurality of speech parameter sequences represents weights to be applied to the basis vectors upon linearly combining the plurality of basis vectors in the basis model in the basis model storage.
Address Tokyo JP
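
Illustrative sketch (not part of the patent record). The main claim states that a speech parameter sequence represents weights applied when linearly combining the basis vectors, each of which covers a limited frequency range. The Python sketch below illustrates only that linear combination for a single frame. The triangular shape of the basis vectors, the function names `make_basis` and `combine`, and the bin counts are assumptions made for illustration; the record does not specify them, and the filter processing that produces the waveform is not shown.

```python
def make_basis(num_basis, num_bins):
    """Build hypothetical triangular basis vectors over num_bins frequency bins.

    Each vector is nonzero only near its own center, so it covers a limited
    frequency range. The triangular shape is an assumption. Requires
    num_basis >= 2.
    """
    width = (num_bins - 1) / (num_basis - 1)  # spacing between band centers
    centers = [k * width for k in range(num_basis)]
    return [
        [max(0.0, 1.0 - abs(b - c) / width) for b in range(num_bins)]
        for c in centers
    ]


def combine(weights, basis):
    """Linearly combine basis vectors: out[b] = sum_k weights[k] * basis[k][b]."""
    num_bins = len(basis[0])
    return [
        sum(w * vec[b] for w, vec in zip(weights, basis))
        for b in range(num_bins)
    ]


# One frame of a speech parameter sequence, read as per-band weights.
basis = make_basis(num_basis=3, num_bins=5)
frame_weights = [1.0, 2.0, 3.0]
spectrum = combine(frame_weights, basis)
print(spectrum)
```

Applying `combine` to every frame of a weight sequence would give one combined vector per frame; how those vectors drive the filter is left to the patent's description.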