Abstract
PROBLEM TO BE SOLVED: To make it easy, in singing voice synthesis, to specify the voicing content at an arbitrary voicing time point.
SOLUTION: A voice acquisition part 22 acquires, from a voice input device 14, a voice signal V1 of a voice uttered by a user. An indication acquisition part 24 acquires, from an indication input device 16, indication information U that specifies the voicing time point of each note as indicated by the user. A voice recognition part 32 specifies voicing content Z by performing voice recognition on the voice signal V1. Specifically, for each of a plurality of recognition candidates that differ in phoneme sequence or in the start time of each phoneme, the voice recognition part 32 decides whether to discard or maintain the candidate according to the relation, on a time axis, between the start point of each of the candidate's phonemes and each voicing time point specified by the indication information U. It then specifies the voicing content of the voice signal V1 from the maintained recognition candidates. An information generation part 34 generates voicing information S representing the correspondence between the voicing content Z specified by the voice recognition part 32 and each note whose voicing time point is specified by the indication information U.
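The candidate pruning and note assignment described above can be illustrated with a minimal sketch. The abstract does not state the exact discard criterion, how the final content is chosen among maintained candidates, or the data format of the voicing information S, so the following are assumptions for illustration only: a candidate is maintained if every indicated voicing time has some phoneme starting within a hypothetical tolerance `tol`; the highest-scoring maintained candidate supplies Z; and S is modeled as a dict from note index to the phonemes falling in that note's time slot. All names (`Candidate`, `keep_candidate`, `recognize`, `voicing_info`) are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One recognition hypothesis (hypothetical structure)."""
    phonemes: tuple[str, ...]   # phoneme sequence
    starts: tuple[float, ...]   # start time (s) of each phoneme on the time axis
    score: float                # hypothetical recognizer score (higher is better)


def keep_candidate(cand: Candidate, voicing_times: list[float], tol: float) -> bool:
    # Assumed rule: every indicated voicing time must have some phoneme
    # starting within +/- tol seconds of it; otherwise discard the candidate.
    return all(any(abs(s - t) <= tol for s in cand.starts) for t in voicing_times)


def recognize(candidates, voicing_times, tol=0.05):
    kept = [c for c in candidates if keep_candidate(c, voicing_times, tol)]
    if not kept:
        return None
    # Assumed selection: the best-scoring maintained candidate gives content Z.
    return max(kept, key=lambda c: c.score)


def voicing_info(z: Candidate, voicing_times: list[float], tol=0.05):
    """Map each note index to the phonemes of Z that fall in its time slot."""
    # Shift each note boundary earlier by tol so that a consonant starting
    # slightly before its note is still assigned to that note.
    bounds = [t - tol for t in voicing_times]
    info = {i: [] for i in range(len(voicing_times))}
    for ph, s in zip(z.phonemes, z.starts):
        i = max(bisect_right(bounds, s) - 1, 0)  # early phonemes go to note 0
        info[i].append(ph)
    return info


# Usage: two notes indicated at 0.50 s and 1.00 s (the information U).
times = [0.50, 1.00]
cands = [
    Candidate(("s", "a", "k", "u"), (0.47, 0.52, 0.97, 1.02), -12.0),  # aligned
    Candidate(("s", "a", "k", "u"), (0.47, 0.52, 0.80, 0.86), -9.0),   # 2nd note off
    Candidate(("t", "a", "k", "u"), (0.30, 0.35, 0.97, 1.02), -11.0),  # 1st note off
]
z = recognize(cands, times)
print(z.phonemes)              # ('s', 'a', 'k', 'u')
print(voicing_info(z, times))  # {0: ['s', 'a'], 1: ['k', 'u']}
```

In this example the second candidate has the best raw score but is discarded because no phoneme starts near the second indicated time, which shows how the user's timing indication can override the recognizer's own preference.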