Abstract |
<p><P>PROBLEM TO BE SOLVED: To recognize a speaker accurately and to calculate the distance to a speaker model easily. <P>SOLUTION: A frame trimming means 21 successively cuts speech waveform data into frames of width L at a shift interval T and outputs them to a feature vector generating means 22. A pitch detection means 23 detects whether a pitch is present in each frame cut by the frame trimming means. The frame trimming means changes the shift interval T and the frame width L according to the pitch detection result. More specifically, in voiced sections, where a pitch is present, the shift interval T and the frame width L are made shorter than in unvoiced sections. The feature vector generating means converts the speech waveform data of each frame into a feature vector and outputs it to a next-stage distance calculation means 24. The distance calculation means calculates the distance between the feature vector series and a speaker model stored in a speaker model storage means 25 and outputs it to a next-stage recognition means 26. The recognition means compares the distance data from the distance calculation means with a preset threshold to recognize the speaker. <P>COPYRIGHT: (C)2005,JPO&NCIPI</p> |
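
The processing chain in the abstract (frame trimming, pitch detection, feature vector generation, distance calculation, threshold comparison) can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation. The abstract does not specify the pitch detector, the features, the distance measure, or the form of the speaker model, so everything concrete here is an assumption: the autocorrelation-based `has_pitch`, the two-element feature vector (log energy and zero-crossing rate), the single-centroid speaker model, the Euclidean distance, and all numeric values for L and T. The sketch also assumes that T and L are shortened for frames in which a pitch is detected, which is one reading of the abstract.

```python
import math


def has_pitch(frame, min_lag=20, max_lag=160, threshold=0.5):
    """Assumed pitch detector: peak of the energy-normalized autocorrelation."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return False  # an all-zero frame has no pitch
    upper = min(max_lag, len(frame) - 1)
    best = 0.0
    for lag in range(min_lag, upper + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        best = max(best, r / energy)
    return best >= threshold


def trim_frames(samples, long_cfg=(400, 160), short_cfg=(200, 80)):
    """Cut frames of width L at shift T; each cfg is a hypothetical (L, T) pair.

    The pitch result of the current frame selects L and T for the next one:
    the shorter pair after a frame with pitch, the longer pair otherwise.
    """
    frames = []
    start = 0
    width, shift = long_cfg
    while start + width <= len(samples):
        frame = samples[start:start + width]
        voiced = has_pitch(frame)
        frames.append((start, frame, voiced))
        width, shift = short_cfg if voiced else long_cfg
        start += shift
    return frames


def feature_vector(frame):
    """Assumed features: log mean energy and zero-crossing rate."""
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (math.log(energy + 1e-10), crossings / len(frame))


def distance_to_model(features, model):
    """Mean Euclidean distance from each feature vector to a centroid model."""
    return sum(math.dist(f, model) for f in features) / len(features)


def recognize(samples, model, threshold):
    """Accept the speaker when the distance does not exceed the preset threshold."""
    features = [feature_vector(frame) for _, frame, _ in trim_frames(samples)]
    if not features:
        return False
    return distance_to_model(features, model) <= threshold
```

For example, `recognize(samples, model, threshold)` takes a list of waveform samples, a stored model vector, and a preset threshold, and returns `True` when the averaged distance is within the threshold. A real system would replace the centroid model and the two toy features with whatever speaker model and feature set the application calls for.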