Abstract |
PROBLEM TO BE SOLVED: To reliably detect human speech sections without the false detections that arise when speech is detected from speech feature quantities such as the energy and FFT spectrum of the sound. SOLUTION: The speech waveform itself, after passing through a low-pass filter suited to speech analysis, is roughly divided at its zero-crossing points into short sections called small segments; when an adjacent small segment has low energy, it is merged into the immediately preceding segment, thereby consolidating the speech segments. Next, a segment whose start point is the zero-crossing point at which a positive-valued portion of the waveform begins (in the time direction) and whose end point is the zero-crossing point at which a negative-valued portion of the waveform ends is selected as the reference speech segment. A segment whose start point is the end point of the reference speech segment and whose end point is the zero-crossing point at which a negative-valued portion of the waveform ends is selected as the speech segment to be compared, and the similarity between these two speech segments is then computed. COPYRIGHT: (C)2005,JPO&NCIPI
|
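The steps in the abstract can be sketched roughly as follows. This is only an illustration under several assumptions that the abstract does not state: the low-pass filter is replaced by a plain moving average, the energy threshold is a free parameter, and similarity is taken to be the normalized correlation of the two segments after resampling them to a common length. All function names are hypothetical.

```python
import math

def moving_average(x, width=5):
    """Crude low-pass filter (assumption: the abstract does not specify the filter)."""
    half = width // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def small_segments(x):
    """Split x into (start, end) index pairs at every zero crossing."""
    cuts = [0] + [i for i in range(1, len(x)) if (x[i - 1] < 0) != (x[i] < 0)] + [len(x)]
    return list(zip(cuts, cuts[1:]))

def energy(x, seg):
    """Sum of squared samples inside one segment."""
    a, b = seg
    return sum(v * v for v in x[a:b])

def merge_low_energy(x, segs, threshold):
    """Merge each low-energy segment into the immediately preceding one."""
    merged = []
    for seg in segs:
        if merged and energy(x, seg) < threshold:
            merged[-1] = (merged[-1][0], seg[1])
        else:
            merged.append(seg)
    return merged

def cycle_starts(x, segs):
    """Segment boundaries where the waveform turns from negative to non-negative."""
    return [a for a, _ in segs if a > 0 and x[a - 1] < 0 <= x[a]]

def resample(values, n=32):
    """Linearly interpolate a segment to n points so two segments can be compared."""
    m = len(values)
    if m == 1:
        return [values[0]] * n
    out = []
    for k in range(n):
        pos = k * (m - 1) / (n - 1)
        i = int(pos)
        j = min(i + 1, m - 1)
        frac = pos - i
        out.append(values[i] * (1 - frac) + values[j] * frac)
    return out

def similarity(a, b, n=32):
    """Normalized correlation (assumed measure; the abstract does not name one)."""
    ra, rb = resample(a, n), resample(b, n)
    dot = sum(p * q for p, q in zip(ra, rb))
    na = math.sqrt(sum(p * p for p in ra))
    nb = math.sqrt(sum(q * q for q in rb))
    return dot / (na * nb) if na and nb else 0.0

def adjacent_cycle_similarities(x, energy_threshold=0.0, width=5):
    """Compare each reference segment with the segment that starts where it ends."""
    y = moving_average(x, width)
    segs = merge_low_energy(y, small_segments(y), energy_threshold)
    starts = cycle_starts(y, segs)
    return [similarity(y[s:m], y[m:e])
            for s, m, e in zip(starts, starts[1:], starts[2:])]
```

In this sketch each reference segment runs from one negative-to-positive crossing to the next, so it covers one positive portion followed by one negative portion, and the compared segment begins exactly where the reference ends. For a clean periodic input such as a sine wave, adjacent segments should score close to 1; how the similarity values are then turned into a speech/non-speech decision is not described in the abstract and is left out here.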