Abstract |
A voice-activity detector (VAD 104) takes (214) a currently-received set and a previously-received set of samples of a time-domain (voice) signal, converts (216) them into a frequency-domain representation of the signal, filters out (218) negative and low (noise) frequencies, weights (220) the energies of frequency bins (ranges) of the remaining frequencies in proportion to their frequencies, and computes (220) the total power of the ranges. It first initializes (226) by determining (304, 306) whether power peaks of any of the ranges exceed a first threshold (ceiling 228); if not, it lowers (302) the ceiling and continues initializing, and if so, it ends initializing (308), indicates (334) that voice has been detected, sets (330) the ceiling to the highest peak, and stores (332) the total power as a "smoothed" power. If initialization has ended, it determines (320, 322) whether power peaks of any of the ranges exceed a second threshold that is a fraction of the ceiling; if so, it indicates (334) that voice has been detected, sets (330) the ceiling to the highest peak that exceeds the ceiling, and computes (332) a new "smoothed" power as a function of the total power and the current "smoothed" power. If initialization has ended and the power peaks of none of the ranges exceed the second threshold, it determines (340, 342) whether the ratio of the total power to the smoothed power exceeds a third threshold; if so, it indicates (344) that voice has been detected, and if not, it indicates (346) that voice has not been detected.
|
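The decision flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class name, every parameter name and every default value (`ceiling`, `decay`, `fraction`, `ratio_threshold`, `alpha`, `min_bin`) are assumptions, since the abstract gives no numeric values. The sketch also treats each DFT bin as one "range", uses an exponential moving average as the unspecified smoothing function, and reports "no voice" while the ceiling is still being lowered, which the abstract does not state.

```python
import cmath
import math


class VoiceActivityDetector:
    """Illustrative sketch only; all names and defaults are assumed."""

    def __init__(self, ceiling=1e9, decay=0.5, fraction=0.25,
                 ratio_threshold=2.0, alpha=0.9, min_bin=2):
        self.ceiling = ceiling                  # first threshold (ceiling)
        self.decay = decay                      # assumed factor for lowering the ceiling
        self.fraction = fraction                # second threshold = fraction * ceiling
        self.ratio_threshold = ratio_threshold  # third threshold
        self.alpha = alpha                      # assumed smoothing weight
        self.min_bin = min_bin                  # bins below this are treated as noise
        self.initializing = True
        self.smoothed = 0.0
        self.prev = None                        # previously received set of samples

    def _weighted_powers(self, frame):
        # Combine the previous and current sets (zeros stand in for a missing previous set).
        prev = self.prev if self.prev is not None else [0.0] * len(frame)
        samples = prev + list(frame)
        n = len(samples)
        powers = []
        # Keep only positive frequencies at or above min_bin (plain O(n^2) DFT).
        for k in range(self.min_bin, n // 2):
            x = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples))
            # Weight each bin's energy in proportion to its frequency index.
            powers.append(k * abs(x) ** 2 / n)
        return powers

    def process(self, frame):
        """Return True if voice is indicated for this set of samples."""
        powers = self._weighted_powers(frame)
        self.prev = list(frame)
        total = sum(powers)
        peak = max(powers)

        if self.initializing:
            if peak > self.ceiling:
                # End initialization: ceiling becomes the highest peak.
                self.initializing = False
                self.ceiling = peak
                self.smoothed = total
                return True
            self.ceiling *= self.decay          # lower the ceiling, keep initializing
            return False

        if peak > self.fraction * self.ceiling:
            if peak > self.ceiling:             # raise the ceiling only if exceeded
                self.ceiling = peak
            self.smoothed = (self.alpha * self.smoothed
                             + (1.0 - self.alpha) * total)
            return True

        # Fallback: compare the total power against the smoothed power.
        return self.smoothed > 0 and total / self.smoothed > self.ratio_threshold
```

A caller would create one detector per audio stream and pass each newly received, equal-length set of samples to `process()`, reading the boolean result as the voice indication for that set. The stdlib DFT keeps the sketch dependency-free; a real implementation would normally use an FFT.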