Title of invention: METHOD AND SYSTEM FOR GENERATING ADVANCED FEATURE DISCRIMINATION VECTORS FOR USE IN SPEECH RECOGNITION
Abstract: A method of renormalizing high-resolution oscillator peaks, extracted from windowed samples of an audio signal, is disclosed. Feature vectors, referred to as advanced feature discrimination vectors (AFDVs), are generated for which variations in both fundamental frequency and time duration of speech are substantially mitigated. The feature vectors may be aligned within a common coordinate space, free of those variations in frequency and time duration that occur between speakers, and even over speech by a single speaker, to facilitate a simple and accurate determination of matches between those AFDVs generated from a sample of the audio signal and corpus AFDVs generated for known speech at the phoneme and sub-phoneme level. The renormalized feature vectors can be combined with traditional feature vectors such as MFCCs, or they can be used exclusively to identify voiced, semi-voiced and unvoiced sounds.
Publication number: US2016284343(A1)   Publication date: 2016.09.29
Application number: US201414217198   Filing date: 2014.03.17
Applicant: Short Kevin M.;Hone Brian   Inventor: Short Kevin M.;Hone Brian
Classification: G10L15/02;G10L25/93;G10L25/24;G10L25/21;G10L25/18   Main classification: G10L15/02
Agency:   Agent:
Principal claim: 1. A method of generating advanced feature discrimination vectors (AFDVs) representing sounds forming at least part of an input audio signal, the method comprising: taking a plurality of samples of the input audio signal, each of the samples being a portion of the input audio signal as it evolves over a window of predetermined time; for each sample of the audio signal taken: performing a signal analysis on the sample to extract one or more high resolution oscillator peaks therefrom, the extracted oscillator peaks forming a spectral representation of the sample; renormalizing the extracted oscillator peaks to eliminate variations in the fundamental frequency and time duration for each sample occurring over the window; normalizing the power of the renormalized extracted oscillator peaks; and forming the renormalized and power normalized extracted oscillator peaks into an AFDV for the sample.
Address: Durham NH US
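
The steps recited in claim 1 (windowing, peak extraction, renormalization, power normalization, vector formation) can be illustrated with a minimal Python sketch. It is not the patented implementation. Several things here are assumptions made only for illustration: the function names (frames, spectrum_peaks, afdv, harmonic_tone) are hypothetical; ordinary Hann-windowed DFT peak picking stands in for the high-resolution oscillator peak extraction, which the claim does not specify; the lowest retained peak is taken as the fundamental; and only the fundamental-frequency renormalization is shown, with time-duration renormalization left out.

```python
import cmath
import math


def frames(signal, size, hop):
    """Split the signal into fixed-length windows (the 'samples' of claim 1)."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def spectrum_peaks(frame, rate):
    """Return (frequency_hz, power) for local maxima of a Hann-windowed DFT.

    Assumption: plain DFT peak picking is a stand-in for the high-resolution
    oscillator peak extraction named in the claim.
    """
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * k / n))
                for k, x in enumerate(frame)]
    power = []
    for b in range(n // 2):
        acc = sum(windowed[k] * cmath.exp(-2j * math.pi * b * k / n)
                  for k in range(n))
        power.append(abs(acc) ** 2)
    floor = 0.01 * max(power)  # hypothetical threshold: drop peaks 20 dB below the strongest
    return [(b * rate / n, power[b]) for b in range(1, n // 2 - 1)
            if power[b] > floor and power[b] > power[b - 1] and power[b] >= power[b + 1]]


def afdv(frame, rate, harmonics=8):
    """Build a fundamental-independent, power-normalized vector for one window."""
    peaks = spectrum_peaks(frame, rate)
    vector = [0.0] * harmonics
    if not peaks:
        return vector  # silent window: nothing to normalize
    f0 = peaks[0][0]  # assumption: the lowest retained peak is the fundamental
    for freq, p in peaks:
        h = round(freq / f0)  # re-express frequency as a harmonic number
        if 1 <= h <= harmonics:
            vector[h - 1] += p
    total = sum(vector)  # positive, because the fundamental always lands in slot 0
    return [v / total for v in vector]


def harmonic_tone(f0, rate, length, amps=(1.0, 0.5, 0.25)):
    """Synthetic test signal: a fundamental plus harmonics with fixed amplitudes."""
    return [sum(a * math.sin(2 * math.pi * f0 * (h + 1) * t / rate)
                for h, a in enumerate(amps))
            for t in range(length)]


if __name__ == "__main__":
    rate = 8000
    low = afdv(harmonic_tone(250.0, rate, 256), rate)
    high = afdv(harmonic_tone(375.0, rate, 256), rate)
    # Largest element-wise difference between the two vectors
    print(max(abs(a - b) for a, b in zip(low, high)))
```

In this sketch, two synthetic tones with the same harmonic amplitude pattern but different fundamentals (250 Hz and 375 Hz) map to nearly the same vector, which is the property the abstract relies on when comparing AFDVs from an input signal against corpus AFDVs in a common coordinate space.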