Title of invention: Method for improving speaker identification by determining usable speech
Abstract: A method for improving speaker identification by determining usable speech. Degraded speech is preprocessed in a speaker identification (SID) process to produce SID usable and SID unusable segments. Features are extracted and analyzed to produce a matrix of optimum classifiers for detecting SID usable and SID unusable speech segments. Optimum classifiers possess a minimum distance from a speaker model. A decision tree based upon fixed thresholds indicates the presence of a speech feature in a given speech segment. Following preprocessing, degraded speech is measured in one or more of the time, frequency, cepstral, or SID usable/unusable domains. Each measurement result is multiplied by a weighting factor whose value is proportional to the reliability of the corresponding time, frequency, or cepstral measurement. The measurements are fused as information, and usable speech segments are extracted for further processing. Such further processing of co-channel speech may include speaker identification, where a segment-by-segment decision is made on each usable speech segment to determine whether it corresponds to speaker #1 or speaker #2. Further processing of co-channel speech may also include constructing the complete utterance of speaker #1 or speaker #2. Speech features such as pitch and formants may be extended back into the unusable segments to form a complete utterance from each speaker.
Publication number: US2005027528(A1) Publication date: 2005.02.03
Application number: US20040923157 Filing date: 2004.08.18
Applicant: YANTORNO ROBERT E.;BENINCASA DANIEL S.;WENNDT STANLEY J.;SMOLENSKI BRETT Y. Inventor: YANTORNO ROBERT E.;BENINCASA DANIEL S.;WENNDT STANLEY J.;SMOLENSKI BRETT Y.
Classification: G10L17/00;(IPC1-7):G10L19/02 Main classification: G10L17/00
Agency: Agent:
Main claim:
Address:
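
The weighting and fusion step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the per-domain scores in [0, 1], the normalized weighted average, the 0.5 threshold, and all names (`fuse_measurements`, `usable_segments`, `RELIABILITY`, `SEGMENTS`) are assumptions made for illustration only. The abstract states only that each measurement is multiplied by a weight proportional to its reliability and that the measurements are then fused.

```python
from typing import Dict, List

# Hypothetical reliability weights per measurement domain (assumed values).
RELIABILITY: Dict[str, float] = {"time": 1.0, "frequency": 2.0, "cepstral": 1.0}

# Hypothetical per-segment usability scores in [0, 1], one per domain.
SEGMENTS: List[Dict[str, float]] = [
    {"time": 0.9, "frequency": 0.8, "cepstral": 0.7},
    {"time": 0.2, "frequency": 0.1, "cepstral": 0.6},
    {"time": 0.5, "frequency": 0.7, "cepstral": 0.6},
]


def fuse_measurements(scores: Dict[str, float],
                      reliability: Dict[str, float]) -> float:
    """Fuse per-domain scores into one value using reliability weights.

    Assumption: fusion is a weighted average, normalized by the sum of
    the weights so that the fused score stays in the range of the inputs.
    """
    total_weight = sum(reliability[domain] for domain in scores)
    if total_weight <= 0:
        raise ValueError("reliability weights must sum to a positive value")
    weighted_sum = sum(reliability[domain] * score
                       for domain, score in scores.items())
    return weighted_sum / total_weight


def usable_segments(segments: List[Dict[str, float]],
                    reliability: Dict[str, float],
                    threshold: float = 0.5) -> List[int]:
    """Return indices of segments whose fused score meets the threshold.

    Assumption: a single fixed threshold decides usable vs. unusable.
    """
    return [index for index, scores in enumerate(segments)
            if fuse_measurements(scores, reliability) >= threshold]


if __name__ == "__main__":
    print(usable_segments(SEGMENTS, RELIABILITY))
```

In this sketch the frequency-domain measurement is assumed to be twice as reliable as the others, so it contributes twice as much to the fused score. The segments returned as usable would then be passed on to the segment-by-segment speaker decision.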