Title: Speaker recognition method through emotional model synthesis based on neighbors preserving principle
Abstract: A speaker recognition method through emotional model synthesis based on the neighbors preserving principle is disclosed. The method includes the following steps: (1) training the reference speakers' and the user's speech models; (2) extracting the neutral-to-emotion transformation (mapping) sets of the GMM reference models; (3) extracting the emotion reference Gaussian components mapped from the several neutral reference Gaussian components closest to each of the user's neutral training Gaussian components; (4) synthesizing the user's emotion training Gaussian components and, from them, the user's emotion training model; (5) synthesizing the GMM training models for all users; (6) inputting test speech and performing identification. The invention selects, from a speech library, several reference speeches similar to the user's neutral training speech by applying a neighbors preserving principle based on KL divergence, and synthesizes the user's emotion training speech from the corresponding emotion reference speeches. This improves the performance of the speaker recognition system when the training speech and the test speech are mismatched, and increases the system's robustness.
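The abstract's key mechanism is selecting, for each of a user's neutral training Gaussian components, the reference components with the smallest KL divergence. For two Gaussians the KL divergence has a closed form; the sketch below assumes diagonal covariances (a common simplification for GMM speaker models, not stated in the patent) and illustrative function names of our own choosing.

```python
import numpy as np

def kl_divergence_diag(mu0, var0, mu1, var1):
    """Closed-form KL(N0 || N1) between two diagonal-covariance Gaussians:
    0.5 * sum( var0/var1 + (mu1-mu0)^2/var1 - 1 + log(var1/var0) )."""
    return 0.5 * np.sum(
        var0 / var1 + (mu1 - mu0) ** 2 / var1 - 1.0 + np.log(var1 / var0)
    )

def nearest_reference_components(train_mu, train_var, ref_mus, ref_vars, n):
    """Return the indices of the n reference Gaussian components with the
    smallest KL divergence from the given neutral training component."""
    divs = np.array([
        kl_divergence_diag(train_mu, train_var, mu, var)
        for mu, var in zip(ref_mus, ref_vars)
    ])
    return np.argsort(divs)[:n]

# Toy usage: pick the single reference component nearest to a training component.
train_mu, train_var = np.zeros(2), np.ones(2)
ref_mus = [np.array([5.0, 5.0]), np.array([0.1, 0.0]), np.array([-3.0, 2.0])]
ref_vars = [np.ones(2), np.ones(2), np.ones(2)]
nearest = nearest_reference_components(train_mu, train_var, ref_mus, ref_vars, n=1)
```

Note that KL divergence is asymmetric; the sketch scores each reference component as the second argument, one of several reasonable conventions the patent text does not fix.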
Publication Number: US9355642(B2)  Publication Date: 2016.05.31
Application Number: US201214346960  Filing Date: 2012.09.04
Applicant: ZHEJIANG UNIVERSITY  Inventors: Wu Zhaohui; Yang Yingchun; Chen Li
Classification: G10L17/26; G10L25/63; G10L15/14; G10L15/06  Main Classification: G10L17/26
Agency:  Agent: Chen Jiwen
Principal Claim: 1. A speaker recognition method through emotional model synthesis based on the neighbors preserving principle, characterized in that the method comprises the following steps: (1) obtaining a plurality of sets of reference speeches and a user's neutral training speech, and conducting model training on these speeches to obtain a plurality of sets of Gaussian Mixture Model (GMM) reference models and a user's neutral training model; the reference speeches comprising a neutral reference speech and "m" emotion reference speeches, where m is a natural number greater than 0; (2) extracting a neutral-to-emotion Gaussian component transformation set from each set of GMM reference models; (3) according to the KL (Kullback-Leibler) divergence calculation method, respectively calculating the KL divergence between each neutral training Gaussian component in the neutral training model and the neutral reference Gaussian components in all neutral reference models; selecting the "n" neutral reference Gaussian components having the smallest KL divergence from each neutral training Gaussian component; then selecting the "m" emotion reference Gaussian components corresponding to each of the "n" neutral reference Gaussian components, where n is a natural number greater than 0; (4) combining the selected n×m Gaussian components corresponding to each neutral training Gaussian component to obtain "m" emotion training Gaussian components, and further obtaining "m" emotion training models for the user; (5) repeating steps (1) to (4) to synthesize the GMM training models for all users; (6) inputting a user's test speech, computing the likelihood score between the test speech and each user's GMM training models, and identifying the user whose GMM training model yields the greatest likelihood score as the speaker to be identified.
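Step (6) of the claim is standard GMM-based identification: score the test frames against every user's model and take the argmax. The sketch below assumes diagonal-covariance components and MFCC-like feature frames as a matrix of shape (T, d); the function names and the log-sum-exp formulation are our own illustrative choices, not the patent's.

```python
import numpy as np

def log_gaussian_diag(x, mu, var):
    """Per-frame log density of a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(
        np.log(2 * np.pi * var) + (x - mu) ** 2 / var, axis=-1
    )

def gmm_log_likelihood(frames, weights, mus, vars_):
    """Total log-likelihood of feature frames (T, d) under one GMM,
    computed with log-sum-exp for numerical stability."""
    comp = np.stack([
        np.log(w) + log_gaussian_diag(frames, mu, v)
        for w, mu, v in zip(weights, mus, vars_)
    ])  # shape (K, T): per-component, per-frame log weighted densities
    m = comp.max(axis=0)
    return np.sum(m + np.log(np.sum(np.exp(comp - m), axis=0)))

def identify(frames, user_models):
    """Step (6): score the test frames against every user's GMM
    (weights, means, variances) and return the best-scoring user."""
    scores = {u: gmm_log_likelihood(frames, *p) for u, p in user_models.items()}
    return max(scores, key=scores.get)

# Toy usage: 1-D frames near 0 should match user "A" (mean 0), not "B" (mean 5).
frames = np.array([[0.1], [0.0], [-0.1]])
models = {
    "A": ([1.0], [np.array([0.0])], [np.array([1.0])]),
    "B": ([1.0], [np.array([5.0])], [np.array([1.0])]),
}
best = identify(frames, models)
```

In practice each user's emotion training models (step 4) and neutral training model would all be scored, since the claimed benefit is matching emotional test speech against synthesized emotional models.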
Address: Hangzhou, CN