Abstract |
Disclosed herein is a method for compensating intersession variability for automatic extraction of information from an input voice signal representing an utterance of a speaker, comprising: processing the input voice signal to provide feature vectors each formed by acoustic features extracted from the input voice signal at a time frame; computing an intersession variability compensation feature vector; and computing compensated feature vectors based on the extracted feature vectors and the intersession variability compensation feature vector; wherein computing an intersession variability compensation feature vector includes: creating a Universal Background Model (UBM) based on a training voice database, the Universal Background Model (UBM) including a number of Gaussians and probabilistically modeling an acoustic model space; creating a voice recording database related to different speakers and containing, for each speaker, a number of voice recordings acquired under different conditions; computing an intersession variability subspace matrix (U) based on the voice recording database, the intersession variability subspace matrix (U) defining a transformation from an acoustic model space to an intersession variability subspace representing intersession variability for all the speakers; computing an intersession factor vector (xi) based on the intersession variability subspace matrix (U), the intersession factor vector representing the intersession variability of the input speech signal in the intersession variability subspace; and computing the intersession variability compensation feature vector based on the intersession variability subspace matrix (U), the intersession factor vector (xi) and the Universal Background Model (UBM).
|
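The final step of the abstract, combining the subspace matrix (U), the factor vector (xi) and the UBM into a per-frame compensation vector, can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formula, so the sketch rests on several assumptions: the UBM has diagonal covariances, the factor vector `x` has already been estimated, U has one block of rows per Gaussian, and the compensation for each frame is the posterior-weighted sum of the per-Gaussian offsets taken from the product U·x. The function names `gaussian_posteriors` and `compensate_features` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def gaussian_posteriors(features, weights, means, variances):
    """Per-frame occupation probabilities of the UBM Gaussians.

    Assumes a diagonal-covariance UBM (an assumption of this sketch).
    features: (T, F); weights: (C,); means, variances: (C, F).
    Returns an array of shape (T, C) whose rows sum to 1.
    """
    diff = features[:, None, :] - means[None, :, :]           # (T, C, F)
    log_lik = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)                 # (T, C)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)     # (C,)
    )
    log_post = np.log(weights) + log_lik
    # Subtract the row maximum before exponentiating for numerical stability.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


def compensate_features(features, posteriors, U, x):
    """Subtract a posterior-weighted intersession offset from each frame.

    U: (C * F, R) intersession variability subspace matrix.
    x: (R,) intersession factor vector, assumed to be estimated elsewhere.
    """
    num_gaussians = posteriors.shape[1]
    feat_dim = features.shape[1]
    # U @ x is a supervector-sized offset; split it into one offset per Gaussian.
    offsets = (U @ x).reshape(num_gaussians, feat_dim)        # (C, F)
    # Compensation feature vector for each frame: sum over Gaussians of
    # posterior * offset.
    compensation = posteriors @ offsets                       # (T, F)
    return features - compensation
```

Under these assumptions, a zero factor vector leaves the features unchanged, and with a single Gaussian the same offset U·x is subtracted from every frame. Estimating `x` and training U from the voice recording database are outside the scope of this sketch.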