Title of invention: Method and Apparatus for Efficient I-Vector Extraction
Abstract: Most speaker recognition systems use i-vectors, which are compact representations of speaker voice characteristics. Typical i-vector extraction procedures are complex in terms of computations and memory usage. According to an embodiment, a method and corresponding apparatus for speaker identification comprise determining a representation for each component of a variability operator, representing statistical inter- and intra-speaker variability of voice features with respect to a background statistical model, in terms of a linear operator common to all components of the variability operator and having a first dimension larger than a second dimension of the components of the variability operator; computing statistical voice characteristics of a particular speaker using the determined representations; and employing the statistical voice characteristics of the particular speaker in performing speaker recognition. Computing the voice characteristics by using the determined representations results in a significant reduction in memory usage and a possible increase in execution speed.
Publication number: US2014222428(A1) Publication date: 2014.08.07
Application number: US201313856992 Filing date: 2013.04.04
Applicant: NUANCE COMMUNICATIONS, INC. Inventors: Cumani Sandro; Laface Pietro
Classification: G10L17/00 Main classification: G10L17/00
Agency: (not listed) Agent: (not listed)
Main claim: 1. A computer-implemented method of speaker identification, comprising: determining a representation for each linear operator of a plurality of linear operators, each linear operator representing variability of statistical voice features with respect to a statistical model component among a plurality of statistical model components, in terms of (i) a first orthogonal operator specific to the respective linear operator of the plurality of linear operators, (ii) a weighting operator specific to the respective linear operator of the plurality of linear operators, and (iii) a second linear operator common to the plurality of linear operators and having a first dimension larger than a second dimension of the plurality of linear operators; computing statistical voice characteristics of a particular speaker using at least the representations corresponding to each of the plurality of linear operators determined; and employing the statistical voice characteristics of the particular speaker to determine whether an input speech signal corresponds to the particular speaker.
Address: Burlington MA US
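
Illustrative note: the main claim describes each per-component linear operator in terms of a component-specific orthogonal operator, a component-specific weighting operator, and one linear operator shared by all components. The sketch below shows, under stated assumptions, why such a representation can reduce memory. It is not the patented procedure. The shapes (orthogonal factor `feat_dim x feat_dim`, weighting factor `feat_dim x dict_size`, shared factor `dict_size x subspace_dim`), the assumption that the weighting factor is sparse with a fixed number of non-zeros per row, and every function and parameter name are hypothetical choices made only for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def reconstruct(o_c, w_c, q):
    """Rebuild one component operator as o_c * w_c * q.

    o_c: component-specific orthogonal factor (assumed square)
    w_c: component-specific weighting factor (assumed sparse)
    q:   factor shared by all components
    """
    return matmul(matmul(o_c, w_c), q)


def full_storage(num_components, feat_dim, subspace_dim):
    """Values stored when every component operator is kept explicitly."""
    return num_components * feat_dim * subspace_dim


def factorized_storage(num_components, feat_dim, subspace_dim,
                       dict_size, nonzeros_per_row):
    """Values stored for the factorized form (sparse index overhead ignored).

    Assumes each weighting factor keeps only `nonzeros_per_row` entries
    per row; this sparsity is an assumption, not stated in the record.
    """
    per_component = feat_dim * feat_dim + feat_dim * nonzeros_per_row
    shared = dict_size * subspace_dim
    return num_components * per_component + shared


# Toy example: 2-dimensional features, shared factor with 3 rows.
o_c = [[0, 1], [1, 0]]            # a permutation, hence orthogonal
w_c = [[2, 0, 0], [0, 0, 3]]      # one non-zero weight per row
q = [[1, 0], [0, 1], [1, 1]]      # shared by all components
t_c = reconstruct(o_c, w_c, q)    # [[3, 3], [2, 0]]
```

In this sketch the shared factor is stored once, so its cost does not grow with the number of components; any saving in practice depends on the sizes and sparsity actually chosen, which the record does not specify.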