Abstract
<p>Speaker recognition is decided on the basis of a similarity measure (D). D is calculated by comparing selected feature vectors of an input speech signal, represented as a sequence (A) of feature vectors, with those of a reference sequence (B) of reference vectors selected from a plurality of pre-stored reference sequences. Before the input and reference sequences are compared, the two are time-normalized so that corresponding feature vectors are aligned. A significant sound specifying signal (V), consisting of a time sequence of elementary signals, is generated in synchronism with one of the two sequences (input or reference). It indicates which feature vectors of that sequence are considered to represent significant sound. The similarity measure (D) is then calculated from the comparison of those feature vectors of that sequence which the significant sound specifying signal marks as significant sound with the corresponding feature vectors of the other sequence.</p>
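<p>The Python sketch below illustrates the matching scheme described in the abstract. It is a sketch under stated assumptions, not the patented implementation. It assumes dynamic time warping (DTW) with a Euclidean local distance as the time-normalization step. It takes the significant sound specifying signal (V) as a list of booleans synchronous with the input sequence (A). It treats D as the mean local distance over the aligned frame pairs flagged by V, so a smaller D means a closer match. The helper <code>significant_frames</code>, which derives V from a frame-energy threshold, is hypothetical. All function names are illustrative.</p>

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(x: Vector, y: Vector) -> float:
    """Local distance between two feature vectors (assumed Euclidean)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dtw_path(A: Sequence[Vector], B: Sequence[Vector]) -> List[Tuple[int, int]]:
    """Time-normalize A and B with DTW (assumed method); return aligned (i, j) pairs."""
    n, m = len(A), len(B)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(A[i - 1], B[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from (n, m) to (0, 0) along the cheapest predecessor cells.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def significant_frames(A: Sequence[Vector], threshold: float) -> List[bool]:
    """Hypothetical V: flag frames of A whose energy reaches `threshold`."""
    return [sum(x * x for x in a) >= threshold for a in A]


def similarity_measure(A: Sequence[Vector], B: Sequence[Vector], V: Sequence[bool]) -> float:
    """D: mean local distance over aligned pairs whose A-frame is flagged in V."""
    if len(V) != len(A):
        raise ValueError("V must be synchronous with A (one flag per frame)")
    # Only aligned pairs whose input frame is marked as significant sound count.
    dists = [euclidean(A[i], B[j]) for i, j in dtw_path(A, B) if V[i]]
    # No significant frames means nothing to compare, so report no match.
    return sum(dists) / len(dists) if dists else float("inf")
```

<p>With <code>A = [[0.0], [5.0]]</code> and <code>B = [[0.0], [0.0]]</code>, DTW aligns frame 0 with frame 0 and frame 1 with frame 1. With both frames flagged, D is 2.5. If V marks only the first frame as significant, the mismatching second frame is ignored and D drops to 0.0. This is the effect the significant sound specifying signal is meant to have. Any accept/reject threshold on D would be a further assumption not stated in the abstract.</p>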