Abstract |
<p>Speakers, or clusters of speakers, can be recognized accurately even when there are multiple speakers and the relationships between them change over time. A voice data analysis device comprises: a speaker model derivation means that derives a speaker model, which is a model specifying the voice characteristics of each speaker, from voice data consisting of a plurality of utterances, each labeled with a speaker label identifying its speaker; a speaker co-occurrence model derivation means that uses the derived speaker model to derive a speaker co-occurrence model, which is a model expressing the strength of the co-occurrence relationships between speakers, from session data obtained by dividing the voice data into units each consisting of a series of conversations; and a model structure update means that refers to a session of newly added voice data to detect a predetermined phenomenon and, upon detecting it, updates the structure of at least one of the speaker model and the speaker co-occurrence model.</p> |
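<p>The co-occurrence and structure-update steps described above can be illustrated with a minimal sketch. It rests on several assumptions that are not in the abstract: the co-occurrence model is replaced by simple pair counts over sessions, the "predetermined phenomenon" is taken to be the appearance of a previously unseen speaker label, and the acoustic speaker model is omitted entirely (speaker labels are assumed to be given). The names <code>derive_cooccurrence</code>, <code>cooccurrence_strength</code>, and <code>update_structure</code>, and the example speakers, are hypothetical.</p>

```python
from collections import Counter
from itertools import combinations


def derive_cooccurrence(sessions):
    """Count how often each unordered speaker pair shares a session.

    Assumption: a session is a list of speaker labels. Pair counts are a
    simple stand-in for the speaker co-occurrence model.
    """
    pair_counts = Counter()
    for speakers in sessions:
        # Deduplicate and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(speakers)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def cooccurrence_strength(pair_counts, num_sessions, a, b):
    """Fraction of sessions in which speakers a and b both appear."""
    if num_sessions == 0:
        return 0.0
    return pair_counts[tuple(sorted((a, b)))] / num_sessions


def update_structure(known_speakers, pair_counts, new_session):
    """Fold a new session into the model, extending it if needed.

    Hypothetical trigger: a speaker label that has not been seen before.
    Mutates known_speakers and pair_counts in place and returns the set
    of newly added speakers (empty if nothing was detected).
    """
    new_speakers = set(new_session) - known_speakers
    known_speakers |= new_speakers
    pair_counts.update(derive_cooccurrence([new_session]))
    return new_speakers


# Illustrative data only.
sessions = [["alice", "bob"], ["alice", "bob", "carol"], ["carol", "dave"]]
known = {s for session in sessions for s in session}
counts = derive_cooccurrence(sessions)
added = update_structure(known, counts, ["alice", "erin"])
```

<p>Counting pairs keeps the sketch short. A device of the kind described would more plausibly use a probabilistic model over speakers and sessions, which this example does not attempt to reproduce.</p>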