Independent claim |
1. A speaker state detecting apparatus comprising:
an audio input unit which acquires, at least, a first voice emanated by a first speaker and a second voice emanated by a second speaker;
a storage unit which stores a state affection model with respect to a set of an overlap period or interval between two speech periods that are temporally continuous and a state of a speaker who has emanated a voice in a preceding speech period of the two speech periods, the model including probabilities of respective possible states which a speaker who has emanated a voice in a later speech period of the two speech periods can have; and
a processor adapted to
detect an overlap period between a first speech period of the first speaker included in the first voice and a second speech period of the second speaker included in the second voice, which starts before the first speech period, or an interval between the first speech period and the second speech period;
extract first state information representing a state of the first speaker from the first speech period and second state information representing a state of the second speaker from the second speech period; and
detect the state of the first speaker in the first speech period based on the overlap period or the interval and the first and second state information, the detecting the state of the first speaker comprising:
detecting a state of the second speaker in the second speech period based on the second state information;
detecting a state of the first speaker in the first speech period based on the first state information;
deriving a degree of accuracy representing a likelihood of the state of the first speaker; and
determining the detected state of the first speaker to be a state of the first speaker in the first speech period when the degree of accuracy is higher than a redetermination threshold value, and when the degree of accuracy is equal to or lower than the redetermination threshold value, obtaining probabilities of the possible states which the first speaker can have, corresponding to a set of the overlap period or the interval and the state of the second speaker in the second speech period in accordance with the state affection model and determining a state for which the probability is the maximum, of the possible states which the first speaker can have, to be the state of the first speaker in the first speech period. |
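The final determining step of the claim can be illustrated with a minimal Python sketch. It is not the claimed implementation: the dictionary layout of the state affection model, the timing category names (`"long_overlap"`, `"short_interval"`), the state labels, and all probability and threshold values are hypothetical, chosen only to show the threshold comparison and the maximum-probability fallback.

```python
from typing import Dict, Tuple

# Hypothetical layout of the state affection model (an assumption):
# (overlap/interval category, state of the preceding speaker)
#   -> {possible state of the later speaker: probability}
StateAffectionModel = Dict[Tuple[str, str], Dict[str, float]]


def determine_state(
    detected_state: str,
    accuracy: float,
    redetermination_threshold: float,
    model: StateAffectionModel,
    timing: str,
    preceding_state: str,
) -> str:
    """Return the state of the first speaker in the first speech period."""
    # Accuracy strictly higher than the threshold: keep the detected state.
    if accuracy > redetermination_threshold:
        return detected_state
    # Otherwise look up the probabilities for this (timing, preceding state)
    # pair and pick the state with the maximum probability.
    probabilities = model[(timing, preceding_state)]
    return max(probabilities, key=probabilities.get)


# Illustrative values only; none of them come from the claim.
example_model: StateAffectionModel = {
    ("long_overlap", "anger"): {"anger": 0.5, "apology": 0.3, "calm": 0.2},
    ("short_interval", "anger"): {"anger": 0.2, "apology": 0.6, "calm": 0.2},
}

confident = determine_state("calm", 0.9, 0.6, example_model, "long_overlap", "anger")
uncertain = determine_state("calm", 0.4, 0.6, example_model, "short_interval", "anger")
```

An accuracy exactly equal to the threshold takes the model lookup branch, matching the claim's "equal to or lower than" wording.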