Title of invention |
Speech processing device and speech processing method |
Abstract |
Provided is a speech processing device that can accurately extract a conversation group from among a plurality of speakers, even when a conversation group of three or more people is present. The device (400) comprises: a spontaneous speech detection unit (420) and a direction-specific speech detection unit (430), which separately detect the uttered speech of each speaker from a sound signal; a conversation establishment level calculation unit (450), which calculates, on the basis of the detected uttered speech, a conversation establishment level for every pairing of two speakers in each segment into which the determination period is divided; an extended-period characteristic amount calculation unit (460), which calculates, for each pairing, an extended-period characteristic amount of the conversation establishment level over the determination period; and a conversation-partner determination unit (470), which extracts a conversation group holding a conversation on the basis of the calculated extended-period characteristic amounts. |
Publication number |
US9064501(B2) |
Publication date |
2015.06.23 |
Application number |
US201113816502 |
Filing date |
2011.09.14 |
Applicant |
Panasonic Intellectual Property Management Co., Ltd. |
Inventors |
Yamada Maki;Endo Mitsuru |
Classification codes |
G10L11/06;G10L15/20;G10L21/00;G10L25/48;G10L25/00;H04R25/00;G10L25/78;G10L25/06;G10L21/0208;G10L21/06 |
Main classification code |
G10L11/06 |
Agency |
Wenderoth, Lind & Ponack, L.L.P. |
Agent |
Wenderoth, Lind & Ponack, L.L.P. |
Principal claim |
1. A speech processing device, comprising:
a speech detector that detects speech of individual speakers from acoustic signals;
a total-amount-of-speech calculator that calculates, for each of all pairs of the speakers and for each of segments defined by dividing a determination time period, a total amount of speech on the basis of the detected speech, the total amount of speech being a sum of amounts of speech of the pair of speakers in the segment;
an established-conversation calculator that calculates, for each of the pairs of the speakers and for each of the segments, a degree of established conversation on the basis of the detected speech, the degree of established conversation being a value indicating a rate of a time when one of the pair of the speakers gives speech and the other of the pair of the speakers gives no speech;
a long-time feature calculator that calculates, for each of the pairs of the speakers, a long-time feature obtained by integrating the degrees of established conversation calculated for the pair of the speakers within the determination time period; and
a conversational-partner determining unit that extracts a conversation group holding conversation from the speakers, on the basis of the calculated long-time features,
wherein the established-conversation calculator excludes, for each of the pairs of the speakers, the degree of established conversation of the segment with the sum of amounts of speech lower than a first threshold from the calculation of the long-time feature for the pair of the speakers, and the conversational-partner determining unit determines that the speakers of the pair with the long-time feature greater than or equal to a second threshold belong to the same conversation group. |
Address |
Osaka JP |
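
The calculation recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Several points are assumptions made only for illustration: speech detection is taken as already done and given as per-frame 0/1 flags; the degree of established conversation is computed as the fraction of frames in a segment where exactly one speaker of the pair is speaking; "integrating" is read as averaging over the retained segments; and pairs that pass the second threshold are merged into groups with a simple union-find. The names `long_time_features`, `conversation_groups`, `min_total_speech` and `min_feature` are hypothetical.

```python
from itertools import combinations


def long_time_features(activity, segment_len, min_total_speech):
    """Return {(a, b): long-time feature} for every pair of speakers.

    activity: dict mapping speaker -> list of 0/1 speech flags per frame
              (assumed output of a speech detector; all lists equal length).
    """
    speakers = sorted(activity)
    n_frames = len(activity[speakers[0]])
    features = {}
    for a, b in combinations(speakers, 2):
        degrees = []
        # Walk over the non-overlapping segments of the determination period.
        for start in range(0, n_frames - segment_len + 1, segment_len):
            seg_a = activity[a][start:start + segment_len]
            seg_b = activity[b][start:start + segment_len]
            # Total amount of speech of the pair in this segment.
            if sum(seg_a) + sum(seg_b) < min_total_speech:
                continue  # first threshold: skip nearly silent segments
            # Assumed measure: rate of frames where exactly one speaks.
            exclusive = sum(1 for x, y in zip(seg_a, seg_b) if x != y)
            degrees.append(exclusive / segment_len)
        # Assumed integration: mean over the retained segments.
        features[(a, b)] = sum(degrees) / len(degrees) if degrees else 0.0
    return features


def conversation_groups(features, min_feature):
    """Merge pairs whose feature meets the second threshold into groups."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), value in features.items():
        root_a, root_b = find(a), find(b)
        if value >= min_feature:
            parent[root_a] = root_b
    groups = {}
    for speaker in list(parent):
        groups.setdefault(find(speaker), set()).add(speaker)
    # Only sets of two or more speakers count as a conversation group.
    return [g for g in groups.values() if len(g) > 1]


# Toy data: A, B and C take turns; D speaks once and is otherwise silent.
activity = {
    "A": [1, 1, 0, 0, 1, 0, 0, 0],
    "B": [0, 0, 1, 0, 0, 1, 1, 0],
    "C": [0, 0, 0, 1, 0, 0, 0, 1],
    "D": [1, 0, 0, 0, 0, 0, 0, 0],
}
features = long_time_features(activity, segment_len=4, min_total_speech=2)
groups = conversation_groups(features, min_feature=0.6)
print(groups)
```

In this toy run the three turn-taking speakers end up in one group while D is left out, which mirrors the three-or-more-person case the abstract highlights. The segment length and both thresholds are arbitrary values chosen for the example, not figures from the patent.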