Abstract |
<P>PROBLEM TO BE SOLVED: To advance interaction with the user naturally while making the system's internal processing more appropriate. <P>SOLUTION: The system includes a storage device 26 that stores an answer unit model, obtained by statistically modeling the timing at which one speaker in a dialogue answers, and a storage device 28 that stores a meaning processing unit model, obtained by statistically modeling units of meaning processing. A speech recognizing part 20 recognizes the speech uttered by the user. Image information of the speaking user, acoustic features of the user's speech, and linguistic features of the user's speech are extracted. A processing unit deciding part 16 decides the meaning processing timing and the answer timing based on the image information, the acoustic features, the meaning processing unit model, the answer unit model, the speech recognition result from the speech recognizing part 20, and the linguistic features. When it is decided that both the meaning processing timing and the answer timing have been reached, the system answers by voice, reflecting the content obtained by meaning processing. <P>COPYRIGHT: (C)2005,JPO&NCIPI |
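
The decision made by the processing unit deciding part 16 can be illustrated with a minimal sketch. It is not the patented implementation: the abstract does not say what form the statistical models take, so a simple logistic scorer is assumed here, and the class `UnitModel`, the functions `decide_timing` and `should_answer`, the feature names (`pause_sec`, `gaze_at_system`, `clause_final`), the weights, and the 0.5 threshold are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical feature vector: image, acoustic, and linguistic cues by name.
Features = Dict[str, float]


@dataclass
class UnitModel:
    """Assumed stand-in for a statistically trained unit model (logistic scorer)."""
    weights: Dict[str, float]
    bias: float = 0.0

    def probability(self, features: Features) -> float:
        # Weighted sum of the known features, squashed to a probability.
        z = self.bias + sum(
            w * features.get(name, 0.0) for name, w in self.weights.items()
        )
        return 1.0 / (1.0 + math.exp(-z))


def decide_timing(
    features: Features,
    meaning_model: UnitModel,
    answer_model: UnitModel,
    threshold: float = 0.5,
) -> Tuple[bool, bool]:
    """Return (is_meaning_processing_timing, is_answer_timing)."""
    is_meaning = meaning_model.probability(features) >= threshold
    is_answer = answer_model.probability(features) >= threshold
    return is_meaning, is_answer


def should_answer(
    features: Features, meaning_model: UnitModel, answer_model: UnitModel
) -> bool:
    """Answer by voice only when both timings are decided to have been reached."""
    is_meaning, is_answer = decide_timing(features, meaning_model, answer_model)
    return is_meaning and is_answer


# Illustrative, hand-picked weights (not learned from data).
meaning_model = UnitModel({"clause_final": 4.0, "pause_sec": 2.0}, bias=-3.0)
answer_model = UnitModel(
    {"pause_sec": 4.0, "gaze_at_system": 2.0, "clause_final": 1.0}, bias=-4.0
)

mid_utterance = {"clause_final": 0.0, "pause_sec": 0.1, "gaze_at_system": 0.0}
clause_end = {"clause_final": 1.0, "pause_sec": 0.2, "gaze_at_system": 0.0}
turn_end = {"clause_final": 1.0, "pause_sec": 0.8, "gaze_at_system": 1.0}
```

With these assumed weights, `clause_end` triggers meaning processing without an answer, while `turn_end` triggers both, so only `should_answer(turn_end, meaning_model, answer_model)` is true. Keeping the two decisions separate mirrors the abstract's use of two distinct models, one for meaning processing units and one for answer units.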