Abstract |
<p>PROBLEM TO BE SOLVED: To recognize the speech of multiple speakers with high reliability. SOLUTION: The system comprises a speech signal input part 1; a video signal input part 2; an unspecified speaker speech recognition part 3 that extracts features common to the speech of many speakers, builds a standard pattern from them, and calculates the degree of similarity between the input speech and that standard speech pattern; a specific speaker speech recognition part 4 that calculates the degree of similarity between the input speech and the speech of a pre-registered speaker; a face region extracting part 9 that extracts a face region from the input video; a face image database 11 that stores face image data of multiple specific speakers together with their identification numbers; an image comparison part 10 that outputs the degree of similarity between the image data supplied by the face region extracting part 9 and the image data held in the face image database 11; and a recognition result integration part 5 that calculates an integrated degree of similarity from the outputs of the unspecified speaker speech recognition part 3, the specific speaker speech recognition part 4, and the image comparison part 10, and outputs the recognition result.</p> |
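
The abstract does not say how the recognition result integration part 5 combines its three inputs. The following is only a minimal sketch of one plausible scheme, not the patented method: it assumes all similarities lie in [0, 1], weights each registered speaker's speaker-dependent score by that speaker's face similarity, and blends the best such value with the speaker-independent score. The function name `integrate_similarities`, the dictionary layouts, and the blending weight `alpha` are all hypothetical.

```python
from typing import Dict, Tuple


def integrate_similarities(
    si_scores: Dict[str, float],
    sd_scores: Dict[int, Dict[str, float]],
    face_scores: Dict[int, float],
    alpha: float = 0.5,
) -> Tuple[str, Dict[str, float]]:
    """Hypothetical integration rule (an assumption, not from the abstract).

    si_scores:   word -> similarity from the unspecified speaker part 3
    sd_scores:   speaker id -> (word -> similarity) from the specific speaker part 4
    face_scores: speaker id -> similarity from the image comparison part 10
    alpha:       assumed weight given to the face-weighted speaker-dependent score
    """
    if not si_scores:
        raise ValueError("si_scores must contain at least one candidate word")

    integrated: Dict[str, float] = {}
    for word, si in si_scores.items():
        # Speaker-dependent evidence, trusted in proportion to how well the
        # face in the video matches each registered speaker.
        sd = max(
            (
                face_scores.get(speaker, 0.0) * words.get(word, 0.0)
                for speaker, words in sd_scores.items()
            ),
            default=0.0,
        )
        integrated[word] = (1.0 - alpha) * si + alpha * sd

    # The recognition result is the word with the highest integrated similarity.
    best_word = max(integrated, key=integrated.get)
    return best_word, integrated


if __name__ == "__main__":
    si = {"yes": 0.6, "no": 0.7}
    sd = {1: {"yes": 0.9, "no": 0.2}, 2: {"yes": 0.1, "no": 0.8}}
    face = {1: 0.9, 2: 0.1}  # the video strongly suggests registered speaker 1
    print(integrate_similarities(si, sd, face))
```

In this example, the speaker-independent scores alone slightly favor "no", but once the face match points to speaker 1, that speaker's registered patterns tip the integrated result to "yes". A real system would need to calibrate the three similarity scales against each other before combining them.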