Title of invention Fusion of audio and video based speaker identification for multimedia information access
Abstract A method and apparatus are disclosed for identifying a speaker in an audio-video source using both audio and video information. An audio-based speaker identification system identifies one or more potential speakers for a given segment using an enrolled speaker database. A video-based speaker identification system identifies one or more potential speakers for a given segment using a face detector/recognizer and an enrolled face database. An audio-video decision fusion process evaluates the individuals identified by the audio-based and video-based speaker identification systems and determines the speaker of an utterance in accordance with the present invention. A linear variation is imposed on the ranked lists produced using the audio and video information. The decision fusion scheme of the present invention is based on a linear combination of the audio and the video ranked lists. The line with the higher slope is assumed to convey more discriminative information. The normalized slopes of the two lines are used as the weights of the respective results when combining the scores from the audio-based and video-based speaker analysis. In this manner, the weights are derived from the data itself.
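The fusion step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions, because the abstract does not specify them: scores from both modalities are taken to be on a comparable scale with higher meaning better, the "linear variation" is modeled as a least-squares line fitted to each ranked list's scores against rank position, slopes are normalized by dividing each slope magnitude by the sum of the two, and a candidate missing from one list is given a score of 0. The names `fit_slope` and `fuse_ranked_lists` are hypothetical.

```python
def fit_slope(scores):
    """Least-squares slope of scores against rank position 0, 1, 2, ..."""
    n = len(scores)
    if n < 2:
        return 0.0  # a single point has no slope
    mean_x = (n - 1) / 2
    mean_y = sum(scores) / n
    num = sum((i - mean_x) * (s - mean_y) for i, s in enumerate(scores))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def fuse_ranked_lists(audio, video):
    """Combine audio and video candidate scores with slope-derived weights.

    audio, video: dicts mapping candidate name -> score (assumed: higher is
    better, both on a comparable scale).
    Returns (best_candidate, fused_scores, (audio_weight, video_weight)).
    """
    # Rank each list best-first and fit a line to its scores.
    audio_slope = abs(fit_slope(sorted(audio.values(), reverse=True)))
    video_slope = abs(fit_slope(sorted(video.values(), reverse=True)))

    # Assumed normalization: each weight is its slope's share of the total.
    total = audio_slope + video_slope
    if total == 0:
        w_audio = w_video = 0.5  # neither list discriminates; split evenly
    else:
        w_audio = audio_slope / total
        w_video = video_slope / total

    # Linear combination of the two lists; a missing candidate scores 0.
    fused = {
        name: w_audio * audio.get(name, 0.0) + w_video * video.get(name, 0.0)
        for name in set(audio) | set(video)
    }
    best = max(fused, key=fused.get)
    return best, fused, (w_audio, w_video)


if __name__ == "__main__":
    audio_scores = {"alice": 0.9, "bob": 0.5, "carol": 0.1}
    video_scores = {"alice": 0.5, "bob": 0.6, "carol": 0.4}
    print(fuse_ranked_lists(audio_scores, video_scores))
```

In this made-up example the audio scores fall off more steeply across ranks than the video scores, so the audio list receives the larger weight and its top candidate wins even though the video list ranks a different candidate first.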
Publication number US6567775(B1) Publication date 2003.05.20
Application number US20000558371 Filing date 2000.04.26
Applicant INTERNATIONAL BUSINESS MACHINES CORPORATION Inventor MAALI FEREYDOUN;VISWANATHAN MAHESH
Classification G01L15/00;G01L21/00;G06K9/62;G10L17/00;(IPC1-7):G01L15/00 Main classification G01L15/00
Agency Agent
Main claim
Address