Title: VOICE RECOGNITION APPARATUS, VOICE RECOGNITION METHOD AND PROGRAM
Abstract: An apparatus and a method are provided for rapidly extracting a target sound from a sound signal in which a variety of sounds generated by a plurality of sound sources are mixed. The voice recognition apparatus includes a tracking unit that detects a sound source direction and a voice segment and executes a sound source extraction process, and a voice recognition unit that receives the sound source extraction result and executes a voice recognition process. In the tracking unit, a segment being created management unit, which creates and manages a voice segment for each sound source, sequentially detects the sound source direction, sequentially updates the estimated voice segment by connecting the detection results along the time direction, creates an extraction filter for sound source extraction after a predetermined time has elapsed, and sequentially generates a sound source extraction result by applying the extraction filter to the input voice signal. The voice recognition unit sequentially executes the voice recognition process on each partial sound source extraction result and outputs a voice recognition result.
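The tracking step in the abstract, connecting per-frame direction detections along the time axis into one voice segment per sound source, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patent's method: the greedy nearest-direction matching, the names `Segment` and `track`, and the parameters `ANGLE_TOL_DEG` and `MAX_GAP_FRAMES` are all hypothetical, and angle wrap-around at ±180° is ignored for brevity.

```python
from dataclasses import dataclass, field

# Hypothetical tuning parameters (not specified in the patent text)
ANGLE_TOL_DEG = 10.0   # max direction change still treated as the same source
MAX_GAP_FRAMES = 3     # frames without a detection before a segment is closed


@dataclass
class Segment:
    """One voice segment being created, i.e. one tracked sound source."""
    seg_id: int
    start: int                                   # frame index of segment beginning
    directions: list = field(default_factory=list)
    last_seen: int = 0
    closed: bool = False

    @property
    def direction(self) -> float:
        # Current direction estimate: mean of the detections connected so far
        return sum(self.directions) / len(self.directions)


def track(frames):
    """frames: one list of detected directions (degrees) per time frame.

    Returns (still_active_segments, finished_segments).
    """
    active, finished, next_id = [], [], 0
    for t, detections in enumerate(frames):
        unmatched = list(detections)
        for seg in active:
            # Connect the closest unused detection to this segment (time direction)
            best = min(unmatched, key=lambda d: abs(d - seg.direction), default=None)
            if best is not None and abs(best - seg.direction) <= ANGLE_TOL_DEG:
                seg.directions.append(best)
                seg.last_seen = t
                unmatched.remove(best)
        # Detections not connected to any segment start new segments
        for d in unmatched:
            active.append(Segment(next_id, t, [d], last_seen=t))
            next_id += 1
        # Close segments that have had no detection for too long
        for seg in [s for s in active if t - s.last_seen > MAX_GAP_FRAMES]:
            seg.closed = True
            active.remove(seg)
            finished.append(seg)
    return active, finished
```

For example, detections near 30° in frames 0 to 2 and near -60° in frames 2 and 3 yield two separate segments, each closed once it has been silent for more than `MAX_GAP_FRAMES` frames.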
Application Publication No.: US2016005394(A1)    Publication Date: 2016.01.07
Application No.: US201314766246    Filing Date: 2013.12.20
Applicant: SONY CORPORATION    Inventor: HIROE Atsuo
Classification: G10L15/04; G10L21/0272    Main Classification: G10L15/04
Principal Claim: 1. A voice recognition apparatus, comprising: a tracking unit for detecting a sound source direction and a voice segment to execute a sound source extraction process; and a voice recognition unit for inputting a sound source extraction result from the tracking unit to execute a voice recognition process, the tracking unit creating a segment being created management unit that creates and manages a voice segment per unit of sound source, each segment being created management unit created sequentially detecting a sound source direction to execute a voice segment creation process that sequentially updates a voice segment estimated by connecting a detection result in a time direction, creating an extraction filter for a sound source extraction after a predetermined time has elapsed from a voice segment beginning, and sequentially applying the created extraction filter to an input voice signal to sequentially create a partial sound source extraction result of a voice segment, the tracking unit sequentially outputting the partial sound source extraction result created by the segment being created management unit to the voice recognition unit, the voice recognition unit sequentially executing the voice recognition process on the partial sound source extraction result inputted from the tracking unit to output a voice recognition result.
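The claimed lifecycle of one segment being created management unit (wait a predetermined time from the segment beginning, then create an extraction filter and apply it frame by frame, passing each partial result to the recognizer) can be illustrated with the sketch below. The claim does not say which extraction filter is used, so this sketch swaps in a plain frequency-domain delay-and-sum beamformer for a linear microphone array at a single frequency bin, steered toward the mean direction detected so far. `SegmentManager`, `delay_and_sum_weights`, `wait_frames` and the `on_partial` callback are hypothetical names, not part of the patent.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed propagation speed)


def delay_and_sum_weights(theta_deg, mic_x, freq_hz):
    """Hypothetical delay-and-sum weights for a linear array (mic x in metres).

    Assumed signal model: X_m = S * exp(-j 2 pi f x_m sin(theta) / c),
    so these weights undo each mic's phase and average over the M mics.
    """
    theta = math.radians(theta_deg)
    m = len(mic_x)
    return [cmath.exp(2j * math.pi * freq_hz * x * math.sin(theta) / SPEED_OF_SOUND) / m
            for x in mic_x]


class SegmentManager:
    """Hypothetical sketch of one 'segment being created management unit'."""

    def __init__(self, mic_x, freq_hz, wait_frames, on_partial):
        self.mic_x, self.freq_hz = mic_x, freq_hz
        self.wait_frames = wait_frames   # the "predetermined time", in frames
        self.on_partial = on_partial     # receives each partial extraction result
        self.directions, self.buffer = [], []
        self.weights = None

    def push(self, direction_deg, mic_spectra):
        """Feed one frame: detected direction and one complex bin per mic."""
        self.directions.append(direction_deg)
        if self.weights is None:
            self.buffer.append(mic_spectra)
            if len(self.buffer) < self.wait_frames:
                return  # predetermined time not yet elapsed
            # Build the extraction filter from the direction estimated so far
            theta = sum(self.directions) / len(self.directions)
            self.weights = delay_and_sum_weights(theta, self.mic_x, self.freq_hz)
            frames, self.buffer = self.buffer, []
        else:
            frames = [mic_spectra]
        # Apply the filter and pass the partial extraction result onward
        partial = [sum(w * x for w, x in zip(self.weights, f)) for f in frames]
        self.on_partial(partial)
```

Frames received before the filter exists are buffered and filtered in one batch once it is created, so the first partial result covers the whole waiting period and later results arrive one frame at a time, which is what lets recognition start before the segment ends.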
Address: Minato-ku, Tokyo JP