Abstract
PURPOSE: A device and method for recognizing a voice are provided. By accurately identifying when a speaker is speaking, they make the voice easier to recognize.

CONSTITUTION: A voice recognition device includes an input part (110), a detecting part (150), a saliency map generating part (160), an information obtaining part (170), a voice recognizing part (180), and an output part (120). The input part receives a plurality of captured images, in which a user appears, together with sound sources. The detecting part detects the user's lip region in each image. The saliency map generating part generates dynamic saliency maps for the lip regions. The information obtaining part obtains lip motion information from the dynamic saliency maps. The voice recognizing part recognizes the voice in the sound sources based on the lip motion information. The output part outputs the voice recognition result.

[Reference numerals] (110) Input part; (120) Extracting unit; (130) Storage unit; (140) Location determination unit; (160) Saliency map generating part; (170) Information obtaining part; (180) Voice recognizing part; (190) Control unit; (200) Face detecting unit; (300) Lips detecting unit
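The abstract does not specify the saliency model or how lip motion is combined with recognition. The following is a minimal sketch under the simplest reading, and every detail is an assumption rather than the patented method. Lip detection is taken as already done, giving one bounding box per frame. The absolute difference between consecutive frames stands in for the dynamic saliency map. The mean saliency inside the lip box serves as the lip motion information, and frames whose motion exceeds a threshold are treated as "speaking". Those frame intervals are mapped to audio sample ranges, and the selected audio is what would be passed to a speech recognizer (not implemented here). All function names, the box format, the threshold, the frame rate, and the sample rate are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Frame = List[List[float]]        # grayscale image as rows of pixel intensities
Box = Tuple[int, int, int, int]  # hypothetical lip box: (top, left, bottom, right), end-exclusive


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the lip region out of a frame."""
    top, left, bottom, right = box
    return [row[left:right] for row in frame[top:bottom]]


def dynamic_saliency(prev: Frame, curr: Frame) -> Frame:
    """Assumed stand-in for a dynamic saliency map: per-pixel absolute temporal difference."""
    return [[abs(c - p) for p, c in zip(pr, cr)] for pr, cr in zip(prev, curr)]


def lip_motion(frames: Sequence[Frame], boxes: Sequence[Box]) -> List[float]:
    """Lip motion score per frame: mean saliency inside the lip box (frame 0 has no predecessor)."""
    scores = [0.0]
    for i in range(1, len(frames)):
        # Use the current frame's box for both crops so the two regions align.
        sal = dynamic_saliency(crop(frames[i - 1], boxes[i]), crop(frames[i], boxes[i]))
        n = sum(len(row) for row in sal)
        scores.append(sum(map(sum, sal)) / n if n else 0.0)
    return scores


def speech_segments(scores: Sequence[float], threshold: float,
                    fps: float, sample_rate: int) -> List[Tuple[int, int]]:
    """Group frames with motion above the threshold into runs and convert them to audio sample ranges."""
    frame_runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, s in enumerate(scores):
        if s > threshold and start is None:
            start = i
        elif s <= threshold and start is not None:
            frame_runs.append((start, i))
            start = None
    if start is not None:
        frame_runs.append((start, len(scores)))
    # Frame index -> audio sample index, assuming audio and video start together.
    return [(round(a * sample_rate / fps), round(b * sample_rate / fps)) for a, b in frame_runs]


def gate_audio(samples: Sequence[float], segments: Sequence[Tuple[int, int]]) -> List[Sequence[float]]:
    """Keep only the audio spans where the lips are moving; these would go to the recognizer."""
    return [samples[a:b] for a, b in segments]


if __name__ == "__main__":
    def make(v: float) -> Frame:
        # 4x4 frame whose central 2x2 "lip" area has intensity v
        return [[v if 1 <= r < 3 and 1 <= c < 3 else 0.0 for c in range(4)] for r in range(4)]

    frames = [make(0), make(10), make(10), make(0), make(0)]
    boxes = [(1, 1, 3, 3)] * len(frames)
    scores = lip_motion(frames, boxes)
    print(scores)  # [0.0, 10.0, 0.0, 10.0, 0.0]
    print(speech_segments(scores, threshold=5.0, fps=25, sample_rate=16000))
    # [(640, 1280), (1920, 2560)]
```

Using frame differencing keeps the sketch dependency-free. A real system would likely use a proper spatiotemporal saliency model and a learned audio-visual recognizer in place of the threshold.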