Title: SYSTEM AND METHOD FOR ENHANCING SPEECH ACTIVITY DETECTION USING FACIAL FEATURE DETECTION
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for processing audio. A system configured to practice the method monitors, via a processor of a computing device, an image feed of a user interacting with the computing device and identifies an audio start event in the image feed based on face detection of the user looking at the computing device or a specific region of the computing device. The image feed can be a video stream. The audio start event can be based on a head size, orientation, or distance from the computing device, eye position or direction, device orientation, mouth movement, and/or other user features. The system then initiates processing of a received audio signal based on the audio start event. The system can also identify an audio end event in the image feed and end processing of the received audio signal based on the end event.
Publication number: US2016189733(A1)  Publication date: 2016.06.30
Application number: US201615063928  Filing date: 2016.03.08
Applicant: AT&T INTELLECTUAL PROPERTY I, LP  Inventors: VASILIEFF BRANT JAMESON;EHLEN PATRICK JOHN;LIESKE, JR. JAY HENRY
Classification: G10L25/78;G10L15/20;H04N7/18;G10L25/57  Main classification: G10L25/78
Agency:  Agent:
Main claim:
Address: Atlanta GA US
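
The gating idea in the abstract (begin audio processing on a visual start event, stop on a visual end event) can be illustrated with a minimal sketch. This is not the patented implementation: `FrameObservation`, `VisualSpeechGate`, `gate_audio`, and the frame-count thresholds are hypothetical names and values chosen for illustration, and the per-frame face-detection results are assumed to come from some external detector that is not shown.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class FrameObservation:
    """Hypothetical per-frame face-detector output (an assumption, not from the record)."""
    face_detected: bool
    looking_at_device: bool


class VisualSpeechGate:
    """Turns per-frame observations into audio start/end events.

    The debounce thresholds are illustrative assumptions: a start event needs
    `start_frames` consecutive attentive frames, an end event needs
    `end_frames` consecutive inattentive frames.
    """

    def __init__(self, start_frames: int = 3, end_frames: int = 5) -> None:
        self.start_frames = start_frames
        self.end_frames = end_frames
        self.active = False
        self._attentive_run = 0
        self._inattentive_run = 0

    def update(self, obs: FrameObservation) -> Optional[str]:
        """Consume one frame; return "start", "end", or None."""
        attentive = obs.face_detected and obs.looking_at_device
        if attentive:
            self._attentive_run += 1
            self._inattentive_run = 0
        else:
            self._inattentive_run += 1
            self._attentive_run = 0

        if not self.active and self._attentive_run >= self.start_frames:
            self.active = True
            return "start"
        if self.active and self._inattentive_run >= self.end_frames:
            self.active = False
            return "end"
        return None


def gate_audio(
    observations: Iterable[FrameObservation],
    audio_chunks: Iterable[int],
    gate: Optional[VisualSpeechGate] = None,
) -> List[int]:
    """Keep only the audio chunks that arrive while the gate is active.

    Assumes one audio chunk per video frame, which is a simplification.
    """
    gate = gate or VisualSpeechGate()
    kept = []
    for obs, chunk in zip(observations, audio_chunks):
        gate.update(obs)
        if gate.active:
            kept.append(chunk)
    return kept


# Usage: two frames looking away, four looking at the device, three looking away.
away = FrameObservation(face_detected=True, looking_at_device=False)
look = FrameObservation(face_detected=True, looking_at_device=True)
frames = [away, away, look, look, look, look, away, away, away]

demo_gate = VisualSpeechGate(start_frames=3, end_frames=2)
events = [demo_gate.update(f) for f in frames]
kept_chunks = gate_audio(frames, range(len(frames)), VisualSpeechGate(3, 2))
```

Requiring several consecutive frames before firing an event is a design choice of this sketch, meant to avoid toggling on a single misdetected frame; the record itself does not specify any such threshold.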