Title of Invention: System and method for continuous multimodal speech and gesture interaction
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for processing multimodal input. A system configured to practice the method continuously monitors an audio stream associated with a gesture input stream, and detects a speech event in the audio stream. Then the system identifies a temporal window associated with a time of the speech event, and analyzes data from the gesture input stream within the temporal window to identify a gesture event. The system processes the speech event and the gesture event to produce a multimodal command. The gesture in the gesture input stream can be directed to a display, but is remote from the display. The system can analyze the data from the gesture input stream by calculating an average of gesture coordinates within the temporal window.
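A minimal sketch of the fusion step described in the abstract, assuming a shared clock between the audio and gesture streams. The types, field names, window padding, and `fuse` helper below are illustrative assumptions, not the patented implementation.

```python
# Sketch: select gesture samples inside a temporal window around a detected
# speech event and collapse them to an average coordinate, then pair that
# coordinate with the speech transcript as a single multimodal command.

from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GestureSample:
    timestamp: float   # seconds, shared clock with the audio stream
    x: float           # coordinate of a gesture directed at (but remote from) a display
    y: float


@dataclass
class SpeechEvent:
    start: float       # seconds
    end: float
    transcript: str


def temporal_window(event: SpeechEvent, before: float = 0.5, after: float = 0.5) -> Tuple[float, float]:
    """Window associated with the time of the speech event, extended on both sides."""
    return event.start - before, event.end + after


def average_gesture(samples: List[GestureSample], window: Tuple[float, float]) -> Optional[Tuple[float, float]]:
    """Average the gesture coordinates that fall within the temporal window."""
    lo, hi = window
    inside = [s for s in samples if lo <= s.timestamp <= hi]
    if not inside:
        return None
    return (sum(s.x for s in inside) / len(inside),
            sum(s.y for s in inside) / len(inside))


def fuse(event: SpeechEvent, samples: List[GestureSample]) -> dict:
    """Combine the speech event and the gesture event into one multimodal command."""
    point = average_gesture(samples, temporal_window(event))
    return {"speech": event.transcript, "gesture_point": point}


if __name__ == "__main__":
    speech = SpeechEvent(start=3.0, end=3.6, transcript="move that there")
    gestures = [GestureSample(2.8, 100.0, 220.0),
                GestureSample(3.2, 104.0, 218.0),
                GestureSample(3.5, 102.0, 221.0)]
    print(fuse(speech, gestures))
    # {'speech': 'move that there', 'gesture_point': (102.0, 219.66666666666666)}
```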
Publication Number: US9152376 (B2)    Publication Date: 2015.10.06
Application Number: US201113308846    Filing Date: 2011.12.01
Applicant: AT&T Intellectual Property I, L.P.    Inventors: Johnston, Michael; Ozkan, Derya
Classification: G10L21/00; G06F3/16; G06F3/01    Primary Classification: G10L21/00
Principal Claim: 1. A method comprising: continuously monitoring an audio stream associated with a non-tactile gesture input stream; identifying a first speech event in the audio stream, the first speech event being from a first user; identifying a second speech event in the audio stream, the second speech event being from a second user; identifying a temporal window associated with times of the first speech event and the second speech event, wherein the temporal window extends forward and backward from the times of the first speech event and the second speech event; analyzing, via a processor, data from the non-tactile gesture input stream within the temporal window to identify a non-tactile gesture event; and processing the first speech event, the second speech event, and the non-tactile gesture event to produce a single multimodal command.
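The claim covers a window spanning two speech events (one per user) and extending forward and backward from their times. A minimal sketch of that window computation, under the same shared-clock assumption as above; the function name and padding value are hypothetical.

```python
# Sketch: temporal window associated with the times of two speech events,
# extended forward and backward by a fixed pad. Not the patented method.

from typing import Tuple


def joint_window(first: Tuple[float, float], second: Tuple[float, float],
                 pad: float = 0.5) -> Tuple[float, float]:
    """Window covering both speech events' (start, end) times, padded on both sides."""
    start = min(first[0], second[0]) - pad
    end = max(first[1], second[1]) + pad
    return start, end


# Example: user one speaks at 3.0-3.4 s, user two at 4.1-4.6 s.
print(joint_window((3.0, 3.4), (4.1, 4.6)))   # (2.5, 5.1)
```

Gesture samples falling inside this joint window would then be analyzed (for example, averaged as in the abstract) and fused with both speech events into a single multimodal command.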
Address: Atlanta, GA, US