Title of invention: Using visual cues to disambiguate speech inputs
Abstract: Embodiments related to recognizing speech inputs are disclosed. One disclosed embodiment provides a method for recognizing a speech input including receiving depth information of a physical space from a depth camera, determining an identity of a user in the physical space based on the depth information, receiving audio information from one or more microphones, and determining a speech input from the audio input. If the speech input comprises an ambiguous term, the ambiguous term in the speech input is compared to one or more of depth image data received from the depth image sensor and digital content consumption information for the user to identify an unambiguous term corresponding to the ambiguous term. After identifying the unambiguous term, an action is taken on the computing device based on the speech input and the unambiguous term.
Publication number: US9190058(B2) Publication date: 2015.11.17
Application number: US201313750674 Filing date: 2013.01.25
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC Inventor: Klein Christian
Classification: G10L15/00; G10L17/00; G10L21/00; G10L25/00; G10L15/22; G06F3/16; G06F3/01; G06F3/03; G10L15/24 Main classification: G10L15/00
Agency: Agent: Chatterjee Aaron; Yee Judy; Minhas Micky
Main claim: 1. On a computing device, a method for recognizing a speech input, the method comprising: receiving image information of a physical space from one or more cameras; determining an identity of a user in the physical space based on the image information; receiving audio information from one or more microphones; determining a speech input from the audio input; if the speech input comprises an ambiguous term, then comparing the ambiguous term in the speech input to digital content consumption information for the user to identify an unambiguous term corresponding to the ambiguous term, the digital content consumption information comprising social network information obtained from a remote service, the social network information including contacts from a social network, and wherein identifying the unambiguous term comprises identifying another user from the social network information; and after identifying the unambiguous term, taking an action on the computing device based on the speech input and the unambiguous term.
Address: Redmond, WA, US
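
Illustrative sketch (not part of the patent record): the disambiguation step of claim 1 might look roughly like the Python below. It assumes the user has already been identified from the image information and that the speech input has already been split into words. `UserProfile`, `resolve_term`, and `handle_speech` are hypothetical names, the contact list stands in for social network information obtained from a remote service, and matching on a whole name word is a stand-in technique, since the record does not specify how the comparison is performed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UserProfile:
    """Hypothetical per-user record. The contact list stands in for
    social network information fetched from a remote service."""
    user_id: str
    contacts: list[str] = field(default_factory=list)


def resolve_term(term: str, profile: UserProfile) -> Optional[str]:
    """Return the single contact whose name contains `term` as a whole
    word, or None when there is no match or more than one match."""
    needle = term.lower()
    matches = [c for c in profile.contacts if needle in c.lower().split()]
    return matches[0] if len(matches) == 1 else None


def handle_speech(words: list[str], profile: UserProfile) -> str:
    """Treat the last word of the command as the possibly ambiguous term
    and build an action string from the resolved contact name."""
    verb, term = words[0], words[-1]
    resolved = resolve_term(term, profile)
    if resolved is None:
        # Assumed fallback; the claim does not say what happens here.
        return f"ask user to clarify '{term}'"
    return f"{verb} {resolved}"


if __name__ == "__main__":
    alice = UserProfile("alice", ["Bob Smith", "Carol Jones"])
    print(handle_speech(["message", "bob"], alice))
```

In this sketch the same spoken word "bob" resolves differently depending on which identified user's contacts are consulted, which is the role the user identity plays in the claim. Asking the user to clarify when no unique match exists is an assumption of the sketch.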