Title: Speaker and call characteristic sensitive open voice search
Abstract: Techniques disclosed herein include systems and methods for speaker-sensitive, open-domain, voice-enabled searching. Techniques include using speech information, speaker information, and information associated with a spoken query to enhance open voice search results. This includes integrating a textual index with a voice index to support the entire search cycle. Given a voice query, the system can execute two matching processes simultaneously: a text matching process based on the output of speech recognition, and a voice matching process based on characteristics of the caller or user voicing the query. Characteristics of the caller can include the output of voice feature extraction and metadata about the call. The system clusters callers according to these characteristics. The system can use specific voice and text clusters to modify speech recognition results as well as search results.
Publication number: US9099092(B2) Publication date: 2015.08.04
Application number: US201414152136 Filing date: 2014.01.10
Applicant: Nuance Communications, Inc. Inventors: Zhang Shilei;Bao Shenghua;Liu Wen;Qin Yong;Shuang Zhiwei;Chen Jian;Su Zhong;Shi Qin;Ganong, III William F.
Classification: G10L15/26;G10L15/00;G10L15/28;G10L15/04;G10L15/14;G10L15/16;G10L15/18;G10L15/20;G10L25/00;G10L21/00;G06F7/00;G06F17/30;G10L15/22;G06F17/27;G10L15/183 Main classification: G10L15/26
Agency: Banner & Witcoff, Ltd. Agent: Banner & Witcoff, Ltd.
Main claim: 1. A method comprising: receiving, by a microphone of a computing device, a spoken query from a user; processing, by the computing device, the spoken query using parallel processes, wherein a first process of the parallel processes comprises: converting, by the computing device, the spoken query into one or more text strings using a speech recognition process; and assigning, using an initial language model, a score to each of the one or more text strings, the score of each of the one or more text strings being used to compute a probability of correct conversion of the spoken query into the one or more text strings; wherein a second process of the parallel processes comprises: identifying, by the computing device, acoustic features of a voice signal corresponding to the spoken query; and classifying, by the computing device, the spoken query into at least one voice cluster based on the identified acoustic features of the voice signal, the at least one voice cluster having a respective text cluster and a customized language model that reflects characteristics of the user; selecting a text query based on the one or more text strings and the customized language model; receiving, by the computing device, search results from an information retrieval system based on the text query, each of the search results having a ranking indicating a measure of importance relative to other of the search results; and re-ranking, by the computing device, the search results based on re-scoring the search results using the respective text cluster.
Address: Burlington, MA, US
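
Illustrative sketch (not part of the patent record): the flow recited in the main claim can be pictured with a small Python example. Everything here is an assumption made for illustration and is not taken from the patent: the `VoiceCluster` structure, nearest-centroid classification, a unigram model standing in for the customized language model, the linear blend of scores, and term overlap as the re-scoring rule. Speech recognition itself is replaced by a stub that returns precomputed hypotheses.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class VoiceCluster:
    # Hypothetical container; field names are illustrative only.
    name: str
    centroid: list[float]             # mean acoustic feature vector of the cluster
    language_model: dict[str, float]  # customized unigram probabilities
    text_cluster: set[str]            # terms typical of queries from this cluster


def recognize(audio):
    """Stand-in for speech recognition plus initial language model scoring."""
    # Assumption: 'audio' already carries (text, log-score) hypothesis pairs.
    return audio["hypotheses"]


def classify_voice(audio, clusters):
    """Assign the query to the voice cluster with the nearest centroid."""
    return min(clusters, key=lambda c: math.dist(audio["features"], c.centroid))


def custom_lm_score(text, cluster, floor=1e-4):
    """Average per-word log-probability under the cluster's unigram model."""
    words = text.lower().split()
    total = sum(math.log(cluster.language_model.get(w, floor)) for w in words)
    return total / max(len(words), 1)


def select_text_query(hypotheses, cluster, weight=0.5):
    """Pick the hypothesis with the best blend of initial and customized scores."""
    def blended(hyp):
        text, initial_score = hyp
        return (1 - weight) * initial_score + weight * custom_lm_score(text, cluster)
    return max(hypotheses, key=blended)[0]


def rerank(results, cluster, weight=1.0):
    """Re-score (document, score) pairs by term overlap with the text cluster."""
    def rescored(item):
        doc, score = item
        words = doc.lower().split()
        overlap = sum(w in cluster.text_cluster for w in words) / max(len(words), 1)
        return score + weight * overlap
    return sorted(results, key=rescored, reverse=True)


def process_query(audio, clusters):
    """Run recognition and voice classification in parallel, then select a query."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        hyp_future = pool.submit(recognize, audio)
        cluster_future = pool.submit(classify_voice, audio, clusters)
        hypotheses, cluster = hyp_future.result(), cluster_future.result()
    return select_text_query(hypotheses, cluster), cluster


# Toy data, invented for this example.
clusters = [
    VoiceCluster("cluster_a", [1.0, 0.0], {"jaguar": 0.2, "game": 0.3}, {"game", "console"}),
    VoiceCluster("cluster_b", [0.0, 1.0], {"jaguar": 0.2, "car": 0.3}, {"car", "dealer"}),
]
audio = {
    "features": [0.1, 0.9],
    "hypotheses": [("jaguar game", -1.0), ("jaguar car", -1.1)],
}
query, cluster = process_query(audio, clusters)
results = [("jaguar console game review", 0.9), ("jaguar car dealer listings", 0.8)]
ranked = rerank(results, cluster)
```

In this toy run the feature vector lies closest to `cluster_b`, so that cluster's language model lifts "jaguar car" above the hypothesis the initial scores favored, and its text cluster moves the car-related result to the top. A real system would replace the stub recognizer, the unigram model, and the overlap score with whatever models the implementation actually uses.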