Title of invention: Configurable speech recognition system using multiple recognizers
Abstract: Techniques for combining the results of multiple recognizers in a distributed speech recognition architecture. Speech data input to a client device is encoded and processed both locally and remotely by different recognizers configured to be proficient at different speech recognition tasks. The client/server architecture is configurable to enable network providers to specify a policy directed to a trade-off between reducing recognition latency perceived by a user and usage of network resources. The results of the local and remote speech recognition engines are combined based, at least in part, on logic stored by one or more components of the client/server architecture.
Publication number: US8898065(B2); Publication date: 2014.11.25
Application number: US201213345265; Filing date: 2012.01.06
Applicant: Nuance Communications, Inc.; Inventors: Newman Michael; Gillet Anthony; Krowitz David Mark; Edgington Michael D.
Classification: G10L15/00; Main classification: G10L15/00
Agency: Wolf, Greenfield & Sacks, P.C.; Agent: Wolf, Greenfield & Sacks, P.C.
Main claim: 1. A method of performing speech recognition in a distributed system comprising an electronic device including an embedded speech recognizer and a network device including a remote speech recognizer remote from the electronic device, the method comprising: receiving, by the electronic device, input audio comprising speech; determining that at least a portion of the input audio matches a recognition grammar associated with the embedded speech recognizer; generating a search tree associated with the recognition grammar, wherein the search tree includes a plurality of nodes, wherein each of the nodes is associated with a type of item in the recognition grammar; determining whether the recognition grammar includes at least one generic speech node, wherein determining whether the recognition grammar includes at least one generic speech node comprises determining whether the search tree includes only nodes associated with types of items that can be recognized by the embedded speech recognizer with an accuracy above a threshold; determining that recognition by the remote speech recognizer is desired in response to determining that the recognition grammar includes at least one generic speech node indicating that the speech in the input audio may include free-form dictation; and sending at least a portion of the input audio to the network device in response to determining that recognition by the remote speech recognizer is desired.
Address: Burlington MA US
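
Illustrative sketch: The routing decision recited in claim 1 (walk the grammar's search tree, look for a generic speech node, and request remote recognition if one is found) might be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The names `GrammarNode`, `EMBEDDED_ACCURACY`, `ACCURACY_THRESHOLD`, `has_generic_speech_node`, and `remote_recognition_desired` are hypothetical, and the accuracy figures are placeholder values chosen for the example, not measurements from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List

# Placeholder per-item-type accuracies for the embedded recognizer.
# These numbers are invented for illustration only.
EMBEDDED_ACCURACY: Dict[str, float] = {
    "command": 0.97,
    "contact_name": 0.93,
    "dictation": 0.60,
}
ACCURACY_THRESHOLD = 0.90  # assumed threshold; the claim does not give a value


@dataclass
class GrammarNode:
    """One node of the search tree; each node carries a grammar item type."""
    item_type: str
    children: List["GrammarNode"] = field(default_factory=list)


def iter_nodes(root: GrammarNode) -> Iterator[GrammarNode]:
    """Depth-first traversal over every node in the search tree."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.children)


def has_generic_speech_node(root: GrammarNode) -> bool:
    """True unless the tree contains only item types the embedded
    recognizer handles above the threshold. Unknown types count as generic."""
    return any(
        EMBEDDED_ACCURACY.get(node.item_type, 0.0) <= ACCURACY_THRESHOLD
        for node in iter_nodes(root)
    )


def remote_recognition_desired(root: GrammarNode) -> bool:
    """Remote recognition is desired when a generic speech node is present,
    since the input audio may then include free-form dictation."""
    return has_generic_speech_node(root)


# "call <contact>": every node is within the embedded recognizer's reach.
call_grammar = GrammarNode("command", [GrammarNode("contact_name")])

# "send message <free text>": the dictation slot is a generic speech node.
message_grammar = GrammarNode(
    "command", [GrammarNode("contact_name"), GrammarNode("dictation")]
)
```

In this sketch, `remote_recognition_desired(call_grammar)` is false, so the audio would stay on the device, while `remote_recognition_desired(message_grammar)` is true, which corresponds to the claim's final step of sending at least a portion of the input audio to the network device. Treating unknown item types as generic is a design choice of the sketch, not something the claim specifies.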