Title: Load balancing for automatic speech recognition
Abstract: Features are disclosed for transferring speech recognition workloads between pooled execution resources. For example, various parts of an automatic speech recognition engine may be implemented by various pools of servers. Servers in a speech recognition pool may explore a plurality of paths in a graph to find the path that best matches an utterance. A set of active nodes comprising the last node explored in each path may be transferred between servers in the pool depending on resource availability at each server. A history of nodes or arcs traversed in each path may be maintained by a separate pool of history servers, and used to generate text corresponding to the path identified as the best match by the speech recognition servers.
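The separation between active nodes and path history described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `HistoryStore` class, its back-pointer layout, and the example words are all assumptions made for illustration. The idea shown is that a speech recognition server only needs to carry a small history id per active path, while a separate store keeps the chain of traversed arcs and turns the winning path into text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class HistoryEntry:
    """One traversed arc on a path (hypothetical record layout)."""
    word: Optional[str]    # word emitted by the arc, or None for a non-word arc
    parent: Optional[int]  # id of the previous entry on the same path


class HistoryStore:
    """Hypothetical stand-in for a history server: stores back-pointers only."""

    def __init__(self) -> None:
        self._entries: Dict[int, HistoryEntry] = {}
        self._next_id = 0

    def append(self, parent: Optional[int], word: Optional[str]) -> int:
        """Record one arc traversal and return the id of the new entry."""
        entry_id = self._next_id
        self._next_id += 1
        self._entries[entry_id] = HistoryEntry(word, parent)
        return entry_id

    def text_for(self, entry_id: Optional[int]) -> str:
        """Walk the back-pointers from the last entry and rebuild the text."""
        words: List[str] = []
        while entry_id is not None:
            entry = self._entries[entry_id]
            if entry.word is not None:
                words.append(entry.word)
            entry_id = entry.parent
        return " ".join(reversed(words))


# Two competing paths share the prefix "play" and then diverge.
store = HistoryStore()
h1 = store.append(None, "play")
h2 = store.append(h1, None)       # arc that emits no word
h3 = store.append(h2, "music")
h_alt = store.append(h1, "muse")

print(store.text_for(h3))     # play music
print(store.text_for(h_alt))  # play muse
```

Because entries are never modified after they are written, paths that share a prefix share its entries, and the active set handed between recognition servers stays small.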
Publication number: US9269355(B1) Publication date: 2016.02.23
Application number: US201313831286 Filing date: 2013.03.14
Applicant: Amazon Technologies, Inc. Inventors: Secker-Walker Hugh Evan; Narayanan Naresh
Classification: G10L15/00; G10L15/22 Main classification: G10L15/00
Agency: Knobbe Martens Olson & Bear, LLP Agent: Knobbe Martens Olson & Bear, LLP
Main claim: 1. A method of sharing a speech recognition workload among a plurality of servers, the method comprising:
performing processing at a first speech recognition server, wherein the performing comprises:
  receiving, at the first speech recognition server, first feature data determined from an audio stream;
  accessing, in a memory of the first speech recognition server, a first data structure corresponding to a directed graph, the graph comprising a plurality of nodes and a plurality of arcs;
  determining, at the first speech recognition server, a second set of active nodes from a first set of active nodes, wherein the first set of active nodes are among the plurality of nodes, and wherein determining the second set of active nodes comprises computing at least one first score using the first feature data and a first arc of the plurality of arcs;
determining that a processing load for the first speech recognition server has exceeded a threshold;
transmitting, from the first speech recognition server to a second speech recognition server, information indicating the second set of active nodes;
performing processing at a second speech recognition server, wherein the performing comprises:
  receiving, at the second speech recognition server, second feature data determined from the audio stream;
  accessing, in a memory of the second speech recognition server, a second data structure corresponding to the directed graph;
  determining, at the second speech recognition server, a third set of active nodes from the second set of active nodes, wherein determining the third set of active nodes comprises computing at least one second score using the second feature data and a second arc of the plurality of arcs; and
determining speech recognition results using information relating to a first node of the first set of nodes, a second node of the second set of nodes, and a third node of the third set of nodes.
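The sequence of steps in the claim can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: the `Server` class, the `advance` and `recognize` functions, the additive log-style scores, the per-arc-label feature dictionaries, and the fixed `load` values are all hypothetical. Each server holds its own copy of the directed graph, computes the next set of active nodes from the current set and one frame of feature data, and passes that set to the next server once its load exceeds a threshold.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, str]]]  # node -> [(next node, arc label)]
ActiveSet = Dict[str, float]              # node -> best score so far (higher is better)
Features = Dict[str, float]               # arc label -> score for one audio frame


def advance(graph: Graph, active: ActiveSet, features: Features) -> ActiveSet:
    """Compute the next set of active nodes by scoring every outgoing arc."""
    nxt: ActiveSet = {}
    for node, score in active.items():
        for dest, label in graph.get(node, []):
            candidate = score + features.get(label, float("-inf"))
            if candidate > nxt.get(dest, float("-inf")):
                nxt[dest] = candidate  # keep only the best path into each node
    return nxt


@dataclass
class Server:
    """Hypothetical speech recognition server with its own copy of the graph."""
    name: str
    graph: Graph
    load: float = 0.0       # assumed to be measured elsewhere; fixed here
    threshold: float = 0.8

    def overloaded(self) -> bool:
        return self.load > self.threshold


def recognize(frames: List[Features], servers: List[Server], start: str):
    """Process frames in order, handing the active set to the next server on overload."""
    current = 0
    active: ActiveSet = {start: 0.0}
    served_by: List[str] = []
    for features in frames:
        server = servers[current]
        active = advance(server.graph, active, features)
        served_by.append(server.name)
        if server.overloaded() and current + 1 < len(servers):
            current += 1  # "transmit" the active set to the next server
    if not active:
        raise ValueError("no path through the graph matches the frames")
    best = max(active, key=active.get)
    return best, active[best], served_by


graph: Graph = {
    "s": [("a", "ah"), ("b", "buh")],
    "a": [("end", "t")],
    "b": [("end", "d")],
}
frames = [{"ah": -1.0, "buh": -2.0}, {"t": -0.5, "d": -0.1}]
servers = [Server("asr-1", dict(graph), load=0.9), Server("asr-2", dict(graph), load=0.1)]

best, score, served_by = recognize(frames, servers, "s")
print(best, score, served_by)  # end -1.5 ['asr-1', 'asr-2']
```

In this run the first server handles the first frame, reports a load above its threshold, and the second server continues from the transferred active set. Only the node-to-score mapping crosses the server boundary, since both servers already hold the graph in memory.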
Address: Reno NV US