Title: Embedded system for construction of small footprint speech recognition with user-definable constraints
Abstract: Techniques disclosed herein include systems and methods that enable a voice trigger that wakes up an electronic device or causes the device to make additional voice commands active, without manual initiation of voice command functionality. In addition, such a voice trigger is dynamically programmable or customizable. A speaker can program or designate a particular phrase as the voice trigger. In general, techniques herein execute a voice-activated wake-up system that operates on a digital signal processor (DSP) or other low-power, secondary processing unit of an electronic device instead of running on a central processing unit (CPU). A speech recognition manager runs two speech recognition systems on an electronic device. The CPU dynamically creates a compact speech system for the DSP. Such a compact system can be run continuously during a standby mode without quickly exhausting a battery supply.
Publication number: US9117449(B2) Publication date: 2015.08.25
Application number: US201213456959 Filing date: 2012.04.26
Applicant: Nuance Communications, Inc. Inventors: Newman Michael Jack; Roth Robert; Alexander William D.; van Mulbregt Paul
Classification: G10L15/14; G10L21/00; G10L15/22; H04M1/725; G10L15/32 Main classification: G10L15/14
Agency: Holland & Knight LLP Agent: Holland & Knight LLP; Whittenberger, Esq. Mark H.
Main claim: 1. A computer-implemented method for managing speech recognition, the computer implemented method comprising: receiving configuration input at a voice-activated wake-up function of an electronic device, the configuration input including a trigger phrase, the configuration input being received at a first processor of the electronic device, the electronic device having a second processor in addition to the first processor, wherein the configuration input and at least one decoy word is first received at the second processor; creating a finite state transducer network including a network of speech recognition states corresponding to the trigger phrase, the network of speech recognition states being created at the first processor using a first speech recognition engine that the first processor executes, wherein the trigger phrase is evaluated using a recognition grammar without applying a vocabulary model; transferring the network of speech recognition states from the first processor to the second processor; and executing a second speech recognition engine on the second processor using the network of speech recognition states corresponding to the trigger phrase, the second processor executing the second speech recognition engine while the first speech recognition engine of the first processor is in an inactive state, wherein the second speech recognition engine includes a finite state transducer decoder configured to execute the finite state transducer network; wherein the first processor is a central processing unit, and wherein the second processor is a digital signal processor and wherein the first processor and the second processor are co-located within a mobile telephone.
Address: Burlington, MA, US
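
Illustrative sketch (not part of the patent record): the division of work described in the main claim can be pictured with a toy example. The first processor compiles a trigger phrase, plus decoy words, into a small network of states; that network is handed to a second, lightweight decoder, which then runs on its own. Everything here is an assumption made for illustration: the names `StateNetwork`, `build_network`, and `CompactDecoder` are hypothetical, the network is a plain word-level state table rather than a real finite state transducer, and the decoder consumes already-recognized words instead of acoustic features. It is a minimal sketch of the idea, not the claimed implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateNetwork:
    """Toy stand-in for the network of speech recognition states (hypothetical)."""
    transitions: tuple  # transitions[state][word] -> next state
    final_state: int    # reaching this state means the trigger phrase was heard


def build_network(trigger_phrase, decoys=()):
    """'First processor' role: compile the trigger phrase into a chain of states."""
    words = trigger_phrase.lower().split()
    if not words:
        raise ValueError("trigger phrase must contain at least one word")
    table = []
    for i, word in enumerate(words):
        arcs = {word: i + 1}
        # Assumption: a decoy word absorbs a near-miss by resetting to the start.
        for decoy in decoys:
            arcs.setdefault(decoy.lower(), 0)
        table.append(arcs)
    return StateNetwork(transitions=tuple(table), final_state=len(words))


class CompactDecoder:
    """'Second processor' role: walk the transferred network one word at a time."""

    def __init__(self, network):
        # Passing the network in stands for the CPU-to-DSP transfer step.
        self.network = network
        self.state = 0

    def feed(self, word):
        """Consume one word; return True when the full trigger phrase completes."""
        word = word.lower()
        next_state = self.network.transitions[self.state].get(word)
        if next_state is None:
            # No arc for this word: restart, letting the word begin a new match.
            next_state = self.network.transitions[0].get(word, 0)
        self.state = next_state
        if self.state == self.network.final_state:
            self.state = 0
            return True
        return False


if __name__ == "__main__":
    network = build_network("hello dragon", decoys=["yellow", "wagon"])
    decoder = CompactDecoder(network)
    for spoken in "please hello wagon hello dragon".split():
        if decoder.feed(spoken):
            print("trigger detected, wake the main recognizer")
```

In this sketch the decoder holds only a state index and a small lookup table, which is the property the abstract relies on: the expensive step (building the network) happens once on the larger processor, while the part that runs continuously stays small.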