Title of invention: Full-sequence training of deep structures for speech recognition
Abstract: A method includes an act of causing a processor to access a deep-structured model retained in a computer-readable medium, wherein the deep-structured model includes a plurality of layers with respective weights assigned to the plurality of layers, transition probabilities between states, and language model scores. The method further includes the act of jointly substantially optimizing the weights, the transition probabilities, and the language model scores of the deep-structured model using an optimization criterion based on a sequence rather than a set of unrelated frames.
Publication number: US9031844(B2)  Publication date: 2015.05.12
Application number: US201012886568  Filing date: 2010.09.21
Applicant: Microsoft Technology Licensing, LLC  Inventors: Yu Dong;Deng Li;Mohamed Abdel-rahman Samir Abdel-rahman
Classification: G10L15/14;G06K9/62;G06N3/04;G06N3/08  Main classification: G10L15/14
Agency:  Agents: Swain Sandy;Yee Judy;Minhas Micky
Main claim: 1. A method comprising the following computer-executable acts: accessing a deep belief network (DBN) retained in computer-readable data storage, wherein the DBN comprises: a plurality of stacked hidden layers, each hidden layer comprises a respective plurality of stochastic units, each stochastic unit in each layer connected to stochastic units in an adjacent hidden layer of the DBN by way of connections, the connections assigned weights learned during a pretraining procedure; and a linear-chain conditional random field (CRF), the CRF comprises: a hidden layer that comprises a plurality of stochastic units; and a plurality of output units that are representative of output states, each state in the output states being one of a phone or senone, the plurality of stochastic units connected to the plurality of output units by way of second connections, the second connections having weights learned during the pretraining procedure, the output units have transition probabilities corresponding thereto that are indicative of probabilities of transitioning between output states represented by the output units; and jointly optimizing the weights assigned to the connections, the weights assigned to the second connections, the transition probabilities, and language model scores of the DBN based upon training data, wherein a processor performs the jointly optimizing of the weights.
Address: Redmond WA US
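Illustrative note: The sequence-based criterion named in the abstract and claim 1 can be illustrated with a linear-chain CRF log-likelihood, which scores a whole state sequence instead of each frame independently. The following is a minimal sketch and not the patented implementation. It assumes that per-frame state scores (`emissions`) have already been produced by the top hidden layer and the output weights, that transitions are represented as unnormalized log-scores, and it omits language model scores. All function and variable names are hypothetical.

```python
import math
from itertools import product


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_log_likelihood(emissions, transitions, labels):
    """Log-probability of one full label sequence under a linear-chain CRF.

    emissions[t][s]   : score of state s at frame t (assumed to come from
                        the network's top hidden layer; hypothetical input)
    transitions[i][j] : score of moving from state i to state j
    labels            : reference state index for every frame
    """
    n_states = len(transitions)

    # Unnormalized score of the reference path through all frames.
    path_score = emissions[0][labels[0]]
    for t in range(1, len(labels)):
        path_score += transitions[labels[t - 1]][labels[t]]
        path_score += emissions[t][labels[t]]

    # Forward recursion: log of the summed scores of every possible path.
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            emissions[t][j]
            + logsumexp([alpha[i] + transitions[i][j] for i in range(n_states)])
            for j in range(n_states)
        ]
    log_partition = logsumexp(alpha)

    return path_score - log_partition


# Toy example: 3 frames, 2 output states (values are made up).
emissions = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.9]]
transitions = [[0.2, -0.1], [0.0, 0.4]]

# Probabilities of all 2**3 possible state sequences should sum to one.
total_probability = sum(
    math.exp(sequence_log_likelihood(emissions, transitions, list(seq)))
    for seq in product(range(2), repeat=3)
)
```

Because the normalizer sums over entire paths, the gradient of this quantity depends on both the per-frame scores and the transition scores, which is what would allow them to be optimized jointly in a setup of this kind.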