Title: Systems and methods for improving the accuracy of a transcription using auxiliary data such as personal data
Abstract: A method is described for improving the accuracy of a transcription generated by an automatic speech recognition (ASR) engine. A personal vocabulary is maintained that includes replacement words, which are obtained from personal data associated with a user. A transcription of an audio recording is received. The transcription is generated by an ASR engine using an ASR vocabulary and includes a transcribed word that represents a spoken word in the audio recording. Data associated with the transcribed word is received. A replacement word is identified from the personal vocabulary and used to re-score the transcription and replace the transcribed word.
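The flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `TranscribedWord`, `build_personal_vocabulary`, `similarity` and `rescore` are hypothetical, the "data associated with the transcribed word" is assumed to be a single confidence score, and plain string similarity from the standard library stands in for whatever acoustic or phonetic matching a real system would use.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class TranscribedWord:
    text: str          # word as produced by the ASR engine
    confidence: float  # assumed form of the associated data, in [0, 1]


def build_personal_vocabulary(personal_data):
    """Collect replacement words from personal data (here: plain strings such as contact names)."""
    vocabulary = set()
    for entry in personal_data:
        vocabulary.update(entry.split())
    return vocabulary


def similarity(a, b):
    """String similarity in [0, 1]; an assumed stand-in for an acoustic or phonetic match."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rescore(transcription, vocabulary, min_similarity=0.6):
    """Replace a transcribed word when a personal-vocabulary word outscores it.

    Comparing similarity against engine confidence is a deliberately crude
    heuristic chosen for illustration only.
    """
    result = []
    for word in transcription:
        if word.text in vocabulary:
            result.append(word.text)  # already a personal word, keep it
            continue
        best = max(vocabulary, key=lambda v: similarity(word.text, v), default=None)
        if best is not None:
            score = similarity(word.text, best)
            if score >= min_similarity and score > word.confidence:
                result.append(best)  # low-confidence word replaced
                continue
        result.append(word.text)
    return result
```

For example, with a vocabulary built from the single contact string "Zavaliagkos George", a low-confidence transcribed word "Zavaliagos" would be replaced by "Zavaliagkos", while a high-confidence "call" would be left unchanged.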
Publication number: US9009041(B2)  Publication date: 2015.04.14
Application number: US201113190749  Filing date: 2011.07.26
Applicant: Nuance Communications, Inc.  Inventors: Zavaliagkos George; Ganong, III William F.; Jost Uwe H.; Madhavapeddi Shreedhar; Clayton Gary B.
Classification: G10L15/00; G10L15/26; G10L15/24; G10L15/22; G10L15/08; G10L15/30  Main classification: G10L15/00
Agency: Perkins Coie LLP  Agent: Perkins Coie LLP
Main claim: 1. A personal computing device for use with a remote automatic speech recognition engine, the device comprising: a communications port configured to receive a data set and audio data from the remote automatic speech recognition engine, wherein the data set and the audio data reflect speech, wherein the data set is a rich data set that includes a word list for candidate words with confidence scores, and wherein the data set is generated by the remote automatic speech recognition engine in response to the audio data; a display device for displaying information to a user; memory for at least temporarily storing personal data and executable code for a re-recognition engine, wherein the re-recognition engine includes automatic speech recognition capability; and at least one processor coupled among the communications port, the display device, and the memory, wherein the at least one processor is configured to execute the code for the re-recognition engine and: access the personal data from the memory, generate a local transcription using the audio data, wherein the local transcription is generated using the speech recognition capability of the re-recognition engine and the accessed personal data, rescore the data set received from the remote automatic speech recognition engine, using the re-recognition engine, based on the accessed personal data and confidence scores associated with the local transcription, generate a final transcription of the speech using the rescored data set and the local transcription, present, via the display device, the final transcription of the speech to the user based on the rescored data set and local transcription, and create a rule that a particular word in the data set from the remote automatic speech recognition engine is to be replaced by a particular replacement word from the local transcription, and transmit, via the communications port, the rule or the rescored data set to the remote automatic speech recognition engine from which a vocabulary of the remote automatic speech recognition engine is modified, wherein the remote automatic speech recognition engine is hosted by a server accessible via a network, and the personal computing device is a cell phone, smart phone, tablet or portable telecommunications device.
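The processor steps recited in the claim (rescoring the remote data set, producing a final transcription, and creating a replacement rule) can be sketched as follows. This is a hedged illustration under stated assumptions, not the claimed implementation: the function names, the `boost` parameter, the representation of the rich data set as one `{candidate: confidence}` dictionary per word position, and the one-to-one alignment between remote positions and local words are all hypothetical.

```python
def rescore_data_set(remote_slots, local_words, personal_vocabulary, boost=0.2):
    """Merge local hypotheses into the remote word list and favor personal words.

    remote_slots: list of {candidate_word: confidence}, one dict per word position
                  (assumed shape of the rich data set).
    local_words:  list of (word, confidence) from the local re-recognition pass,
                  assumed to align position by position with remote_slots.
    """
    rescored = []
    for candidates, (local_word, local_conf) in zip(remote_slots, local_words):
        scores = dict(candidates)  # copy so the remote data set is not mutated
        scores[local_word] = max(scores.get(local_word, 0.0), local_conf)
        for word in list(scores):
            if word in personal_vocabulary:
                scores[word] = min(1.0, scores[word] + boost)
        rescored.append(scores)
    return rescored


def final_transcription(rescored):
    """Pick the highest-scoring candidate at each position."""
    return [max(scores, key=scores.get) for scores in rescored]


def create_rules(remote_slots, local_words, final_words):
    """Map a remote top candidate to the local word that replaced it."""
    rules = {}
    for candidates, (local_word, _), final in zip(remote_slots, local_words, final_words):
        remote_top = max(candidates, key=candidates.get)
        if final == local_word and remote_top != local_word:
            rules[remote_top] = local_word
    return rules
```

In this sketch, the resulting rule dictionary (or the rescored list) is what the device would transmit back to the server; how the server then modifies its vocabulary is outside the scope of the example.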
Address: Burlington MA US