Title of invention: Systems and methods for adaptive proper name entity recognition and understanding
Abstract: Various embodiments contemplate systems and methods for performing automatic speech recognition (ASR) and natural language understanding (NLU) that enable high-accuracy recognition and understanding of freely spoken utterances which may contain proper names and similar entities. The proper name entities may contain or be comprised wholly of words that are not present in the vocabularies of these systems as normally constituted. Recognition of the other words in the utterances in question (e.g., words that are not part of the proper name entities) may occur at regular, high recognition accuracy. Various embodiments provide as output not only accurately transcribed running text of the complete utterance, but also a symbolic representation of the meaning of the input, including appropriate symbolic representations of proper name entities, adequate to allow a computer system to respond appropriately to the spoken request without further analysis of the user's input.
Publication number: US9449599 (B2); Publication date: 2016.09.20
Application number: US201414292800; Filing date: 2014.05.30
Applicant: PROMPTU SYSTEMS CORPORATION; Inventor: Printz Harry William
Classification: G10L15/00; G10L15/19; G10L15/32; G06F17/27; G10L15/22; Main classification: G10L15/00
Agency: Perkins Coie LLP; Agent: Glenn Michael A.; Perkins Coie LLP
Main claim: 1. A computer-implemented method for recognizing and understanding spoken commands that include one or more proper name entities, comprising: receiving an utterance from a user; performing primary automatic speech recognition (ASR) processing upon said utterance with a primary automatic speech recognizer to output a dataset comprising at least a sequence of nominal transcribed words and putative start and end times for each nominal transcribed word within said utterance; performing understanding processing upon said dataset with a natural language understanding (NLU) processor to generate and augment the dataset with a nominal meaning for the utterance and to determine putative presence and type of one or more spoken proper name entities within said utterance, wherein a contiguous section of audio within said utterance corresponding to each putative proper name entity, as determined from said start and end times of the words of the putative proper name entity as transcribed by the primary automatic speech recognizer, comprises an acoustic span; performing secondary automatic speech recognition (ASR) processing upon each said acoustic span with a secondary automatic speech recognizer, in each instance said secondary automatic speech recognizer specialized to process a given putative type of acoustic span to generate a nominal correct transcription and associated meaning for each said acoustic span; substituting the nominal correct transcription and associated meaning obtained from each secondary recognition as appropriate within the dataset to revise the results of the primary automatic speech recognizer and natural language understanding processor; and outputting a complete and accurate transcription and meaning for the entire utterance.
Address: Menlo Park, CA, US
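Example (illustrative sketch, not from the patent): the main claim describes a two-pass pipeline. The primary ASR emits words with putative start and end times, the NLU pass marks putative proper name entities by type, each entity's words define a contiguous acoustic span, a type-specific secondary recognizer re-decodes that span, and its output is substituted back into the dataset. The Python below is a minimal sketch of the span extraction and substitution steps only, under stated assumptions: the names `Word`, `EntitySpan`, `acoustic_span`, and `refine`, the entity type label `"movie_title"`, the flat-dict meaning representation, and the stub recognizer are all hypothetical; audio is assumed to be a flat list of samples at a known sample rate; and the actual recognizers are out of scope.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

@dataclass
class Word:
    text: str     # nominal transcribed word from the primary ASR
    start: float  # putative start time, seconds
    end: float    # putative end time, seconds

@dataclass
class EntitySpan:
    first: int        # index of the entity's first word
    last: int         # index of the entity's last word (inclusive)
    entity_type: str  # putative type assigned by the NLU pass (hypothetical labels)

# Hypothetical interface: a secondary recognizer maps an audio segment
# to (transcription, meaning) for one entity type.
SecondaryASR = Callable[[Sequence[float]], Tuple[str, str]]

def acoustic_span(words: List[Word], span: EntitySpan) -> Tuple[float, float]:
    """Contiguous audio interval from the first word's start to the last word's end."""
    return words[span.first].start, words[span.last].end

def refine(audio: Sequence[float], sample_rate: int, words: List[Word],
           spans: List[EntitySpan], meaning: Dict[str, str],
           recognizers: Dict[str, SecondaryASR]) -> Tuple[str, Dict[str, str]]:
    tokens = [w.text for w in words]
    meaning = dict(meaning)  # copy, so the primary NLU result is left untouched
    # Substitute right to left so earlier word indices stay valid after splicing.
    for span in sorted(spans, key=lambda s: s.first, reverse=True):
        recognizer = recognizers.get(span.entity_type)
        if recognizer is None:
            continue  # no specialized recognizer for this type: keep primary result
        start, end = acoustic_span(words, span)
        segment = audio[round(start * sample_rate):round(end * sample_rate)]
        text, entity_meaning = recognizer(segment)
        tokens[span.first:span.last + 1] = text.split()
        meaning[span.entity_type] = entity_meaning
    return " ".join(tokens), meaning

# Usage with made-up data: the primary pass misrecognizes a title.
segment_lengths: List[int] = []

def stub_movie_asr(segment: Sequence[float]) -> Tuple[str, str]:
    segment_lengths.append(len(segment))  # record what the stub was given
    return "Sherlock Holmes", "movie:sherlock_holmes"

audio = [0.0] * 200  # 2 s of silence at a toy 100 Hz sample rate
words = [Word("show", 0.0, 0.3), Word("me", 0.3, 0.5), Word("sheer", 0.5, 0.9),
         Word("lock", 0.9, 1.2), Word("homes", 1.2, 1.7)]
spans = [EntitySpan(2, 4, "movie_title")]
primary_meaning = {"intent": "show", "movie_title": "sheer lock homes"}
text, meaning = refine(audio, 100, words, spans, primary_meaning,
                       {"movie_title": stub_movie_asr})
```

Processing entities from right to left is one simple way to keep the word indices reported by the NLU pass valid while multi-word spans are replaced by transcriptions of a different length; the patent text does not specify this detail.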