Title of invention: PHONETIC ALIGNMENT FOR USER-AGENT DIALOGUE RECOGNITION
Abstract: A method for speech to text transcription uses a knowledge base containing solution descriptions, each describing, in words, a solution to a respective problem. An audio recording of a dialogue between an agent and a user in which the agent had access to the knowledge base is received. A sequence of phonemes based on the agent's part of the audio recording is identified and from this, a preliminary transcription is made which includes a sequence of words recognized as corresponding to phonemes in the identified sequence of phonemes together with any unrecognized phonemes from the phoneme sequence that are not recognized as corresponding to one of the recognized words. The preliminary transcription is revised by replacing one or more of the unrecognized phonemes with a word or words from a solution description that includes words which match adjacent words of the sequence of recognized words.
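Publication number: US2015058006(A1); Publication date: 2015.02.26
Application number: US201313974515; Filing date: 2013.08.23
Applicant: Xerox Corporation; Inventor: Proux Denys
Classification: G10L15/26; Main classification: G10L15/26
Agency: ; Agent: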
Independent claim: 1. A method for speech to text transcription comprising: providing access to a knowledge base containing solution descriptions, each solution description including a textual description of a solution to a respective problem; generating a preliminary transcription of at least an agent's part of an audio recording of a dialogue between the agent and a user in which the agent had access to the knowledge base, the generating comprising: identifying a sequence of phonemes based on the agent's part of the audio recording, and based on the identified sequence of phonemes, generating the preliminary transcription, the preliminary transcription including a sequence of words recognized as corresponding to phonemes in the sequence of phonemes and unrecognized phonemes from the phoneme sequence that are not recognized as corresponding to one of the recognized words; and revising the preliminary transcription, the revising comprising replacement of unrecognized phonemes with at least one word from a solution description, the solution description including words which match words of the sequence of recognized words, wherein at least one of the generating of the preliminary transcription and the revising of the preliminary transcription is performed with a processor.
Address: Norwalk CT US
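
The revising step of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the preliminary transcription is a list in which recognized words are plain strings and each run of unrecognized phonemes is wrapped in a hypothetical `Unrecognized` object, and that the knowledge base is a list of solution descriptions given as punctuation-free strings. The names `Unrecognized`, `find_filler`, `revise` and the `max_gap` parameter are illustrative only. Candidate words are chosen purely from the recognized words on either side of the gap; a fuller system would presumably also compare the unrecognized phonemes with the pronunciation of the candidate words.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class Unrecognized:
    """A run of phonemes the recognizer could not map to a word (hypothetical type)."""
    phonemes: Tuple[str, ...]


Token = Union[str, Unrecognized]


def find_filler(left: str, right: str, description: str,
                max_gap: int = 3) -> Optional[List[str]]:
    """Return the words found between `left` and `right` in a solution
    description, or None if the pair does not occur within `max_gap` words."""
    words = description.lower().split()
    for i, word in enumerate(words):
        if word != left:
            continue
        # j is the position of the right-hand context word; at least one
        # word and at most max_gap words must lie between i and j.
        for j in range(i + 2, min(i + 2 + max_gap, len(words))):
            if words[j] == right:
                return words[i + 1:j]
    return None


def revise(preliminary: List[Token], knowledge_base: List[str]) -> List[Token]:
    """Replace unrecognized phoneme runs with words from a solution
    description whose wording matches the adjacent recognized words."""
    revised: List[Token] = []
    for k, token in enumerate(preliminary):
        filler = None
        if isinstance(token, Unrecognized):
            left = preliminary[k - 1] if k > 0 else None
            right = preliminary[k + 1] if k + 1 < len(preliminary) else None
            # Only gaps flanked by two recognized words are handled here.
            if isinstance(left, str) and isinstance(right, str):
                for description in knowledge_base:
                    filler = find_filler(left.lower(), right.lower(), description)
                    if filler is not None:
                        break
        if filler is not None:
            revised.extend(filler)
        else:
            revised.append(token)  # keep words and unresolved gaps as they are
    return revised


# Toy data, invented for illustration only.
knowledge_base = [
    "open the front cover and remove the toner cartridge",
    "restart the printer and wait for the green light",
]
preliminary = ["remove", "the", Unrecognized(("t", "ow", "n", "er")), "cartridge"]
print(revise(preliminary, knowledge_base))
```

In this toy case the gap between "the" and "cartridge" is filled with "toner" from the first solution description. A gap whose neighbouring words appear in no description is left in place as unrecognized phonemes.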