Title of invention: Assessing speech prosody
Abstract: A method, system, and computer-readable storage medium for assessing speech prosody. The method includes the steps of: receiving input speech data; acquiring a prosody constraint; assessing prosody of the input speech data according to the prosody constraint; and providing an assessment result, where at least one of the steps is carried out using a computer device.
Publication number: US9368126(B2) Publication date: 2016.06.14
Application number: US201113097191 Filing date: 2011.04.29
Applicant: Nuance Communications, Inc. Inventors: Qin Yong; Shi Qin; Shuang Zhiwei; Zhang Shi Lei
Classification: G10L25/48 Main classification: G10L25/48
Agency: Banner & Witcoff, Ltd. Agent: Banner & Witcoff, Ltd.
Main claim: 1. A method for assessing speech prosody, comprising: receiving, by a computing device, spoken speech, the spoken speech being converted into input speech data representing the spoken speech; processing, by the computing device, the input speech data to acquire an input language structure that corresponds to the input speech data and that represents part of speech role of words of the spoken speech; obtaining, from a corpus of standard speech data comprising at least one example of standard speech data having a matching language structure as at least a portion of the input speech data, a language structure of standard speech; traversing a decision tree that corresponds to the language structure of standard speech based on at least a portion of the input language structure to identify, for a word in the input language structure, an occurrence probability of phrase boundary location at the word, wherein a leaf node of the decision tree identifies a determined occurrence probability of phrase boundary location for a part of speech based on a first adjacent part of speech to the left of the part of speech and a second adjacent part of speech to the right of the part of speech; acquiring a rhythm feature and a fluency feature of the input speech data based, at least in part, on the occurrence probability of phrase boundary location for the word; acquiring, from the corpus of standard speech data, a prosody constraint based on the rhythm feature and the fluency feature; assessing prosody of the input speech data according to the prosody constraint; providing an assessment result based on the prosody constraint; and the corpus of standard speech data or outputting reference speech that indicates a correct way to say the spoken speech.
Address: Burlington MA US
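
The decision-tree step recited in claim 1 can be illustrated with a minimal sketch. It assumes a binary tree whose internal nodes ask a question about a word's part of speech and its left and right neighbours, and whose leaves hold a phrase-boundary probability. The names `Node`, `Leaf`, `traverse`, `boundary_probabilities`, the tag set, the tree shape, and every probability value are hypothetical and chosen for illustration only; the record does not disclose the actual tree or how it is trained.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass
class Leaf:
    # Probability that a phrase boundary occurs at the word (illustrative value).
    boundary_prob: float


@dataclass
class Node:
    # Question over (left POS, current POS, right POS); True follows `yes`.
    question: Callable[[str, str, str], bool]
    yes: Union["Node", Leaf]
    no: Union["Node", Leaf]


def traverse(tree: Union[Node, Leaf], left: str, pos: str, right: str) -> float:
    """Walk from the root to a leaf and return its boundary probability."""
    node = tree
    while isinstance(node, Node):
        node = node.yes if node.question(left, pos, right) else node.no
    return node.boundary_prob


# Hypothetical tree; a real one would be derived from the standard speech corpus.
TREE = Node(
    question=lambda left, pos, right: right == "CONJ",
    yes=Leaf(0.8),
    no=Node(
        question=lambda left, pos, right: pos == "NOUN" and right == "VERB",
        yes=Leaf(0.4),
        no=Leaf(0.05),
    ),
)


def boundary_probabilities(tagged: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Return (word, boundary probability) for each (word, POS) pair."""
    # Pad with sentence-edge markers so every word has two neighbours.
    tags = ["<s>"] + [pos for _, pos in tagged] + ["</s>"]
    return [
        (word, traverse(TREE, tags[i], tags[i + 1], tags[i + 2]))
        for i, (word, _) in enumerate(tagged)
    ]


sentence = [("dogs", "NOUN"), ("bark", "VERB"), ("and", "CONJ"),
            ("cats", "NOUN"), ("sleep", "VERB")]
for word, prob in boundary_probabilities(sentence):
    print(f"{word}: {prob}")
```

In this sketch the per-word probabilities would then feed the rhythm and fluency features named in the claim, for example by comparing where a speaker actually pauses against where a boundary is likely; that comparison is not shown because the record does not specify it.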