Title of Invention Computer-implemented systems and methods for evaluating prosodic features of speech
Abstract Systems and methods are provided for scoring speech. A speech sample is received, where the speech sample is associated with a script. The speech sample is aligned with the script. An event recognition metric of the speech sample is extracted, and locations of prosodic events are detected in the speech sample based on the event recognition metric. The locations of the detected prosodic events are compared with locations of model prosodic events, where the locations of model prosodic events identify expected locations of prosodic events of a fluent, native speaker speaking the script. A prosodic event metric is calculated based on the comparison, and the speech sample is scored using a scoring model based upon the prosodic event metric.
Publication No. US9087519(B2) Publication Date 2015.07.21
Application No. US201213424643 Filing Date 2012.03.20
Applicant Educational Testing Service Inventors Zechner Klaus;Xi Xiaoming
Classification G10L15/00;G10L25/03;G10L25/90;G10L13/10 Main Classification G10L15/00
Agency Jones Day Agent Jones Day
Main Claim 1. A computer-implemented method of scoring speech, comprising: receiving a speech sample, wherein the speech sample is based upon speaking from a script; aligning, using a processing system, the speech sample with the script; extracting, using the processing system, an event recognition metric of the speech sample; detecting, using the processing system, locations of prosodic events in the speech sample based on the event recognition metric; comparing, using the processing system, the locations of the detected prosodic events with locations of model prosodic events, wherein the locations of model prosodic events identify expected locations of prosodic events of a fluent, native speaker speaking the script, and wherein the comparing comprises comparing a first data structure for the model prosodic events and a second data structure for the detected prosodic events, the first data structure and the second data structure including binary data per syllable representing whether or not a syllable exhibits a stress and whether or not the syllable exhibits a tone change, said comparing including comparing per syllable the binary data representing stress and the binary data representing tone change for the model prosodic events and the detected prosodic events; calculating, using the processing system, a prosodic event metric based on the comparison; and scoring, using the processing system, the speech sample using a scoring model based upon the prosodic event metric.
Address Princeton NJ US
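
The per-syllable comparison recited in claim 1 can be illustrated with a minimal Python sketch. The claim specifies only that each data structure holds binary stress and tone-change data per syllable and that a prosodic event metric is calculated from the comparison; it does not say how. The names `SyllableEvents` and `prosodic_event_metric`, and the choice of a simple agreement ratio as the metric, are assumptions made for illustration and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SyllableEvents:
    """Binary prosodic data for one syllable (hypothetical structure)."""
    stressed: bool      # does the syllable exhibit a stress?
    tone_change: bool   # does the syllable exhibit a tone change?


def prosodic_event_metric(
    model: List[SyllableEvents],
    detected: List[SyllableEvents],
) -> Dict[str, float]:
    """Compare model and detected events syllable by syllable.

    Assumption: the metric is the fraction of syllables on which the two
    sequences agree, computed separately for stress and for tone change.
    """
    if len(model) != len(detected):
        raise ValueError("sequences must cover the same syllables")
    if not model:
        raise ValueError("at least one syllable is required")

    n = len(model)
    stress_matches = sum(m.stressed == d.stressed for m, d in zip(model, detected))
    tone_matches = sum(m.tone_change == d.tone_change for m, d in zip(model, detected))
    return {
        "stress_agreement": stress_matches / n,
        "tone_agreement": tone_matches / n,
    }


# Illustrative data only: four syllables of an aligned script.
model_events = [
    SyllableEvents(True, False),
    SyllableEvents(False, False),
    SyllableEvents(True, True),
    SyllableEvents(False, True),
]
detected_events = [
    SyllableEvents(True, False),
    SyllableEvents(True, False),   # extra stress not present in the model
    SyllableEvents(True, True),
    SyllableEvents(False, False),  # tone change missed by the speaker
]

metric = prosodic_event_metric(model_events, detected_events)
print(metric)
```

In this sketch the two sequences are assumed to be already aligned to the same syllables of the script, which corresponds to the aligning step of the claim. The resulting values would then be one input to a scoring model, which is not shown here.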