Abstract
Systems and methods are provided for scoring speech. A speech sample is received, where the speech sample is associated with a script. The speech sample is aligned with the script. An event recognition metric of the speech sample is extracted, and locations of prosodic events are detected in the speech sample based on the event recognition metric. The locations of the detected prosodic events are compared with locations of model prosodic events, where the locations of model prosodic events identify expected locations of prosodic events of a fluent, native speaker speaking the script. A prosodic event metric is calculated based on the comparison, and the speech sample is scored using a scoring model based upon the prosodic event metric.
Principal Claim
1. A computer-implemented method of scoring speech, comprising:
receiving a speech sample, wherein the speech sample is based upon speaking from a script;
aligning, using a processing system, the speech sample with the script;
extracting, using the processing system, an event recognition metric of the speech sample;
detecting, using the processing system, locations of prosodic events in the speech sample based on the event recognition metric;
comparing, using the processing system, the locations of the detected prosodic events with locations of model prosodic events, wherein the locations of model prosodic events identify expected locations of prosodic events of a fluent, native speaker speaking the script, and wherein the comparing comprises comparing a first data structure for the model prosodic events and a second data structure for the detected prosodic events, the first data structure and the second data structure including binary data per syllable representing whether or not a syllable exhibits a stress and whether or not the syllable exhibits a tone change, said comparing including comparing per syllable the binary data representing stress and the binary data representing tone change for the model prosodic events and the detected prosodic events;
calculating, using the processing system, a prosodic event metric based on the comparison; and
scoring, using the processing system, the speech sample using a scoring model based upon the prosodic event metric.
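The per-syllable comparison recited in the claim can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the claim: the `SyllableEvents` record, the choice of an F1 score as the prosodic event metric, and the linear `score` function standing in for the scoring model are all assumptions. The sketch also assumes that alignment and event detection have already produced one record per syllable for both the model and the speech sample.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SyllableEvents:
    """Hypothetical per-syllable record: two binary flags, as in the claim."""
    stress: bool        # does the syllable exhibit a stress?
    tone_change: bool   # does the syllable exhibit a tone change?


def f1(model: List[bool], detected: List[bool]) -> float:
    """F1 agreement between model and detected binary flags (assumed metric)."""
    tp = sum(m and d for m, d in zip(model, detected))
    fp = sum((not m) and d for m, d in zip(model, detected))
    fn = sum(m and (not d) for m, d in zip(model, detected))
    if tp == 0:
        # No true positives: define the score as 0.0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def prosodic_event_metrics(
    model: List[SyllableEvents], detected: List[SyllableEvents]
) -> Tuple[float, float]:
    """Compare the two data structures syllable by syllable.

    Returns (stress F1, tone-change F1). Both sequences must cover the
    same syllables of the script, in the same order.
    """
    if len(model) != len(detected):
        raise ValueError("model and detected must have one entry per syllable")
    stress_f1 = f1([s.stress for s in model], [s.stress for s in detected])
    tone_f1 = f1([s.tone_change for s in model], [s.tone_change for s in detected])
    return stress_f1, tone_f1


def score(stress_f1: float, tone_f1: float,
          weights: Tuple[float, float] = (0.5, 0.5), bias: float = 0.0) -> float:
    """Placeholder linear scoring model; the weights are illustrative only."""
    return bias + weights[0] * stress_f1 + weights[1] * tone_f1


# Example: four syllables, model (expected) versus detected events.
model_events = [SyllableEvents(True, False), SyllableEvents(False, False),
                SyllableEvents(True, True), SyllableEvents(False, True)]
detected_events = [SyllableEvents(True, False), SyllableEvents(True, False),
                   SyllableEvents(True, True), SyllableEvents(False, False)]

stress_f1, tone_f1 = prosodic_event_metrics(model_events, detected_events)
speech_score = score(stress_f1, tone_f1)
```

In this example the speaker adds one unexpected stress and misses one expected tone change, so both agreement values fall below 1.0. An actual scoring model would presumably be trained against human ratings rather than use fixed weights.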