Title of Invention |
Computer-implemented systems and methods for content scoring of spoken responses |
Abstract |
Systems and methods are provided for scoring a non-scripted speech sample. A system includes one or more data processors and one or more computer-readable mediums. The computer-readable mediums are encoded with a non-scripted speech sample data structure, where the non-scripted speech sample data structure includes: a speech sample identifier that identifies a non-scripted speech sample, a content feature extracted from the non-scripted speech sample, and a content-based speech score for the non-scripted speech sample. The computer-readable mediums further include instructions for commanding the one or more data processors to extract the content feature from a set of words automatically recognized in the non-scripted speech sample and to score the non-scripted speech sample by providing the extracted content feature to a scoring model to generate the content-based speech score. |
Publication Number |
US9218339(B2) |
Publication Date |
2015.12.22 |
Application Number |
US201213688306 |
Filing Date |
2012.11.29 |
Applicant |
Educational Testing Service |
Inventors |
Zechner Klaus;Evanini Keelan;Chen Lei;Xie Shasha;Xiong Wenting;Huang Fei;Sukkarieh Jana;Chen Miao |
Classification Numbers |
G06F17/27;G06F17/28;G10L15/18;G09B19/06;G10L15/08 |
Main Classification Number |
G06F17/27 |
Patent Agency |
Jones Day |
Agent |
Jones Day |
Principal Claim |
1. A computer-implemented method of scoring a non-scripted speech sample, comprising:
extracting, using a processing system, a content feature from a set of words automatically recognized in the non-scripted speech sample; and scoring, using the processing system, the non-scripted speech sample by providing the extracted content feature to a content scoring model to generate a content-based speech score, the content scoring model comparing the extracted content feature to one or more score-level training vectors of a training corpus to generate the content-based speech score, the score-level training vectors having been determined from partitioning a set of scored, transcribed speech samples of the training corpus into sub-sets, with each of the sub-sets containing speech samples with identical scores. |
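The two steps of the claim can be illustrated with a minimal sketch. The claim does not specify the form of the content feature or the comparison, so everything concrete here is an assumption made for illustration: the content feature is taken to be a bag-of-words count vector, each score-level training vector is formed by pooling the transcripts that share a score, the comparison is cosine similarity, and the output score is the level with the highest similarity. The function names and the toy corpus are hypothetical and do not come from the patent.

```python
import math
from collections import Counter, defaultdict


def build_score_level_vectors(scored_transcripts):
    """Partition (transcript, score) pairs into sub-sets with identical
    scores and pool each sub-set into one word-count vector.
    (Assumption: pooled word counts stand in for the training vectors.)"""
    vectors = defaultdict(Counter)
    for transcript, score in scored_transcripts:
        vectors[score].update(transcript.lower().split())
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_response(recognized_words, score_level_vectors):
    """Extract a content feature from automatically recognized words and
    compare it with every score-level vector. Returns the best-matching
    score level and the similarity for each level.
    (Assumption: the most similar level is used as the content score.)"""
    feature = Counter(w.lower() for w in recognized_words)
    similarities = {
        level: cosine(feature, vector)
        for level, vector in score_level_vectors.items()
    }
    best_level = max(similarities, key=similarities.get)
    return best_level, similarities


# Hypothetical, made-up training corpus of scored, transcribed responses.
TRAINING = [
    ("the lecture explains how bees communicate through dance", 4),
    ("bees use a dance to tell other bees where food is", 4),
    ("i think it is about some animals", 2),
    ("um it is about things", 2),
]

vectors = build_score_level_vectors(TRAINING)
# Stand-in for speech recognizer output on a new response.
predicted, sims = score_response("bees dance to show food".split(), vectors)
print(predicted, sims)
```

In this sketch the per-level similarities could equally be passed to a downstream scoring model as features rather than reduced to a single best level; the claim leaves that choice open.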
地址 |
Princeton NJ US |