Title of invention: SYSTEMS AND METHODS FOR CALCULATING TEXT DIFFICULTY
Abstract: Disclosed are systems, methods, and products for language learning that automatically extract keywords from resources using various natural-language processing product features, which can be combined with custom-designed learning activities to offer a needs-based, adaptive learning methodology. The system may receive resources containing text and then determine a text difficulty score that predicts how difficult each resource is for language learners, based on any number of factors, including semantic and syntactic features of the text. Training resources labeled with metadata may be used to train a statistical model for determining difficulty scores of newly received text. Resources may be grouped based on difficulty score, and groups of resources may correspond to language learners' proficiency levels.
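Publication number: US2014295384(A1) Publication date: 2014.10.02
Application number: US201414180943 Filing date: 2014.02.14
Applicant: Voxy, Inc. Inventors: NIELSON Katharine; KIRKHAM Kasey; TYSON Na'im; BREEN Andrew
Classification: G09B5/02; G09B19/06 Main classification: G09B5/02
Agency: Agent: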
Main claim: 1. A computer-implemented method for predicting a text difficulty score for a new resource, the method comprising: extracting, by a computer, one or more linguistic features having a weighted value from a plurality of training resources containing text, wherein the text is associated with a metadata label containing a text difficulty score of the text; determining, by the computer, a vector value associated with each training resource based on each of the weighted values of each of the extracted one or more linguistic features; training, by the computer, a statistical model using the vector values associated with each training resource, wherein the statistical model represents a correlation between a set of features selected for extraction, a set of weighted values assigned to the set of features selected for extraction, and a set of text difficulty scores associated with the training resources; extracting, by the computer, one or more linguistic features having a weighted value from a new resource; determining, by the computer, a vector value for the new resource based upon the set of extracted linguistic features; and predicting, by the computer, a text difficulty score for the new resource based upon the vector value for the new resource and the statistical model.
Address: New York NY US
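Illustrative sketch: The steps recited in the main claim (extract weighted linguistic features, build a vector per resource, train a statistical model on labeled resources, predict a score for a new resource) can be sketched roughly as follows. This is a minimal illustration only, not the applicant's implementation. The three features, the weights in `FEATURE_WEIGHTS`, the choice of ridge-regularized linear regression as the statistical model, the 1-to-5 score scale, and every function name are assumptions introduced here; the record does not specify any of them.

```python
import re

# Hypothetical feature weights; the claim only says each feature has a weighted value.
FEATURE_WEIGHTS = {"avg_sentence_len": 0.1, "avg_word_len": 0.5, "type_token_ratio": 1.0}


def extract_features(text):
    """Extract a few surface-level linguistic features (assumed, not from the record)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    return {
        "avg_sentence_len": len(words) / len(sentences),
        "avg_word_len": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len(set(words)) / len(words),
    }


def to_vector(features):
    """Build the vector value: a bias term followed by the weighted features."""
    return [1.0] + [FEATURE_WEIGHTS[name] * features[name] for name in FEATURE_WEIGHTS]


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def train(labeled_texts, ridge=1e-3):
    """Fit linear-regression coefficients from (text, difficulty score) pairs.

    Solves the ridge normal equations; the bias term is left unpenalized.
    """
    rows = [to_vector(extract_features(text)) for text, _ in labeled_texts]
    scores = [score for _, score in labeled_texts]
    n = len(rows[0])
    xtx = [
        [sum(r[i] * r[j] for r in rows) + (ridge if i == j and i > 0 else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    xty = [sum(r[i] * s for r, s in zip(rows, scores)) for i in range(n)]
    return solve(xtx, xty)


def predict(model, text):
    """Predict a difficulty score for a new resource from its vector and the model."""
    return sum(c * v for c, v in zip(model, to_vector(extract_features(text))))


if __name__ == "__main__":
    # Made-up training resources with hypothetical difficulty labels on a 1-5 scale.
    training = [
        ("The cat sat. The dog ran. I like it.", 1.0),
        ("We went to the park today. The sun was warm and bright.", 2.0),
        ("The committee postponed its decision until additional evidence arrived.", 4.0),
        ("Notwithstanding considerable institutional resistance, the reforms proceeded.", 5.0),
    ]
    model = train(training)
    print(predict(model, "She reads a book. He eats an apple."))
```

A production system would presumably use far richer semantic and syntactic features and a much larger labeled corpus; the linear model is used here only because it fits in a few lines of standard-library Python.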