Abstract |
Disclosed are systems, methods, and products for language learning that automatically extract keywords from resources using various natural-language processing product features, which can be combined with custom-designed learning activities to offer a needs-based, adaptive learning methodology. The system may receive resources containing text and then determine a text difficulty score that predicts how difficult each resource is for language learners, based on factors that include semantic and syntactic features of the text. Training resources labeled with metadata may be used to train a statistical model for determining the difficulty scores of newly received text. Resources may be grouped by difficulty score, and groups of resources may correspond to language learners' proficiency levels. |
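The grouping of resources by difficulty score can be illustrated with a minimal Python sketch. The band edges, the band names, and the helper names `band_for` and `group_by_band` are hypothetical; the abstract gives no thresholds and does not name the proficiency levels.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical band edges and names; the source specifies neither.
BAND_EDGES = [2.0, 4.0, 6.0]
BAND_NAMES = ["beginner", "elementary", "intermediate", "advanced"]


def band_for(score):
    """Map a difficulty score to a proficiency band (assumed thresholds)."""
    return BAND_NAMES[bisect_right(BAND_EDGES, score)]


def group_by_band(scored_resources):
    """Group (resource_name, difficulty_score) pairs by proficiency band."""
    groups = defaultdict(list)
    for name, score in scored_resources:
        groups[band_for(score)].append(name)
    return dict(groups)
```

A score that falls exactly on an edge is placed in the higher band here; that is a choice made for this sketch, not something the source states.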
Independent claim |
1. A computer-implemented method for predicting a text difficulty score for a new resource, the method comprising:
extracting, by a computer, one or more linguistic features having a weighted value from a plurality of training resources containing text, wherein the text is associated with a metadata label containing a text difficulty score of the text;
determining, by the computer, a vector value associated with each training resource based on each of the weighted values of each of the extracted one or more linguistic features;
training, by the computer, a statistical model using the vector values associated with each training resource, wherein the statistical model represents a correlation between a set of features selected for extraction, a set of weighted values assigned to the set of features selected for extraction, and a set of text difficulty scores associated with the training resources;
extracting, by the computer, one or more linguistic features having a weighted value from a new resource;
determining, by the computer, a vector value for the new resource based upon the set of extracted linguistic features; and
predicting, by the computer, a text difficulty score for the new resource based upon the vector value for the new resource and the statistical model. |
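The claimed steps (extract weighted features, build a vector, train on labeled resources, predict for a new resource) can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: the three features, their weights, and all function names are hypothetical, and a simple nearest-neighbour regressor stands in for the statistical model, whose form the claim does not specify.

```python
import math
import re

# Hypothetical features and weights; the claim names neither.
FEATURE_WEIGHTS = {
    "avg_word_len": 1.0,
    "avg_sentence_len": 0.5,
    "type_token_ratio": 2.0,
}


def extract_features(text):
    """Compute a few surface linguistic features (assumed for illustration)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words or not sentences:
        return {name: 0.0 for name in FEATURE_WEIGHTS}
    return {
        "avg_word_len": sum(len(w) for w in words) / len(words),
        "avg_sentence_len": len(words) / len(sentences),
        "type_token_ratio": len(set(words)) / len(words),
    }


def to_vector(features):
    """Apply each feature's weight and return the values in a fixed order."""
    return [FEATURE_WEIGHTS[name] * features[name] for name in sorted(FEATURE_WEIGHTS)]


def train(training_resources):
    """Store one (vector, labeled difficulty score) pair per training text."""
    return [(to_vector(extract_features(text)), score)
            for text, score in training_resources]


def predict(model, text, k=1):
    """Average the scores of the k training vectors nearest to the new text."""
    if not model:
        raise ValueError("model has no training examples")
    vec = to_vector(extract_features(text))
    nearest = sorted(model, key=lambda pair: math.dist(vec, pair[0]))[:k]
    return sum(score for _, score in nearest) / len(nearest)
```

For example, `train([(text_a, 1.0), (text_b, 5.0)])` followed by `predict(model, new_text)` returns the score of whichever training text has the closest feature vector. A regression model fitted to the vectors would be an equally valid stand-in.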