Main claim |
1. A method comprising:
determining to improve an effectiveness measure of a first trained classification model, wherein the first trained model is trained using a set of training documents;
selecting a plurality of unlabeled documents, wherein the plurality of unlabeled documents are not part of the set of training documents used to train the first trained classification model;
generating a support vector based on a determination that one or more of the plurality of unlabeled documents are within a margin of a decision hyperplane associated with the first trained classification model;
calculating, by a processor in a predictive coding system, an overall score for each unlabeled document of the plurality of unlabeled documents based on a distance of a respective unlabeled document to the decision hyperplane and an angle diversity of the respective unlabeled document;
comparing, by the processor in the predictive coding system, the overall scores of the unlabeled documents to each other to select a pre-determined number of unlabeled documents having lowest scores in the plurality of unlabeled documents;
updating, by the processor in the predictive coding system, the set of training documents used to train the first trained classification model by adding the pre-determined number of unlabeled documents having the lowest scores in the plurality of unlabeled documents to the set of training documents;
updating the decision hyperplane based on the support vector;
providing, by the predictive coding system, the updated set of training documents to the first trained classification model to improve the effectiveness measure of the first trained classification model by generating a second trained classification model from the updated set of training documents;
identifying an effectiveness measure of the second trained classification model; and
generating a third trained classification model based on a determination that the effectiveness measure of the second trained classification model has improved from the effectiveness measure of the first trained classification model. |
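
The claim does not state a formula for the overall score, so the following is only an illustrative sketch of the scoring and selection steps under stated assumptions: a linear decision hyperplane `w·x + b = 0`, documents already represented as numeric vectors, and a weighted sum of hyperplane distance and the largest absolute cosine to any document already chosen for the batch. The function name `select_batch` and the weight `lam` are hypothetical and do not come from the claim.

```python
import numpy as np

def select_batch(X_unlabeled, w, b, batch_size, lam=0.5):
    """Greedily pick the documents with the lowest combined score.

    Assumed score (not specified by the claim):
        lam * distance_to_hyperplane + (1 - lam) * max |cos| to chosen docs
    """
    X = np.asarray(X_unlabeled, dtype=float)
    w = np.asarray(w, dtype=float)

    # Geometric distance of each document vector to the hyperplane.
    dist = np.abs(X @ w + b) / np.linalg.norm(w)

    # Unit-length copies of the vectors, used for cosine similarity.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    U = X / np.where(norms == 0, 1.0, norms)

    selected = []
    max_cos = np.zeros(len(X))          # angle-diversity penalty so far
    remaining = np.ones(len(X), dtype=bool)

    for _ in range(min(batch_size, len(X))):
        score = lam * dist + (1.0 - lam) * max_cos
        score[~remaining] = np.inf      # never pick the same document twice
        i = int(np.argmin(score))       # lowest overall score wins
        selected.append(i)
        remaining[i] = False
        # Documents pointing in nearly the same direction as i are penalised.
        max_cos = np.maximum(max_cos, np.abs(U @ U[i]))

    return selected
```

With `lam=1.0` the sketch reduces to picking the documents closest to the hyperplane. Lower values of `lam` favor documents whose vectors point in different directions from those already selected, which is one common reading of "angle diversity". The selected indices would then be labeled and added to the training set before retraining.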