Main claim
1. A method for performing classification in an imbalanced dataset containing a plurality of majority class instances and a plurality of minority class instances, the method comprising:
training, by a data processor, a classifier on the imbalanced dataset;
estimating, by the data processor, an accuracy ACC for the classifier;
sampling, by the data processor, the plurality of majority class instances by iterating a predetermined number of times, wherein during each iteration the data processor performs:
sampling to obtain a sample containing a plurality of majority class instances according to k-Nearest Neighbor weighting, such that the ratio of the number of minority class instances to the number of majority class instances in the sample equals a ratio predetermined by the computation in a previous iteration;
training a weak classifier on the sample obtained during the iteration; and
computing a ratio of the number of minority class instances to the number of majority class instances for a subsequent iteration; and
combining, by the data processor, a plurality of weak classifiers from a plurality of iterations into an ensemble aggregation corresponding to a strong classifier, wherein the combining is performed according to respective weights based on a function of the accuracies of the weak classifiers.
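The steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation. The claim does not specify the weak learner, the exact k-Nearest Neighbor weighting, how the ratio for the next iteration is computed, or the function mapping accuracy to vote weight. The nearest-centroid learner `train_centroid`, the weighting rule in `knn_weights`, the ratio schedule in `next_ratio`, and the AdaBoost-style `vote_weight` are therefore all hypothetical choices, and labels are assumed to be 0 (majority) and 1 (minority).

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_centroid(points, labels):
    """Hypothetical weak learner: nearest-centroid classifier (0 = majority, 1 = minority)."""
    centroids = {}
    for c in (0, 1):
        members = [p for p, y in zip(points, labels) if y == c]
        centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return lambda p: min(centroids, key=lambda c: dist2(p, centroids[c]))


def accuracy(clf, points, labels):
    """Fraction of instances the classifier labels correctly."""
    return sum(clf(p) == y for p, y in zip(points, labels)) / len(labels)


def knn_weights(points, labels, k):
    """Hypothetical k-NN weighting: each majority instance gets 1 + the number of
    minority instances among its k nearest neighbours, favouring boundary points."""
    weights = {}
    for i, p in enumerate(points):
        if labels[i] != 0:
            continue
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist2(p, points[j]))
        weights[i] = 1 + sum(labels[j] for j in others[:k])
    return weights


def next_ratio(ratio, weak_acc, base_acc):
    """Hypothetical ratio schedule: if the weak classifier did not beat the baseline
    ACC, move the minority:majority ratio halfway toward 1:1; otherwise keep it."""
    return ratio if weak_acc > base_acc else (ratio + 1.0) / 2


def vote_weight(acc):
    """Hypothetical accuracy-to-weight function (AdaBoost-style log-odds)."""
    acc = min(max(acc, 1e-6), 1 - 1e-6)
    return 0.5 * math.log(acc / (1 - acc))


def train_ensemble(points, labels, iterations=10, init_ratio=0.5, k=5, seed=0):
    rng = random.Random(seed)
    base = train_centroid(points, labels)        # classifier on the imbalanced dataset
    base_acc = accuracy(base, points, labels)    # ACC
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    w = knn_weights(points, labels, k)
    ratio, ensemble = init_ratio, []
    for _ in range(iterations):                  # predetermined number of iterations
        # Number of majority instances so that |minority| / |majority| ~= ratio.
        n_major = min(len(majority), max(1, round(len(minority) / ratio)))
        # Weighted draw using the k-NN weights (with replacement: an assumption).
        drawn = rng.choices(majority, weights=[w[i] for i in majority], k=n_major)
        idx = minority + drawn
        weak = train_centroid([points[i] for i in idx], [labels[i] for i in idx])
        acc = accuracy(weak, points, labels)
        ensemble.append((vote_weight(acc), weak))
        ratio = next_ratio(ratio, acc, base_acc)  # ratio for the next iteration

    def strong(p):
        """Weighted vote of the weak classifiers (+1 for minority, -1 for majority)."""
        score = sum(a * (1 if clf(p) == 1 else -1) for a, clf in ensemble)
        return 1 if score > 0 else 0

    return strong, base_acc
```

Drawing with replacement keeps the weighted sampling to a single `random.Random.choices` call; a weighted draw without replacement would be closer to plain undersampling and could be swapped in without changing the rest of the loop.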