Title of invention: DATA FUSION AND CLASSIFICATION WITH IMBALANCED DATASETS
Abstract: A method and system for classification on imbalanced datasets within a supervised classification framework. The bootstrap methodology is modified according to k-Nearest Neighbor sampling weights and an adaptive target-set-size principle, and weak classifiers are induced from the bootstrap samples in an iterative procedure. A weighted combination scheme adaptively combines the resulting weak classifiers into a strong classifier that achieves good performance for all classes (reflected in high values for metrics such as G-mean and F-score) as well as good overall accuracy.
Publication number: US2017032276(A1) Publication date: 2017.02.02
Application number: US201514811863 Filing date: 2015.07.29
Applicant: AGT International GmbH Inventors: SUKHANOV Sergey; MERENTITIS Andreas; DEBES Christian
Classification: G06N99/00; G06F17/30 Main classification: G06N99/00
Agency: Agent:
Main claim: 1. A method for performing classification in an imbalanced dataset containing a plurality of majority class instances and a plurality of minority class instances, the method comprising: training, by a data processor, a classifier on the imbalanced dataset; estimating, by the data processor, an accuracy ACC for the classifier; sampling, by the data processor, the plurality of majority class instances; iterating, by the data processor, a predetermined number of times, during an iteration of which the data processor performs: sampling to obtain a sample containing a plurality of majority class instances according to k-Nearest Neighbor weighting so that the ratio of a number of minority class instances to a number of majority class instances in the sample equals a predetermined ratio by computation on a previous iteration; training a weak classifier on the sample obtained during the iteration; and computing a ratio of a number of minority class instances to a number of majority class instances for a subsequent iteration; and combining, by the data processor, a plurality of weak classifiers from a plurality of iterations into an ensemble aggregation corresponding to a strong classifier, wherein the combining is according to respective weights based on a function of accuracies of the weak classifiers.
Address: Zurich CH
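The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The record does not give the k-Nearest Neighbor weighting formula, the rule for updating the minority-to-majority ratio, or the function of accuracies used for the combination weights, so all three are assumptions here and are marked as such in the comments. The nearest-centroid weak learner and every function name (`centroid_fit`, `knn_weights`, `fit_ensemble`, `predict`) are hypothetical stand-ins, and the record does not say how ACC is used, so the sketch only computes and returns it. Class 0 is taken to be the majority class and class 1 the minority class.

```python
import math
import random

def centroid_fit(X, y):
    """Stand-in weak learner: one mean vector per class (0 = majority, 1 = minority)."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model

def centroid_predict(model, x):
    """Return the label of the closest class centroid."""
    return min(model, key=lambda label: math.dist(x, model[label]))

def recalls(model, X, y):
    """Per-class accuracy (recall) for class 0 and class 1, in that order."""
    out = []
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        hits = sum(centroid_predict(model, x) == label for x in rows)
        out.append(hits / len(rows))
    return out

def knn_weights(majority, minority, k):
    """ASSUMED weighting: 1 + number of minority points among the k nearest neighbours."""
    pool = [(p, 0) for p in majority] + [(p, 1) for p in minority]
    weights = []
    for x in majority:
        # Index 0 is normally the point itself (distance 0), so skip it.
        nearest = sorted(pool, key=lambda item: math.dist(x, item[0]))[1:k + 1]
        weights.append(1 + sum(label for _, label in nearest))
    return weights

def fit_ensemble(X, y, rounds=10, k=5, seed=0):
    rng = random.Random(seed)
    majority = [x for x, t in zip(X, y) if t == 0]
    minority = [x for x, t in zip(X, y) if t == 1]

    # Baseline classifier on the full imbalanced dataset and its accuracy ACC.
    baseline = centroid_fit(X, y)
    acc = sum(centroid_predict(baseline, x) == t for x, t in zip(X, y)) / len(y)

    weights = knn_weights(majority, minority, k)
    ratio = 1.0  # ASSUMED starting minority:majority ratio
    ensemble = []
    for _ in range(rounds):
        # Size the majority sample so that minority/majority matches `ratio`.
        n_major = max(1, min(len(majority), round(len(minority) / ratio)))
        sample = rng.choices(majority, weights=weights, k=n_major)  # bootstrap draw
        model = centroid_fit(sample + minority, [0] * n_major + [1] * len(minority))
        r_major, r_minor = recalls(model, X, y)
        # ASSUMED combination weight: G-mean of the two per-class accuracies.
        ensemble.append((model, math.sqrt(r_major * r_minor)))
        # ASSUMED update: use fewer majority instances next time if the
        # minority class is recognised worse than the majority class.
        ratio = min(1.0, max(0.1, ratio * (1.1 if r_minor < r_major else 0.9)))
    return ensemble, acc

def predict(ensemble, x):
    """Weighted vote of the weak classifiers; ties fall back to the majority class."""
    score = sum(w if centroid_predict(m, x) == 1 else -w for m, w in ensemble)
    return 1 if score > 0 else 0
```

Calling `fit_ensemble(X, y)` with lists of feature vectors and 0/1 labels returns the weighted weak classifiers together with the baseline ACC, and `predict(ensemble, x)` classifies a new point. Using the G-mean as the weight is one way to keep a weak classifier that ignores the minority class from dominating the vote; other functions of the accuracies would fit the claim language equally well.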