Title: Transductive feature selection with maximum-relevancy and minimum-redundancy criteria
Abstract: Various embodiments select features from a feature space. In one embodiment, a set of training samples and a set of test samples are received. The set of training samples includes a set of features and a class value. The set of test samples includes the set of features absent the class value. A relevancy with respect to the class value is determined for each of a plurality of unselected features based on the set of training samples. A redundancy with respect to one or more of the set of features is determined for each of the plurality of unselected features in the first set of features based on the set of training samples and the set of test samples. A set of features is selected from the plurality of unselected features based on the relevancy and the redundancy determined for each of the plurality of unselected features.
Publication number: US9471881(B2)  Publication date: 2016.10.18
Application number: US201313745930  Filing date: 2013.01.21
Applicant: International Business Machines Corporation  Inventors: Haws David; He Dan; Parida Laxmi P.; Rish Irina
Classification: G06F15/18; G06N99/00; G06F17/18; G06N7/00; G06N5/02; G06K9/62; G06F19/24  Main classification: G06F15/18
Agency: Fleit Gibbons Gutman Bongini Bianco PL  Agent: Fleit Gibbons Gutman Bongini Bianco PL; Grzesik Thomas
Main claim: 1. A computer implemented method for selecting features from a feature space, the computer implemented method comprising: obtaining, by a processor, a set of training samples and a set of test samples, wherein the set of training samples comprises a first set of features and a class value, and wherein the set of test samples comprises a second set of features, where the second set of features is the first set of features absent the class value; determining, for each of a plurality of unselected features in a plurality of features comprising the first and second set of features, a relevancy with respect to the class value based on only the set of training samples; determining, for each of the plurality of unselected features, a redundancy with respect to the plurality of features based on both the set of training samples and the set of test samples; selecting a set of features from the plurality of unselected features based on the relevancy and the redundancy determined for each of the plurality of unselected features, wherein the selecting is performed based on
$$\max_{x_j \in X - S_{m-1}} \left[ I\left(x_j^{\text{training}};\, c^{\text{training}}\right) - \frac{1}{m-1} \sum_{x_i \in S_{m-1}} I\left(x_j^{\text{training+test}};\, x_i^{\text{training+test}}\right) \right],$$
where $x_j$ is a jth feature that is sample independent, $x_j^{\text{training}}$ is a jth feature based on the set of training samples, $x_j^{\text{training+test}}$ is a jth feature based on the set of training samples and the set of test samples, $i$ is an integer, $X$ is a set of all features, $S_{m-1}$ is a set of $m-1$ features, $c$ is the class value, and $I$ is mutual information; and programming a processor to perform at least one of a set of classification operations and a set of regression operations based on the set of features that have been selected.
Address: Armonk, NY, US
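The selection criterion in the main claim can be illustrated with a minimal Python sketch. It is not the patented implementation; it only shows one way the relevancy term (training samples only) and the redundancy term (training plus test samples) might be combined in a greedy loop. The function names `mutual_information` and `transductive_mrmr`, the restriction to discrete feature values, the empirical plug-in estimate of mutual information, and the rule of choosing the first feature by relevancy alone (the redundancy term is undefined while no feature has been selected) are all assumptions made for illustration. The data at the bottom is toy data invented for the example.

```python
from collections import Counter
from math import log


def mutual_information(a, b):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (n_xy / n) * log(n_xy * n / (count_a[x] * count_b[y]))
        for (x, y), n_xy in count_ab.items()
    )


def transductive_mrmr(train_X, train_c, test_X, k):
    """Greedily pick k feature indices (hypothetical helper, not from the patent).

    train_X, test_X: lists of rows of discrete feature values.
    train_c: class values for the training rows only.
    """
    n_features = len(train_X[0])
    # Feature columns over training samples only (used for relevancy).
    train_cols = [[row[j] for row in train_X] for j in range(n_features)]
    # Feature columns over training + test samples (used for redundancy).
    all_cols = [[row[j] for row in train_X + test_X] for j in range(n_features)]

    relevancy = [mutual_information(col, train_c) for col in train_cols]
    selected, remaining = [], set(range(n_features))

    def score(j):
        if not selected:  # assumption: first pick uses relevancy alone
            return relevancy[j]
        redundancy = sum(
            mutual_information(all_cols[j], all_cols[i]) for i in selected
        ) / len(selected)
        return relevancy[j] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: feature 0 equals the class, feature 1 duplicates feature 0,
# feature 2 is weakly informative about the class.
train_c = [0, 0, 0, 0, 1, 1, 1, 1]
train_X = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 1],
           [1, 1, 0], [1, 1, 1], [1, 1, 1], [1, 1, 1]]
test_X = [[0, 0, 1], [1, 1, 0], [0, 0, 1], [1, 1, 0]]  # no class values

selected = transductive_mrmr(train_X, train_c, test_X, k=2)
print(selected)
```

On this toy data the duplicate feature 1 is fully redundant with feature 0 once the test rows are included, so the sketch selects features 0 and 2. Continuous features would need a different mutual information estimator, which is outside the scope of this sketch.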