Title of Invention |
Enhancing classification and prediction using predictive modeling |
Abstract |
In one embodiment, a system for enhancing predictive modeling includes an interface operable to receive a first dataset. The system may also include a processor communicatively coupled to the interface that is operable to generate a holdout dataset based on the first dataset. The processor may also train each of a plurality of boosting models in parallel using the first dataset, wherein for each of a number of iterations, training comprises: building a one-level binary decision tree to train a split-node variable; calculating an impurity of the split-node variable; and calculating an optimal split node, wherein the optimal split node is the split-node variable with a lowest impurity between the plurality of boosting models. The system may then determine a final model based on one of the plurality of boosting models that provides the lowest error rate when applied to the holdout dataset. |
Publication Number |
US9171259(B1) |
Publication Date |
2015.10.27 |
Application Number |
US201514594600 |
Filing Date |
2015.01.12 |
Applicant |
Bank of America Corporation |
Inventors |
Laxmanan Kasilingam Basker; Chen Yudong; Song Peng |
Classification Codes |
G06F15/18;G06N5/02;G06N99/00;G06F17/50 |
Main Classification |
G06F15/18 |
Agency |
|
Agent |
Springs Michael A. |
Main Claim |
1. A method for enhancing predictive modeling, comprising:
receiving, at an interface, a first dataset;
generating, with a processor, a holdout dataset based on the first dataset;
training, with the processor, each of a plurality of boosting models in parallel using the first data set, wherein for each of a number of iterations, training comprises:
  building a one-level binary decision tree to train a split-node variable;
  calculating an impurity of the split-node variable;
  calculating an optimal split node, wherein the optimal split node is the split-node variable with a lowest impurity between the plurality of boosting models; and
  calculating a total variance for the split node variable;
determining a final model, based on one of the plurality of boosting models that provides a lowest error rate when applied to the holdout dataset. |
Address |
Charlotte NC US |
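
The steps recited in the main claim can be illustrated with a minimal Python sketch. It is not the patented implementation, and it rests on several assumptions that the record does not state: the boosting scheme is AdaBoost-style, the impurity measure is weighted Gini, the candidate models differ only in their number of boosting rounds, and the holdout dataset is a random split of the first dataset. All function and parameter names (`best_stump`, `train_boosted`, `select_final_model`, `round_options`, `holdout_frac`) are hypothetical. The per-model impurity comparison across boosting models and the total-variance calculation from the claim are not reproduced here.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def weighted_gini(labels, weights):
    """Weighted Gini impurity for +1/-1 labels (assumed impurity measure)."""
    total = sum(weights)
    if total == 0:
        return 0.0
    p_pos = sum(w for y, w in zip(labels, weights) if y == 1) / total
    return 2.0 * p_pos * (1.0 - p_pos)


def best_stump(X, y, w):
    """One-level binary tree: pick the split with the lowest weighted impurity."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            impurity, votes = 0.0, []
            for side in (left, right):
                ws = [w[i] for i in side]
                ys = [y[i] for i in side]
                impurity += sum(ws) * weighted_gini(ys, ws)
                pos = sum(wi for yi, wi in zip(ys, ws) if yi == 1)
                votes.append(1 if pos >= sum(ws) - pos else -1)
            if best is None or impurity < best[0]:
                best = (impurity, j, t, votes[0], votes[1])
    return best


def stump_predict(stump, row):
    _, j, t, left_vote, right_vote = stump
    return left_vote if row[j] <= t else right_vote


def train_boosted(X, y, rounds):
    """AdaBoost-style loop over stumps (an assumed boosting scheme)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = best_stump(X, y, w)
        if stump is None:  # no usable split (all features constant)
            break
        preds = [stump_predict(stump, row) for row in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Re-weight: misclassified samples gain weight, then normalize.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


def error_rate(ensemble, X, y):
    return sum(predict(ensemble, r) != yi for r, yi in zip(X, y)) / len(X)


def select_final_model(X, y, round_options=(1, 5, 20), holdout_frac=0.3, seed=0):
    """Train several boosted models and keep the one with the lowest holdout error."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    cut = int(len(idx) * holdout_frac)
    hold, train = idx[:cut], idx[cut:]
    Xt, yt = [X[i] for i in train], [y[i] for i in train]
    Xh, yh = [X[i] for i in hold], [y[i] for i in hold]
    # Threads illustrate the "in parallel" step; in CPython they do not
    # speed up CPU-bound training, so a process pool may be preferable.
    with ThreadPoolExecutor() as pool:
        models = list(pool.map(lambda r: train_boosted(Xt, yt, r), round_options))
    errors = [error_rate(m, Xh, yh) for m in models]
    best = min(range(len(models)), key=errors.__getitem__)
    return models[best], errors
```

Calling `select_final_model(X, y)` with a list of feature rows `X` and +1/-1 labels `y` returns the selected ensemble together with the holdout error of every candidate, so the selection step can be inspected.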