Title METHODS AND SYSTEMS FOR USING CLUSTERING FOR SPLITTING TREE NODES IN CLASSIFICATION DECISION TREES
Abstract Systems and methods for determining an optimal splitting scheme for a node in a classification decision tree. A computing system may receive input data related to a decision tree to be generated from a data set. The input data identifies a target attribute of the data set and a set of candidate attributes of the data set to be used as nodes in the decision tree. The computing system may determine, using a clustering algorithm and the set of candidate attributes, a plurality of potential splitting schemes to be used to split a node in the decision tree. The computing system may calculate a splitting measurement for each of the plurality of potential splitting schemes. The computing system may select an optimal splitting scheme from the plurality of potential splitting schemes for each node in the decision tree based on the splitting measurement.
Publication number US2014351196(A1) Publication date 2014.11.27
Application number US201414284222 Filing date 2014.05.21
Applicant SAS Institute Inc. Inventors Hu Xiangqian; Wu Xunlei; Meng Xiangxiang; Schabenberger Oliver
Classification G06N5/04; G06F17/30 Main classification G06N5/04
Agency Agent
Principal claim 1. A computer-program product tangibly embodied in a non-transitory machine-readable storage medium, including instructions configured to be executed to cause a data processing apparatus to: receive input data related to a decision tree to be generated from a data set, wherein the input data identifies a target attribute of the data set and a set of candidate attributes of the data set to be used as nodes in the decision tree; determine, using a clustering algorithm and the set of candidate attributes, a plurality of potential splitting schemes to be used to split a node in the decision tree; calculate a splitting measurement for each of the plurality of potential splitting schemes; and select an optimal splitting scheme from the plurality of potential splitting schemes for each node in the decision tree based on the splitting measurement.
Address Cary, NC, US
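
Illustrative sketch (not part of the patent record). The claim describes a four-step flow: receive a target attribute and candidate attributes, use clustering to generate several potential splitting schemes for a node, score each scheme with a splitting measurement, and select the best one. The record does not name a specific clustering algorithm or measurement, so the Python sketch below makes its own assumptions: a binary target, categorical candidate attributes, a small 1-D k-means over each level's positive-class rate as the clustering step, and Gini impurity reduction as the splitting measurement. All function names (`level_profiles`, `kmeans_1d`, `candidate_schemes`, `split_gain`, `best_split`) and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def level_profiles(rows, attr, target, positive):
    """Map each level of attr to the share of its rows whose target is positive."""
    counts = defaultdict(lambda: [0, 0])  # level -> [positives, total]
    for r in rows:
        counts[r[attr]][0] += r[target] == positive
        counts[r[attr]][1] += 1
    return {lvl: p / t for lvl, (p, t) in counts.items()}

def kmeans_1d(points, k, iters=20):
    """Tiny deterministic 1-D k-means; returns one cluster index per point."""
    srt = sorted(points)
    # Initial centers are spread evenly over the sorted values (assumed choice).
    centers = [srt[(i * (len(srt) - 1)) // max(k - 1, 1)] for i in range(k)]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: abs(p - centers[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = sum(members) / len(members)
    return assign

def candidate_schemes(rows, attr, target, positive, max_k):
    """Yield one potential splitting scheme (level -> branch) per cluster count k."""
    profiles = level_profiles(rows, attr, target, positive)
    levels = sorted(profiles)
    points = [profiles[lvl] for lvl in levels]
    for k in range(2, min(max_k, len(levels)) + 1):
        yield dict(zip(levels, kmeans_1d(points, k)))

def split_gain(rows, attr, scheme, target):
    """Splitting measurement (assumed): Gini impurity reduction of the split."""
    branches = defaultdict(list)
    for r in rows:
        branches[scheme[r[attr]]].append(r[target])
    n = len(rows)
    weighted = sum(len(b) / n * gini(b) for b in branches.values())
    return gini([r[target] for r in rows]) - weighted

def best_split(rows, attrs, target, positive, max_k=3):
    """Score every candidate scheme for every attribute and keep the best one."""
    best = (None, None, float("-inf"))
    for attr in attrs:
        for scheme in candidate_schemes(rows, attr, target, positive, max_k):
            gain = split_gain(rows, attr, scheme, target)
            if gain > best[2]:
                best = (attr, scheme, gain)
    return best

# Toy data set: "buy" is the target; "color" and "size" are candidate attributes.
rows = [
    {"color": c, "size": s, "buy": b}
    for c, b in [("red", "yes"), ("blue", "yes"), ("green", "no"), ("yellow", "no")]
    for s in ("S", "L")
]
attr, scheme, gain = best_split(rows, ["size", "color"], "buy", "yes")
print(attr, scheme, gain)
```

In this sketch, clustering the levels keeps the number of candidate schemes small: instead of enumerating every partition of an attribute's levels, only one grouping per cluster count is scored. A production implementation would likely differ in the features clustered, the measurement used, and the handling of numeric or multi-class data.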