Title of invention |
Decision tree representation for big data |
Abstract |
A method, system, and process for representing a decision tree in a tabular format is discussed. The format may contain all the necessary information to traverse the nodes in parallel on a distributed system while consuming an efficient amount of resources. In some embodiments, the tree may be stored in a relational database as a table. |
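The relational storage mentioned in the abstract might look roughly like the sketch below. It is an illustration only, not the patented implementation: it uses Python's built-in `sqlite3` as a stand-in for a distributed database, and the table name `tree`, the column names, the `/`-separated path encoding, and the sample rows are all assumptions made for this example. The three non-key columns mirror the row fields listed in the main claim.

```python
import sqlite3


def build_tree_table(conn: sqlite3.Connection) -> None:
    """Create a hypothetical `tree` table with one row per decision-tree node."""
    conn.execute(
        """
        CREATE TABLE tree (
            node_id           INTEGER PRIMARY KEY,
            leftmost_child_id INTEGER,       -- NULL marks a leaf (assumption)
            scv               REAL,          -- split criterion value
            path              TEXT NOT NULL  -- path from the root, e.g. '1/3/4'
        )
        """
    )
    # Made-up sample tree: node 1 is the root, nodes 2, 4 and 5 are leaves.
    rows = [
        (1, 2, 5.0, "1"),
        (2, None, None, "1/2"),
        (3, 4, 2.0, "1/3"),
        (4, None, None, "1/3/4"),
        (5, None, None, "1/3/5"),
    ]
    conn.executemany("INSERT INTO tree VALUES (?, ?, ?, ?)", rows)


def leaf_ids(conn: sqlite3.Connection) -> list:
    """Return the ids of all leaf rows (rows without a leftmost child)."""
    cur = conn.execute(
        "SELECT node_id FROM tree WHERE leftmost_child_id IS NULL ORDER BY node_id"
    )
    return [r[0] for r in cur]


def subtree_ids(conn: sqlite3.Connection, path: str) -> list:
    """Return the ids of the node at `path` and of all its descendants."""
    cur = conn.execute(
        "SELECT node_id FROM tree WHERE path = ? OR path LIKE ? ORDER BY node_id",
        (path, path + "/%"),
    )
    return [r[0] for r in cur]


conn = sqlite3.connect(":memory:")
build_tree_table(conn)
```

Because every row carries its full path from the root, a subtree can be selected with a single prefix match rather than a recursive query, which is one plausible reason for storing the path in each row.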
Publication number |
US9147168(B1) |
Publication date |
2015.09.29 |
Application number |
US201213722780 |
Filing date |
2012.12.20 |
Applicant |
EMC CORPORATION |
Inventors |
Ao Jianwang;Ren Yi;Yang Guangxin;Welton Caleb |
Classification codes |
G06F17/00;G06F17/20;G06N99/00 |
Main classification code |
G06F17/00 |
Agency |
|
Agents |
Gould John;Chen Theodore A.;Gupta Krishnendu |
Main claim |
1. A method for representing a decision tree in a table, comprising:
receiving a training dataset;
building a decision tree from the training dataset, wherein the decision tree comprises a plurality of nodes;
storing each node as an individual row in the table on a non-transitory computer readable medium, wherein the row comprises a leftmost child id, a split criterion value (“SCV”), and a path from a root node;
distributing the decision tree to multiple nodes in a massive parallel processing (“MPP”) database cluster;
receiving a classification dataset to be classified using the decision tree;
dividing the classification dataset into a plurality of segments; and
distributing the segments to the multiple nodes in the MPP database cluster. |
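The classification side of the claim can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the claim names only the leftmost child id, the SCV, and the path as row fields, so the `feature` and `label` fields, the rule that sibling rows have consecutive ids starting at the leftmost child id, the "less than or equal goes left" test, and the use of a thread pool in place of MPP cluster nodes are all hypothetical choices made here for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class NodeRow:
    node_id: int
    leftmost_child_id: Optional[int]  # None marks a leaf (assumption)
    scv: Optional[float]              # split criterion value
    path: str                         # path from the root node
    feature: Optional[int] = None     # assumption: index of the tested feature
    label: Optional[str] = None       # assumption: class label stored on leaves


# Made-up sample tree; children of a node occupy consecutive ids (assumption).
TREE: Dict[int, NodeRow] = {
    1: NodeRow(1, 2, 5.0, "1", feature=0),
    2: NodeRow(2, None, None, "1/2", label="small"),
    3: NodeRow(3, 4, 2.0, "1/3", feature=1),
    4: NodeRow(4, None, None, "1/3/4", label="medium"),
    5: NodeRow(5, None, None, "1/3/5", label="large"),
}

Record = Tuple[float, ...]


def classify(tree: Dict[int, NodeRow], record: Record, root_id: int = 1) -> str:
    """Walk from the root row to a leaf row and return the leaf's label."""
    row = tree[root_id]
    while row.leftmost_child_id is not None:
        # Binary split: values <= SCV go to the leftmost child, others to its sibling.
        offset = 0 if record[row.feature] <= row.scv else 1
        row = tree[row.leftmost_child_id + offset]
    return row.label


def split_into_segments(records: Sequence[Record], n: int) -> List[List[Record]]:
    """Divide the classification dataset into n round-robin segments."""
    return [list(records[i::n]) for i in range(n)]


def classify_segments(
    tree: Dict[int, NodeRow], records: Sequence[Record], n_segments: int
) -> List[List[str]]:
    """Classify each segment on its own worker; every worker sees the whole tree."""
    segments = split_into_segments(records, n_segments)
    with ThreadPoolExecutor(max_workers=n_segments) as pool:
        return list(pool.map(lambda seg: [classify(tree, r) for r in seg], segments))
```

In this sketch each worker holds a full copy of the small tree table and sees only its own segment of the data, which corresponds to the claim's steps of distributing the tree and then distributing the segments.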
Address |
Hopkinton MA US |