Title of invention Decision tree representation for big data
Abstract A method, system, and process for representing a decision tree in a tabular format is discussed. The format may contain all the necessary information to traverse the nodes in parallel on a distributed system while consuming an efficient amount of resources. In some embodiments, the tree may be stored in a relational database as a table.
Publication number US9147168(B1) Publication date 2015.09.29
Application number US201213722780 Filing date 2012.12.20
Applicant EMC CORPORATION Inventors Ao Jianwang;Ren Yi;Yang Guangxin;Welton Caleb
Classification G06F17/00;G06F17/20;G06N99/00 Main classification G06F17/00
Agency Agent Gould John;Chen Theodore A.;Gupta Krishnendu
Main claim 1. A method for representing a decision tree in a table, comprising: receiving a training dataset; building a decision tree from the training dataset, wherein the decision tree comprises a plurality of nodes; storing each node as an individual row in the table on a non-transitory computer readable medium, wherein the row comprises a leftmost child id, a split criterion value (“SCV”), and a path from a root node; distributing the decision tree to multiple nodes in a massive parallel processing (“MPP”) database cluster; receiving a classification dataset to be classified using the decision tree; dividing the classification dataset into a plurality of segments; and distributing the segments to the multiple nodes in the MPP database cluster.
Address Hopkinton MA US
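
Illustrative sketch (not part of the patent record). The row layout in claim 1 can be pictured with a small Python example. The claim names only three row contents: a leftmost child id, a split criterion value (SCV), and a path from the root node. Everything else below is an assumption made for illustration: the column names, the `feature` and `label` columns, the root having id 1, children receiving consecutive ids starting at the leftmost child, the SCV acting as a numeric threshold for a binary split, and the round-robin segmentation. Plain Python lists stand in for the MPP cluster nodes, so nothing here actually runs in parallel.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class NodeRow:
    """One table row per tree node (all column names are hypothetical)."""
    node_id: int
    leftmost_child_id: Optional[int]  # None marks a leaf
    feature: Optional[int]            # assumed: index of the feature tested here
    scv: Optional[float]              # split criterion value, used as a threshold
    path: Tuple[int, ...]             # branch indices taken from the root
    label: Optional[str]              # assumed: prediction stored on leaf rows


# A three-node tree: the root (id 1) splits on feature 0 at 2.5.
TREE_TABLE: Dict[int, NodeRow] = {
    row.node_id: row
    for row in [
        NodeRow(1, 2, 0, 2.5, (), None),
        NodeRow(2, None, None, None, (0,), "small"),
        NodeRow(3, None, None, None, (1,), "large"),
    ]
}


def classify(table: Dict[int, NodeRow], record: Sequence[float]) -> Optional[str]:
    """Walk from the root row to a leaf row using only table lookups."""
    row = table[1]  # assumed: the root always has id 1
    while row.leftmost_child_id is not None:
        # Assumed binary split: branch 0 if value <= SCV, otherwise branch 1.
        branch = 0 if record[row.feature] <= row.scv else 1
        # Siblings are assumed to have consecutive ids after the leftmost child.
        row = table[row.leftmost_child_id + branch]
    return row.label


def split_into_segments(records: Sequence, n_segments: int) -> List[list]:
    """Divide the classification dataset round-robin into n_segments parts."""
    return [list(records[i::n_segments]) for i in range(n_segments)]


def classify_distributed(table, records, n_segments):
    """Simulate each cluster node classifying its segment with its own tree copy."""
    segments = split_into_segments(records, n_segments)
    return [[classify(dict(table), r) for r in seg] for seg in segments]
```

Because every row carries its own child pointer and path, each simulated node needs only its copy of the table and its own segment of records; no shared state is required during traversal.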