Abstract |
<p>The invention relates to a method of creating decision trees or regression trees for machine learning applications. Training uses parallel computation across multiple computer processors to grow ensemble tree models. More specifically, the invention is characterised by processing units whose associated storage units each comprise a data slice, and by a database management system operable to execute a method for growing multiple trees. The method comprises: creating subsets (data bags) of a training dataset, one for training each of the trees to be grown; splitting the training dataset into disjoint data sub-sets and storing them in the data slices; creating a root node for each tree; assigning the data records of the bags to the root nodes of the trees to be grown; and growing the trees iteratively, wherein each iteration generates one node level of the ensemble of trees by passing through all of the data records in all slices, with the slices processed in parallel.</p> |
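<p>The steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: threads stand in for the processing units (in CPython they illustrate the structure rather than deliver real CPU parallelism), in-memory lists stand in for the data slices, and no database management system is involved. The function names (<code>grow</code>, <code>scan_slice</code>, <code>choose_splits</code>), the restriction to binary features and 0/1 labels, the Gini criterion, the bootstrap bags, and the assumption that record ids run from 0 to n-1 are all choices made for this illustration only.</p>

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor
import random

def gini(c0, c1):
    """Gini impurity of a node holding c0 records of label 0 and c1 of label 1."""
    n = c0 + c1
    return 0.0 if n == 0 else 1.0 - (c0 / n) ** 2 - (c1 / n) ** 2

def scan_slice(data_slice, bags, assign, splits, n_features):
    """One pass over one slice: move records below nodes split in the previous
    iteration, then collect label counts for every (tree, node, feature, value)."""
    stats = defaultdict(int)
    for rid, x, y in data_slice:
        for t, bag in enumerate(bags):
            w = bag.get(rid, 0)              # multiplicity of the record in tree t's bag
            if w == 0:
                continue
            node = assign[(t, rid)]
            if (t, node) in splits:          # node was split one level up
                node = 2 * node + 1 + x[splits[(t, node)]]   # left if 0, right if 1
                assign[(t, rid)] = node
            for f in range(n_features):
                stats[(t, node, f, x[f], y)] += w
    return stats

def choose_splits(total, n_features):
    """For each (tree, node), pick the feature that lowers weighted Gini the most."""
    best = {}
    for t, n in sorted({(k[0], k[1]) for k in total}):
        c = lambda f, v, y: total[(t, n, f, v, y)]
        size = sum(c(0, v, y) for v in (0, 1) for y in (0, 1))
        best_f = None
        best_score = gini(c(0, 0, 0) + c(0, 1, 0), c(0, 0, 1) + c(0, 1, 1))
        for f in range(n_features):
            score = sum((c(f, v, 0) + c(f, v, 1)) / size * gini(c(f, v, 0), c(f, v, 1))
                        for v in (0, 1))
            if score < best_score - 1e-12:   # require a real improvement
                best_f, best_score = f, score
        if best_f is not None:
            best[(t, n)] = best_f
    return best

def grow(records, n_trees=3, n_slices=2, max_depth=2, seed=0):
    """records: list of (record_id, binary_feature_tuple, label); ids must be 0..n-1.
    Returns {(tree, node_id): split_feature}; node 0 is the root, children of
    node k are 2k+1 and 2k+2."""
    n_features = len(records[0][1])
    rng = random.Random(seed)
    # One bag per tree: a bootstrap sample stored as record id -> multiplicity.
    bags = [Counter(rng.randrange(len(records)) for _ in records) for _ in range(n_trees)]
    # Disjoint slices, each with its own record-to-node assignment (all start at root 0).
    slices = [records[i::n_slices] for i in range(n_slices)]
    assigns = [{(t, rid): 0 for rid, _, _ in s for t in range(n_trees) if rid in bags[t]}
               for s in slices]
    splits = {}
    with ThreadPoolExecutor(max_workers=n_slices) as pool:
        for _ in range(max_depth):           # one iteration = one node level, all trees
            futures = [pool.submit(scan_slice, s, bags, a, splits, n_features)
                       for s, a in zip(slices, assigns)]
            total = Counter()
            for fut in futures:
                total.update(fut.result())   # merge per-slice statistics
            new_splits = choose_splits(total, n_features)
            if not new_splits:
                break
            splits.update(new_splits)
    return splits
```

<p>In this sketch each iteration scans every slice exactly once and serves all trees at the same time, so the number of passes over the data depends on the tree depth rather than on the number of trees or nodes. Because the per-slice statistics are simply summed before splits are chosen, the result does not depend on how the records are distributed over the slices.</p>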