Title of invention: Method and system for generating a decision-tree classifier in parallel in a multi-processor system
Abstract: A method and system are disclosed for generating a decision-tree classifier in parallel in a multi-processor system, from a training set of records. The method comprises the steps of: partitioning the records among the processors, each processor generating an attribute list for each attribute, and the processors cooperatively generating a decision tree by repeatedly partitioning the records using the attribute lists. For each node, each processor determines its best split test and, along with the other processors, selects the best overall split for the records at that node. Preferably, the gini index and class histograms are used in determining the best splits. Also, each processor builds a hash table using the attribute list of the split attribute and shares it with the other processors. The hash tables are used for splitting the remaining attribute lists. The created tree is then pruned based on the MDL principle, which encodes the tree and split tests in an MDL-based code, and determines whether to prune and how to prune each node based on the code length of the node.
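The split-evaluation and list-splitting steps named in the abstract can be illustrated with a minimal single-process sketch. It is not the patented parallel procedure: there are no processors or inter-processor exchange here, and the attribute-list layout `(value, class_label, record_id)`, the function names, and the sample data are all assumptions made for illustration. The sketch scans one sorted numeric attribute list while maintaining "below" and "above" class histograms, scores each candidate threshold by the weighted gini index, and then builds a hash table from record id to child node, which is used to split a second attribute list.

```python
from collections import Counter

def gini(hist, n):
    """Gini index of a class histogram over n records: 1 - sum(p_c ** 2)."""
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in hist.values())

def best_numeric_split(attribute_list):
    """Return (weighted_gini, threshold) for the best test 'value < threshold'.

    attribute_list holds (value, class_label, record_id) tuples for one
    numeric attribute (a hypothetical layout). Returns None if no split exists.
    """
    entries = sorted(attribute_list, key=lambda e: e[0])
    n = len(entries)
    below = Counter()                                   # classes left of the split
    above = Counter(label for _, label, _ in entries)   # classes right of the split
    best = None
    for i in range(n - 1):
        value, label, _ = entries[i]
        below[label] += 1
        above[label] -= 1
        next_value = entries[i + 1][0]
        if next_value == value:
            continue  # cannot split between equal values
        n_below, n_above = i + 1, n - (i + 1)
        score = (n_below / n) * gini(below, n_below) \
              + (n_above / n) * gini(above, n_above)
        threshold = (value + next_value) / 2
        if best is None or score < best[0]:
            best = (score, threshold)
    return best

def build_split_table(attribute_list, threshold):
    """Hash table: record_id -> 'left' or 'right', from the split attribute's list."""
    return {rid: ("left" if value < threshold else "right")
            for value, _, rid in attribute_list}

def split_attribute_list(attribute_list, side_of):
    """Split any other attribute list by probing the hash table with record ids."""
    left = [e for e in attribute_list if side_of[e[2]] == "left"]
    right = [e for e in attribute_list if side_of[e[2]] == "right"]
    return left, right

# Illustrative data only (not from the patent record).
age_list = [(23, "high", 0), (17, "high", 1), (43, "high", 2),
            (68, "low", 3), (32, "low", 4), (20, "high", 5)]
car_list = [("family", "high", 0), ("sports", "high", 1), ("sports", "high", 2),
            ("family", "low", 3), ("truck", "low", 4), ("family", "high", 5)]

score, threshold = best_numeric_split(age_list)
side_of = build_split_table(age_list, threshold)
car_left, car_right = split_attribute_list(car_list, side_of)
```

In a multi-processor setting as the abstract describes it, each processor would run a scan like this over its own portion of the lists and the processors would then agree on the overall best split and share the hash tables; how that exchange is carried out is not specified in this record, so it is left out of the sketch.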
Publication number: US6138115(A) Publication date: 2000.10.24
Application number: US19990245765 Filing date: 1999.02.05
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors: AGRAWAL, RAKESH;MEHTA, MANISH;SHAFER, JOHN CHRISTOPHER
Classification: G06F17/30;(IPC1-7):G06F17/30 Main classification: G06F17/30
Agency: Agent:
Principal claim:
Address: