Title: Data mining method and system for generating a decision tree classifier for data records based on a minimum description length (MDL) and presorting of records
Abstract: A method and apparatus are disclosed for generating a decision tree classifier from a training set of records. The method comprises the steps of: pre-sorting the records on each numeric attribute, growing the decision tree breadth-first, and pruning the tree based on the MDL principle. Preferably, the pre-sorting includes generating a class list and attribute lists, and sorting each numeric attribute list independently. Growing the tree includes evaluating the possible splitting criteria, selecting a splitting test for each leaf node based on a splitting index, and updating the class list to reflect the new leaf nodes. In a preferred embodiment, the splitting index is the gini index. The pruning preferably includes encoding the decision tree and its splitting tests in an MDL-based code and, based on the code length of each node, determining whether to convert the node into a leaf, prune its child nodes, or leave it intact.
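The pre-sorting, class-list, and gini-based split-selection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes records are dicts, considers only numeric attributes with binary tests of the form `A <= v` (v taken from the observed values), and uses the gini index gini(S) = 1 - sum of p_j^2, with a split scored as n1/n * gini(S1) + n2/n * gini(S2). Each attribute list is sorted once. Afterwards, only the class list, which maps each record id to its class label and current leaf, changes as the tree grows breadth-first. One scan of an attribute list therefore evaluates candidate splits for every current leaf. The function names (`presort`, `best_numeric_splits`, `apply_split`) and the demo data are hypothetical. Categorical attributes and the MDL-based pruning step are not shown.

```python
from collections import Counter, defaultdict


def gini(hist):
    """Gini index 1 - sum(p_j^2) of a class histogram (label -> count)."""
    n = sum(hist.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in hist.values())


def split_gini(left, right):
    """Weighted gini of a binary split: n1/n * gini(S1) + n2/n * gini(S2)."""
    n1, n2 = sum(left.values()), sum(right.values())
    n = n1 + n2
    return (n1 / n) * gini(left) + (n2 / n) * gini(right)


def presort(records, attr):
    """Hypothetical helper: attribute list of (value, record_id), sorted once by value."""
    return sorted((rec[attr], rid) for rid, rec in enumerate(records))


def best_numeric_splits(attr_list, class_list):
    """Scan a presorted attribute list once, scoring 'attr <= v' for every leaf.

    class_list[rid] = (class_label, leaf_id). Returns {leaf_id: (gini, threshold)}.
    """
    totals = defaultdict(Counter)  # full class histogram of each leaf
    for label, leaf in class_list:
        totals[leaf][label] += 1
    below = defaultdict(Counter)   # per-leaf histogram of records scanned so far
    last_value = {}                # per-leaf value of the most recent record scanned
    best = {}
    for value, rid in attr_list:
        label, leaf = class_list[rid]
        prev = last_value.get(leaf)
        if prev is not None and value > prev:
            # Every record of this leaf with value <= prev is already in `below`.
            right = totals[leaf] - below[leaf]
            score = split_gini(below[leaf], right)
            if leaf not in best or score < best[leaf][0]:
                best[leaf] = (score, prev)
        below[leaf][label] += 1
        last_value[leaf] = value
    return best


def apply_split(attr_list, class_list, leaf, threshold, left_leaf, right_leaf):
    """Move the records of `leaf` to its new children by updating only the class list."""
    for value, rid in attr_list:
        label, current = class_list[rid]
        if current == leaf:
            class_list[rid] = (label, left_leaf if value <= threshold else right_leaf)


if __name__ == "__main__":
    # Hypothetical training set with one numeric attribute.
    records = [
        {"age": 23, "cls": "High"}, {"age": 17, "cls": "High"},
        {"age": 43, "cls": "High"}, {"age": 68, "cls": "Low"},
        {"age": 32, "cls": "Low"}, {"age": 20, "cls": "High"},
    ]
    age_list = presort(records, "age")
    class_list = [(rec["cls"], 0) for rec in records]  # all records start at root leaf 0
    root = best_numeric_splits(age_list, class_list)
    print("root:", root)
    score, threshold = root[0]
    apply_split(age_list, class_list, 0, threshold, 1, 2)
    # The same presorted list is rescanned; no re-sorting is needed.
    print("children:", best_numeric_splits(age_list, class_list))
```

On the demo data, the root split chosen is `age <= 23` (weighted gini about 0.22). After `apply_split`, the unchanged `age` list is scanned again to evaluate both child leaves in a single pass. This mirrors the breadth-first growth the abstract describes.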
Publication number: US5787274(A) Publication date: 1998.07.28
Application number: US19950564694 Application date: 1995.11.29
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors: AGRAWAL, RAKESH; MEHTA, MANISH; RISSANEN, JORMA JOHANNES
Classification: G06F17/30; (IPC1-7): G06F17/30 Main classification: G06F17/30