Title: Decision tree with compensation for previously unseen data
Abstract: A computer-implemented method is disclosed for efficiently processing records with unseen data. In the method, a computer system may obtain a plurality of records and a decision tree generated in a learning process. The decision tree may include a distinction node having multiple paths extending therefrom. After arriving at the distinction node with one or more records, the computer system may determine that the one or more records correspond to data of a type not seen by the distinction node in the learning process. Thereafter, the computer system may depart the distinction node via each of the multiple paths and eventually reach multiple leaf nodes of the decision tree. Each of the multiple leaf nodes may correspond to a probability distribution. Accordingly, the computer system may combine the probability distribution of each of the multiple leaf nodes to obtain a hybrid probability distribution corresponding to the one or more records.
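The traversal described in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation: it assumes a tree of categorical splits, and the `Node` class, its fields (`attribute`, `children`, `passes`, `distribution`), the `predict` function, and the example attribute and labels are all hypothetical. The combining step weights each path by its training pass count, which is the weighting recited in claim 1 below. The abstract itself only says that the distributions are combined.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Class label -> probability.
Distribution = Dict[str, float]


@dataclass
class Node:
    """Hypothetical node of a probability estimation tree.

    Leaves carry `distribution`; distinction nodes split on one categorical
    `attribute` and record how many training records took each path.
    """
    attribute: Optional[str] = None
    children: Dict[str, "Node"] = field(default_factory=dict)  # value -> child
    passes: Dict[str, int] = field(default_factory=dict)       # value -> training passes
    distribution: Optional[Distribution] = None                # leaves only


def predict(node: Node, record: Dict[str, str]) -> Distribution:
    """Return the (possibly hybrid) class distribution for one record."""
    if node.distribution is not None:
        return node.distribution  # reached a leaf
    value = record.get(node.attribute)
    if value in node.children:
        return predict(node.children[value], record)  # value seen in training
    # Unseen (or missing) value: depart via every path and combine the
    # distributions found below, each weighted in proportion to the number
    # of training passes through that path. Assumes `passes` holds a
    # positive count for every child.
    total = sum(node.passes.values())
    hybrid: Distribution = {}
    for branch, child in node.children.items():
        weight = node.passes[branch] / total
        for label, p in predict(child, record).items():
            hybrid[label] = hybrid.get(label, 0.0) + weight * p
    return hybrid


# Illustrative tree: one distinction node on a made-up "region" attribute.
north = Node(distribution={"buy": 0.8, "skip": 0.2})
south = Node(distribution={"buy": 0.4, "skip": 0.6})
root = Node(attribute="region",
            children={"north": north, "south": south},
            passes={"north": 30, "south": 10})

seen = predict(root, {"region": "north"})   # north leaf's distribution
unseen = predict(root, {"region": "west"})  # 0.75 * north + 0.25 * south
print(seen)
print(unseen)
```

This sketch treats a missing attribute the same way as an unseen value. That is a design choice made here, not something the patent states. Because `predict` recurses, a descendant distinction node that also meets an unseen value applies the same rule. The result is therefore a mixture over every leaf reachable along unseen branches.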
Publication number: US9355369(B2)
Publication date: 2016.05.31
Application number: US201313874343
Application date: 2013.04.30
Applicant: Wal-Mart Stores, Inc.
Inventors: Ray Andrew Benjamin; Troutman Nathaniel Philip
Classification: G06N99/00; G06F15/18
Main classification: G06N99/00
Agency: Bryan Cave LLP
Agent: Bryan Cave LLP
Main claim: 1. A computer-implemented method for efficiently processing records with unseen data, the method comprising: obtaining, by a computer system, a plurality of records; obtaining, by the computer system, a decision tree built in a learning process; and processing, by the computer system, the plurality of records through the decision tree, the processing comprising: arriving at a distinction node of the decision tree with one or more records of the plurality of records, the distinction node having multiple paths extending therefrom; determining, by the computer system after the arriving, that the one or more records correspond to data of a type not seen by the distinction node in the learning process; departing, by the computer system after the determining, the distinction node via each of the multiple paths; reaching, by the computer system after the departing, multiple leaf nodes of the decision tree, each of the multiple leaf nodes corresponding to a probability distribution; and combining the probability distribution of each of the multiple leaf nodes to obtain a hybrid probability distribution corresponding to the one or more records, wherein: the learning process comprises a first number of passes through a first path of the multiple paths and a second number of passes through a second path of the multiple paths; the combining comprises combining a probability distribution corresponding to the first path weighted in proportion to the first number and a probability distribution corresponding to the second path weighted in proportion to the second number; and the decision tree is a probability estimation tree.
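The weighting in the claim's "wherein" clause can be written out as a formula. The formula below is a hedged generalization of the claim's two-path wording to k paths. Let n_i be the number of learning-process passes through path i, and let P_i be the probability distribution reached via path i. The hybrid probability of class c is then given by the first line. The second line is a worked example with made-up counts and distributions, not data from the patent.

```latex
P_{\mathrm{hyb}}(c) = \frac{\sum_{i=1}^{k} n_i \, P_i(c)}{\sum_{i=1}^{k} n_i}

% Example: n_1 = 20, n_2 = 60, P_1 = (0.9, 0.1), P_2 = (0.5, 0.5)
P_{\mathrm{hyb}}(c_1) = \frac{20 \cdot 0.9 + 60 \cdot 0.5}{80} = \frac{48}{80} = 0.6, \qquad
P_{\mathrm{hyb}}(c_2) = \frac{20 \cdot 0.1 + 60 \cdot 0.5}{80} = \frac{32}{80} = 0.4
```

Dividing by the total pass count keeps the result a valid distribution. If every P_i sums to 1, the weighted average also sums to 1, as the example's 0.6 + 0.4 shows.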
Address: Bentonville, AR, US