Independent Claim
1. A computer-implemented method for efficiently processing records with unseen data, the method comprising:
obtaining, by a computer system, a plurality of records;
obtaining, by the computer system, a decision tree built in a learning process; and
processing, by the computer system, the plurality of records through the decision tree, the processing comprising:
arriving at a distinction node of the decision tree with one or more records of the plurality of records, the distinction node having multiple paths extending therefrom;
determining, by the computer system after the arriving, that the one or more records correspond to data of a type not seen by the distinction node in the learning process;
departing, by the computer system after the determining, the distinction node via each of the multiple paths;
reaching, by the computer system after the departing, multiple leaf nodes of the decision tree, each of the multiple leaf nodes corresponding to a probability distribution; and
combining the probability distribution of each of the multiple leaf nodes to obtain a hybrid probability distribution corresponding to the one or more records, wherein:
the learning process comprises a first number of passes through a first path of the multiple paths and a second number of passes through a second path of the multiple paths;
the combining comprises combining a probability distribution corresponding to the first path weighted in proportion to the first number and a probability distribution corresponding to the second path weighted in proportion to the second number; and
the decision tree is a probability estimation tree.
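The traversal recited in claim 1 can be sketched in Python as follows. This is an illustrative sketch only, not the patentee's implementation. It assumes categorical splits keyed by a record field, leaf distributions stored as label-to-probability dicts, and a per-path count of training passes recorded during learning. The names `Leaf`, `DistinctionNode`, `predict`, `pass_counts`, and the example field `color` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Mapping, Union

# Hypothetical representation: class label -> probability.
Distribution = Dict[str, float]


@dataclass
class Leaf:
    # Probability distribution held at a leaf of a probability estimation tree.
    distribution: Distribution


@dataclass
class DistinctionNode:
    # Hypothetical categorical split: the record field tested at this node.
    feature: str
    # One path (child) per value seen at this node during learning.
    children: Dict[str, "Node"]
    # Hypothetical bookkeeping: number of training passes through each path.
    pass_counts: Dict[str, int]


Node = Union[Leaf, DistinctionNode]


def predict(node: Node, record: Mapping[str, str]) -> Distribution:
    """Return the (possibly hybrid) distribution for one record."""
    if isinstance(node, Leaf):
        return dict(node.distribution)
    value = record.get(node.feature)
    if value in node.children:
        # Value was seen during learning: follow the single matching path.
        return predict(node.children[value], record)
    # Unseen (or missing) value: depart via every path and combine the
    # results, weighting each path in proportion to its training passes.
    total = sum(node.pass_counts[v] for v in node.children)
    if total == 0:
        raise ValueError("no training passes recorded at this node")
    hybrid: Distribution = {}
    for v, child in node.children.items():
        weight = node.pass_counts[v] / total
        for label, p in predict(child, record).items():
            hybrid[label] = hybrid.get(label, 0.0) + weight * p
    return hybrid


if __name__ == "__main__":
    tree = DistinctionNode(
        feature="color",
        children={
            "red": Leaf({"buy": 0.8, "skip": 0.2}),
            "blue": Leaf({"buy": 0.2, "skip": 0.8}),
        },
        pass_counts={"red": 30, "blue": 10},
    )
    print(predict(tree, {"color": "red"}))    # {'buy': 0.8, 'skip': 0.2}
    print(predict(tree, {"color": "green"}))  # roughly buy 0.65, skip 0.35
```

With 30 training passes down the "red" path and 10 down the "blue" path, an unseen value receives weights 0.75 and 0.25, so the hybrid probability of "buy" is 0.75 × 0.8 + 0.25 × 0.2 = 0.65. Because `predict` recurses, an unseen value at a nested distinction node is handled the same way. Each reached leaf is then weighted by the product of the path proportions along its route. The claim itself recites only the single-node case, so this nesting behavior is an assumption of the sketch.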