Abstract |
A model refinement system refines initial split rules that define an initial decision tree to generate final split rules. The model refinement system refines the initial split rules by removing clauses that are satisfied by match scores less than a threshold match score, thereby generating initial trimmed rules. Using the initial trimmed rules, the model refinement system classifies an initial training set and filters the initial training set to remove negative training pairs that are classified as duplicate pairs, resulting in a filtered training set. An intermediate decision tree defined by intermediate split rules is generated based on the filtered training set. Final split rules are generated based on the intermediate split rules, and input pairs of data records are classified as duplicate pairs based on attribute values of the input pairs and the final split rules.
|
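
The trimming and filtering steps described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation. Several things are assumptions made for illustration only: a split rule is a conjunction of clauses of the form `(attribute, operator, bound)`, match scores lie in the range 0 to 1, and the helper names (`trim_rule`, `is_duplicate`, `filter_training_set`) are hypothetical. Inducing the intermediate decision tree from the filtered set is left out.

```python
from typing import Dict, List, Tuple

# Assumed representation: a clause tests one attribute's match score.
# Operators are limited to "<=" and ">", as in a typical decision-tree split.
Clause = Tuple[str, str, float]
Rule = List[Clause]  # conjunction: a pair satisfying every clause is a duplicate
Example = Tuple[Dict[str, float], bool]  # (match scores, labeled as duplicate?)


def clause_holds(clause: Clause, scores: Dict[str, float]) -> bool:
    """Evaluate a single clause against a pair's match scores."""
    attr, op, bound = clause
    return scores[attr] <= bound if op == "<=" else scores[attr] > bound


def trim_rule(rule: Rule, threshold: float) -> Rule:
    """Drop clauses that a match score below `threshold` could satisfy."""
    kept = []
    for attr, op, bound in rule:
        # Assuming scores in [0, 1] and non-negative bounds: "score <= bound"
        # always admits low scores, and "score > bound" admits scores below
        # the threshold only when bound < threshold.
        admits_low = op == "<=" or bound < threshold
        if not admits_low:
            kept.append((attr, op, bound))
    return kept


def is_duplicate(rules: List[Rule], scores: Dict[str, float]) -> bool:
    """A pair is a duplicate if it satisfies every clause of some non-empty rule."""
    return any(
        bool(rule) and all(clause_holds(c, scores) for c in rule)
        for rule in rules
    )


def filter_training_set(examples: List[Example], trimmed_rules: List[Rule]) -> List[Example]:
    """Remove negative pairs that the trimmed rules classify as duplicates."""
    return [
        (scores, label)
        for scores, label in examples
        if label or not is_duplicate(trimmed_rules, scores)
    ]


# Hypothetical initial split rules and threshold, for illustration only.
initial_rules: List[Rule] = [
    [("name", ">", 0.8), ("phone", "<=", 0.2)],
    [("email", ">", 0.1)],
]
trimmed_rules = [trim_rule(rule, threshold=0.5) for rule in initial_rules]
```

In this sketch a rule whose clauses are all trimmed away is ignored rather than treated as matching every pair. That is a design choice of the example, not something the abstract specifies.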