Abstract |
A method and system to produce and train composite similarity functions for record linkage problems, including product normalization problems, are disclosed. In one embodiment, for a group of products in a plurality of products, a composite similarity function is constructed for the group from a weighted set of basis similarity functions. Training records are used to calculate the weights in the weighted set of basis similarity functions in the composite similarity function for the group of products. In another embodiment, a composite similarity function is applied to pairs of training records. Applying the composite similarity function yields a number that can be used to indicate whether the two records in a pair relate to a common subject. The composite similarity function includes a weighted set of basis similarity functions. A perceptron algorithm is used to modify the weights in the weighted set.
|
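The idea summarized above can be illustrated with a minimal sketch. It assumes the composite similarity function is a weighted sum of basis similarity functions plus a bias term, and that the weights are adjusted with the basic mistake-driven perceptron update; the abstract names a perceptron algorithm but does not specify the variant. The basis functions (`token_jaccard`, `exact_match`), the record fields, and the sample records are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, str]
Basis = Callable[[Record, Record], float]
LabeledPair = Tuple[Record, Record, int]  # label is +1 (same subject) or -1


def token_jaccard(field: str) -> Basis:
    """Hypothetical basis function: Jaccard overlap of word tokens in a field."""
    def sim(a: Record, b: Record) -> float:
        ta = set(a.get(field, "").lower().split())
        tb = set(b.get(field, "").lower().split())
        if not ta and not tb:
            return 0.0
        return len(ta & tb) / len(ta | tb)
    return sim


def exact_match(field: str) -> Basis:
    """Hypothetical basis function: 1.0 if both records share the field value."""
    def sim(a: Record, b: Record) -> float:
        va, vb = a.get(field), b.get(field)
        return 1.0 if va is not None and va == vb else 0.0
    return sim


def composite_score(weights: Sequence[float], bias: float,
                    bases: Sequence[Basis], a: Record, b: Record) -> float:
    """Weighted sum of basis similarities; a positive score suggests a match."""
    return bias + sum(w * f(a, b) for w, f in zip(weights, bases))


def train_perceptron(pairs: Sequence[LabeledPair], bases: Sequence[Basis],
                     epochs: int = 10, lr: float = 1.0) -> Tuple[List[float], float]:
    """Basic perceptron: update the weights only when a pair is misclassified."""
    weights = [0.0] * len(bases)
    bias = 0.0
    for _ in range(epochs):
        for a, b, label in pairs:
            feats = [f(a, b) for f in bases]
            score = bias + sum(w * x for w, x in zip(weights, feats))
            if label * score <= 0:  # wrong side of (or on) the decision boundary
                weights = [w + lr * label * x for w, x in zip(weights, feats)]
                bias += lr * label
    return weights, bias


# Illustrative data only: two product-offer pairs with made-up field values.
BASES: List[Basis] = [token_jaccard("name"), exact_match("brand")]
TRAINING_PAIRS: List[LabeledPair] = [
    ({"name": "canon eos 5d camera", "brand": "canon"},
     {"name": "canon eos 5d digital camera", "brand": "canon"}, +1),
    ({"name": "canon eos 5d camera", "brand": "canon"},
     {"name": "nikon d90 camera", "brand": "nikon"}, -1),
]

if __name__ == "__main__":
    weights, bias = train_perceptron(TRAINING_PAIRS, BASES)
    for a, b, label in TRAINING_PAIRS:
        print(label, composite_score(weights, bias, BASES, a, b))
```

Under the same assumptions, the per-group embodiment could be sketched by calling `train_perceptron` once per product group on that group's training pairs, which gives each group its own weight vector over a shared set of basis functions.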