Abstract |
A method for linking records related to an entity across separate databases may include: extracting a first record from a first database as a first vector; extracting a second record from a second database as a second vector; generating first and second sub-vectors from the first and second vectors, where each sub-vector includes quality features from its respective vector; pre-processing the first and second sub-vectors using domain knowledge; calculating a distance assessment classifier based on the first and second sub-vectors; and determining whether the distance represented by the distance assessment classifier is greater than a threshold. If the distance is greater than the threshold, the records may be linked; if not, the method extracts additional records and repeats the steps from sub-vector generation onward until the distance is greater than the threshold. A system for linking records is also disclosed. |
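
The flow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the disclosed implementation, and several parts are assumptions: the feature names in `QUALITY_FEATURES`, the normalization in `preprocess` (a stand-in for domain-knowledge pre-processing), the use of mean string similarity as the distance assessment classifier, and the `0.85` threshold. Because the abstract links records when the value is greater than the threshold, the sketch treats the score as a similarity-style measure.

```python
from difflib import SequenceMatcher

# Hypothetical "quality features"; the abstract does not name them.
QUALITY_FEATURES = ("name", "birth_date", "postcode")


def sub_vector(record, features=QUALITY_FEATURES):
    """Keep only the quality features of a record, in a fixed order."""
    return tuple(str(record.get(f, "")) for f in features)


def preprocess(vec):
    """Assumed stand-in for domain-knowledge pre-processing:
    lowercase each value and collapse runs of whitespace."""
    return tuple(" ".join(v.lower().split()) for v in vec)


def score(a, b):
    """Mean per-feature string similarity in [0, 1].
    An assumed stand-in for the distance assessment classifier."""
    return sum(SequenceMatcher(None, x, y).ratio() for x, y in zip(a, b)) / len(a)


def link_records(first_db, second_db, threshold=0.85):
    """Return the first (i, j) index pair whose score exceeds the threshold.
    Returns None if no pair of records qualifies."""
    for i, r1 in enumerate(first_db):
        v1 = preprocess(sub_vector(r1))
        for j, r2 in enumerate(second_db):
            v2 = preprocess(sub_vector(r2))
            if score(v1, v2) > threshold:
                return i, j
    return None


first_db = [{"name": "Ann  Lee", "birth_date": "1980-01-02", "postcode": "AB1 2CD"}]
second_db = [
    {"name": "Bob Ray", "birth_date": "1975-05-05", "postcode": "ZZ9 9ZZ"},
    {"name": "ann lee", "birth_date": "1980-01-02", "postcode": "ab1 2cd"},
]
match = link_records(first_db, second_db)
```

Here the first candidate in `second_db` scores below the threshold, so the loop moves on to further records, mirroring the "extract additional records and repeat" step; the second candidate matches after pre-processing and the pair is linked.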