Abstract
A method for determining a blocking key includes randomly selecting a plurality of record pairs from the pair space that can be formed from a plurality of records of a database, scoring the selected record pairs, and comparing the score of each record pair to a threshold to determine a label for that pair. The method further includes comparing each field of each selected record pair character by character, wherein the result of each comparison is a binary vector entered in a binary vector matrix, and determining a blocking key based on the binary vector matrix.
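The steps above can be sketched roughly as follows. This is a minimal illustration under several assumptions not stated in the abstract: records are dictionaries of string fields; the pair score is the mean `difflib.SequenceMatcher` ratio across fields; each binary vector has one bit per field, set to 1 when the two values agree at every character position; and the blocking key is chosen as the single field with the highest "match recall minus non-match agreement" on the sampled pairs. All function names (`sample_pairs`, `score_pair`, `field_vector`, `choose_blocking_key`) are hypothetical.

```python
import difflib
import itertools
import random
from typing import Dict, List, Optional, Sequence, Tuple

Record = Dict[str, str]


def sample_pairs(records: Sequence[Record], k: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Randomly draw up to k distinct index pairs from the n*(n-1)/2 pair space."""
    rng = random.Random(seed)
    pair_space = list(itertools.combinations(range(len(records)), 2))
    return rng.sample(pair_space, min(k, len(pair_space)))


def score_pair(a: Record, b: Record, fields: Sequence[str]) -> float:
    """Hypothetical score: mean SequenceMatcher similarity ratio over all fields."""
    ratios = [difflib.SequenceMatcher(None, a[f], b[f]).ratio() for f in fields]
    return sum(ratios) / len(ratios)


def field_vector(a: Record, b: Record, fields: Sequence[str]) -> List[int]:
    """One bit per field: 1 if both values have the same length and agree character by character."""
    bits = []
    for f in fields:
        x, y = a[f], b[f]
        agree = len(x) == len(y) and all(c1 == c2 for c1, c2 in zip(x, y))
        bits.append(int(agree))
    return bits


def choose_blocking_key(
    records: Sequence[Record], fields: Sequence[str], k: int, threshold: float, seed: int = 0
) -> Tuple[Optional[str], List[List[int]], List[bool]]:
    """Label sampled pairs by thresholding scores, build the binary vector matrix,
    and pick the field whose agreement best separates matches from non-matches
    (an assumed selection rule, used here only for illustration)."""
    pairs = sample_pairs(records, k, seed)
    labels = [score_pair(records[i], records[j], fields) >= threshold for i, j in pairs]
    matrix = [field_vector(records[i], records[j], fields) for i, j in pairs]

    n_match = sum(labels)
    n_non = len(labels) - n_match
    best_field, best_gain = None, float("-inf")
    for col, f in enumerate(fields):
        hits_match = sum(row[col] for row, lab in zip(matrix, labels) if lab)
        hits_non = sum(row[col] for row, lab in zip(matrix, labels) if not lab)
        recall = hits_match / n_match if n_match else 0.0
        noise = hits_non / n_non if n_non else 0.0
        gain = recall - noise
        if gain > best_gain:
            best_field, best_gain = f, gain
    return best_field, matrix, labels


if __name__ == "__main__":
    records = [
        {"zip": "10001", "name": "John Smith", "city": "New York"},
        {"zip": "10001", "name": "Jon Smith", "city": "New York"},
        {"zip": "94105", "name": "Alice Wong", "city": "San Francisco"},
        {"zip": "94105", "name": "Alice Wong", "city": "SF"},
    ]
    key, matrix, labels = choose_blocking_key(records, ["zip", "name", "city"], k=100, threshold=0.6)
    print(key)
```

With these four toy records the `zip` field agrees exactly on both likely-duplicate pairs and on none of the others, so the sketch selects it as the blocking key. A real system would likely consider multi-field or prefix-based keys and a larger sample; the single-field rule here is only a stand-in.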