摘要 |
A system and method for organizing raw data from one or more sources uses an improved mechanism for identifying duplicate data between fields (e.g., columns) in the databases. The fields may be similar fields within a single database or similar or identical fields within a pair of databases and as organized as arrays or field vectors. The present invention sorts each of the field vectors and if necessary, partitions them by common value. A number of comparisons required to identify the duplicate data between the field vectors is reduced by feeding back a difference between the compared values. This difference is used to adjust indices into the field vectors for subsequent comparison.
|