Principal claim |
1. A computer-implemented method of record matching in a database, the computer-implemented method being implemented by a computer program that is stored by a memory of a computer system and executed by a processor of the computer system, the computer-implemented method comprising:
generating, by the computer system, a plurality of regular expressions from a plurality of records, wherein each of the plurality of regular expressions corresponds to a corresponding one of the plurality of records;
generating, by the computer system, a combined regular expression by combining the plurality of regular expressions, wherein generating the combined regular expression comprises generating the combined regular expression by performing a union operation on the plurality of regular expressions;
generating, by the computer system, a combined finite state representation from the combined regular expression;
processing, by the computer system, the combined finite state representation to identify that a first record matches a second record in the plurality of records;
generating a subset of the plurality of records that does not contain matches by processing the plurality of records using the combined finite state representation; and
checking whether a new record is a duplicate before adding it to the plurality of records by processing the new record using the combined finite state representation. |
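
The claimed steps can be illustrated with a minimal Python sketch. It is not the patented implementation, and it rests on several assumptions that the claim does not state: records are plain strings, the per-record regular expression tolerates differences in letter case and whitespace, and a compiled `re.Pattern` stands in for the combined finite state representation (Python's `re` engine is a backtracking matcher rather than an explicit automaton). All function names are hypothetical.

```python
import re

def record_to_regex(record: str) -> str:
    # Assumed rule: tokens must match literally; whitespace runs are flexible.
    tokens = record.split()
    return r"\s*" + r"\s+".join(re.escape(t) for t in tokens) + r"\s*"

def build_combined(records):
    # Union (alternation) of the per-record expressions. Each alternative is
    # wrapped in a named group so the matching record can be identified.
    parts = [f"(?P<r{i}>{record_to_regex(r)})" for i, r in enumerate(records)]
    # The compiled pattern stands in for the combined finite state representation.
    return re.compile("|".join(parts), re.IGNORECASE)

def first_match_index(matcher, text):
    # Index of the earliest record whose expression matches the whole text.
    m = matcher.fullmatch(text)
    if m is None or m.lastgroup is None:
        return None
    return int(m.lastgroup[1:])

def find_matching_pairs(records):
    # Pairs (i, j) with i < j where record j matches the earlier record i.
    matcher = build_combined(records)
    pairs = []
    for j, rec in enumerate(records):
        i = first_match_index(matcher, rec)
        if i is not None and i != j:
            pairs.append((i, j))
    return pairs

def deduplicate(records):
    # Keep a record only if it is the first one its text matches.
    matcher = build_combined(records)
    return [r for j, r in enumerate(records) if first_match_index(matcher, r) == j]

def is_duplicate(matcher, new_record):
    # Check a candidate against the existing records before adding it.
    return matcher.fullmatch(new_record) is not None

records = ["John Smith", "john   smith", "Jane Doe", "JANE DOE ", "Bob Lee"]
pairs = find_matching_pairs(records)
unique = deduplicate(records)
matcher = build_combined(unique)
```

With these sample records, `pairs` identifies the second and fourth entries as matches of the first and third, `unique` keeps one representative of each, and `is_duplicate(matcher, "  BOB lee")` reports a duplicate while `is_duplicate(matcher, "Bob Leigh")` does not. For large record sets, an automaton-based engine would be closer to the finite state representation the claim describes.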