摘要 |
Methods and apparatus are provided for identifying constraint violation repairs in data that is comprised of a plurality of records, where each record has a plurality of cells. A database is processed, based on a plurality of constraints that data in the database must satisfy. At least one constraint violation to be resolved is identified based on a cost of repair and the corresponding records to be resolved and equivalent cells are identified in the data that violate the identified at least one constraint violation. A value for each of the equivalent cells can optionally be determined, and the determined value can be assigned to each of the equivalent cells. The at least one constraint violation selected for resolution may be, for example, the constraint violation with a lowest cost. The cost of repairing a constraint is based on a distance metric between the attributes values.
|