Title: Holistic database record repair
Abstract: A computer-implemented method for repairing records of a database comprises determining a first set of records of the database which violate a functional dependency of the database, determining a second set of records of the database comprising duplicate records, computing a cost metric representing a measure of the cost of mutually dependently modifying records in the first and second sets, and modifying records in the first and second sets on the basis of the cost metric to provide a modified database instance.
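The first two steps of the abstract can be illustrated with a minimal sketch. It assumes records are plain dictionaries, that a functional dependency is written as a list of left-hand attributes and one right-hand attribute, and that duplicates are found by comparing a normalized key. The function names `fd_violations` and `duplicate_groups`, the sample data, and the matching rule are all hypothetical and are not taken from the patent.

```python
from collections import defaultdict

def fd_violations(records, lhs, rhs):
    """Return indices of records that agree on `lhs` but disagree on `rhs`."""
    groups = defaultdict(list)
    for i, rec in enumerate(records):
        groups[tuple(rec[a] for a in lhs)].append(i)
    violating = set()
    for idxs in groups.values():
        # More than one distinct right-hand value means the FD is violated.
        if len({records[i][rhs] for i in idxs}) > 1:
            violating.update(idxs)
    return violating

def duplicate_groups(records, key):
    """Group record indices whose normalized `key` matches (a stand-in matcher)."""
    groups = defaultdict(list)
    for i, rec in enumerate(records):
        groups[rec[key].strip().lower()].append(i)
    return [g for g in groups.values() if len(g) > 1]

records = [
    {"name": "Ann Lee", "zip": "10001", "city": "New York"},
    {"name": "ann lee ", "zip": "10001", "city": "NYC"},
    {"name": "Bob Ray", "zip": "60601", "city": "Chicago"},
]
first_set = fd_violations(records, lhs=["zip"], rhs="city")   # FD: zip -> city
second_set = duplicate_groups(records, key="name")
```

Here records 0 and 1 fall into both sets, which is the situation where repairing the violation and merging the duplicates depend on each other.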
Publication No.: US9116934(B2) Publication Date: 2015.08.25
Application No.: US201113218698 Filing Date: 2011.08.26
Applicant: QATAR FOUNDATION Inventors: Kaldas Ihab Francis Ilyas; Yakout Mohamed; Elmagarmid Ahmed K.
Classification: G06F17/30 Main Classification: G06F17/30
Agency: Mossman Kumar & Tyler PC Agent: Mossman Kumar & Tyler PC
Independent Claim: 1. A computer implemented method for repairing records of a database, comprising: determining a first set of records of the database that violate a functional dependency from a set of functional dependencies; determining a second set of records of the database using a duplication mechanism wherein the second set of records are duplicate records; appending a duplicate identifier to each record in the second set of records, wherein the duplicate identifier is identical for each record in the second set of records; updating the set of functional dependencies to union a functional dependency based on the duplicate identifier; determining a set of equivalence classes for records of the first set of records and the second set of records consisting of multiple record-attribute pairs; computing a cost metric representing a measure for the cost of modifying records in the first and second sets; merging a pair of equivalence classes of the first set of records and the second set of records into a new class to resolve a functional dependency violation and to perform a duplication of duplicate records; computing a merge cost metric of the merged pair of equivalence classes using the cost metric of each respective class; and modifying records in the first and second sets on the basis of the merge cost metric to provide a modified database instance.
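The sequence of steps in the claim can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: functional dependencies have a single left-hand attribute, every cell change costs 1, the cost of a class is the number of cells that must change to reach its most frequent value, and the merge cost is the cost of the merged class minus the costs of the two classes it came from. The names `add_duplicate_ids`, `class_cost`, `merge_cost`, and `repair`, and the `dup_id` attribute, are hypothetical.

```python
from collections import Counter
from itertools import combinations

def add_duplicate_ids(records, groups):
    """Append the same duplicate identifier to every record of a duplicate group."""
    for gid, group in enumerate(groups):
        for i in group:
            records[i]["dup_id"] = gid

def class_cost(records, cells):
    """Assumed cost metric: cells that must change to agree on the majority value."""
    values = Counter(records[i][a] for i, a in cells)
    return len(cells) - max(values.values())

def merge_cost(records, a, b):
    """Assumed merge cost, derived from the cost of each respective class."""
    return class_cost(records, a | b) - class_cost(records, a) - class_cost(records, b)

def repair(records, fds):
    """Merge equivalence classes of (record, attribute) cells; return repaired copy."""
    class_of = {}  # cell -> frozenset of cells that must share one value

    def cls(cell):
        return class_of.setdefault(cell, frozenset([cell]))

    total = 0
    for lhs, rhs in fds:
        for i, j in combinations(range(len(records)), 2):
            ri, rj = records[i], records[j]
            if lhs in ri and lhs in rj and ri[lhs] == rj[lhs]:
                a, b = cls((i, rhs)), cls((j, rhs))
                if a != b:
                    total += merge_cost(records, a, b)
                    merged = a | b
                    for cell in merged:
                        class_of[cell] = merged
    repaired = [dict(r) for r in records]
    for cells in set(class_of.values()):
        ordered = sorted(cells)  # deterministic tie-breaking by record index
        target = Counter(records[i][a] for i, a in ordered).most_common(1)[0][0]
        for i, a in ordered:
            repaired[i][a] = target
    return repaired, total

records = [
    {"name": "Ann Lee", "zip": "10001", "city": "New York"},
    {"name": "Ann Lee", "zip": "10001", "city": "NYC"},
    {"name": "Al Kim", "zip": "10001", "city": "New York"},
]
add_duplicate_ids(records, [[0, 1]])          # records 0 and 1 are duplicates
fds = [("zip", "city")]                       # original set of FDs
fds += [("dup_id", a) for a in ("name", "zip", "city")]  # union with dup_id FDs
repaired, total = repair(records, fds)
```

In this example the only cell that changes is the city of record 1, so the total merge cost is 1. The sketch reads left-hand values from the original records only, so it does not handle the case where a repaired value triggers a new violation of another dependency.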
Address: Doha QA