Title of invention |
Data quality assessment |
Abstract |
According to one embodiment of the present invention, a system assesses the quality of column data. The system assigns a pre-defined domain to one or more columns of the data based on a validity condition for the domain, applies the validity condition for the domain assigned to a column to data values in the column to compute a data quality metric for the column, and computes and displays a metric for a group of columns based on the computed data quality metric of at least one column in the group. Embodiments of the present invention further include a method and computer program product for assessing the quality of column data in substantially the same manners described above. |
Publication number |
US9558230(B2) |
Publication date |
2017.01.31 |
Application number |
US201313764880 |
Filing date |
2013.02.12 |
Applicant |
INTERNATIONAL BUSINESS MACHINES CORPORATION |
Inventors |
Hollifield Thomas;Saillet Yannick |
Classification number |
G06F17/30 |
Main classification number |
G06F17/30 |
Agency |
Edell, Shapiro & Finnan, LLC |
Agent |
Carroll Terry J.;Edell, Shapiro & Finnan, LLC |
Independent claim |
1. A system for assessing the quality of data comprising:
at least one processor and a metadata repository, wherein the at least one processor is configured to:
apply a set of validity conditions for each of a plurality of pre-defined domains in the metadata repository to each of a plurality of columns of the data, wherein at least one pre-defined domain is associated with a plurality of rules specifying the set of validity conditions for data of that at least one pre-defined domain;
assign pre-defined domains selected from among the plurality of pre-defined domains to corresponding columns of the data based on satisfaction of the set of validity conditions for the pre-defined domains, wherein each of at least two of the columns of the data remains unassigned to a corresponding domain;
generate a group of two or more of the unassigned columns based on characteristics of the unassigned columns;
create a new domain for the group of unassigned columns with a corresponding set of validity conditions and assign the columns of the group to the new domain;
apply the set of validity conditions for the domain assigned to a column to data values in the column to compute a data quality metric for the column; and
display a metric for one or more sets of columns that is computed based on the computed data quality metric of at least one column in each set of columns. |
地址 |
Armonk NY US |
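
The steps recited in the independent claim above can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it is hypothetical: the two sample domains, the 0.8 assignment threshold, the use of a character "shape" as the grouping characteristic, the fraction of valid values as the quality metric, and the plain average as the group metric are all assumptions chosen only to make the flow concrete.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for the metadata repository:
# domain name -> list of rules (validity conditions). Not taken from the patent.
PREDEFINED_DOMAINS = {
    "us_zip": [lambda v: re.fullmatch(r"\d{5}", v) is not None],
    "email": [lambda v: "@" in v, lambda v: " " not in v],
}
ASSIGN_THRESHOLD = 0.8  # assumed cut-off for "satisfaction" of the conditions


def validity_ratio(values, rules):
    """Fraction of values that satisfy every rule (assumed quality metric)."""
    if not values:
        return 0.0
    return sum(1 for v in values if all(rule(v) for rule in rules)) / len(values)


def assign_domains(columns, domains, threshold=ASSIGN_THRESHOLD):
    """Apply every domain to every column; keep the best match above threshold."""
    assigned, unassigned = {}, []
    for name, values in columns.items():
        best_name, best_ratio = None, 0.0
        for domain, rules in domains.items():
            ratio = validity_ratio(values, rules)
            if ratio > best_ratio:
                best_name, best_ratio = domain, ratio
        if best_name is not None and best_ratio >= threshold:
            assigned[name] = best_name
        else:
            unassigned.append(name)
    return assigned, unassigned


def shape(value):
    """Assumed column characteristic: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def create_domains(columns, unassigned, domains, assigned):
    """Group unassigned columns by dominant shape and create a domain per group."""
    groups = defaultdict(list)
    for name in unassigned:
        if columns[name]:
            dominant = Counter(shape(v) for v in columns[name]).most_common(1)[0][0]
            groups[dominant].append(name)
    for pattern, names in groups.items():
        if len(names) >= 2:  # the claim requires a group of two or more columns
            domain = f"new:{pattern}"
            domains[domain] = [lambda v, p=pattern: shape(v) == p]
            for name in names:
                assigned[name] = domain


def assess(columns, predefined):
    """Return (column -> domain, column -> data quality metric)."""
    domains = dict(predefined)  # copy so the caller's repository is untouched
    assigned, unassigned = assign_domains(columns, domains)
    create_domains(columns, unassigned, domains, assigned)
    metrics = {c: validity_ratio(columns[c], domains[d]) for c, d in assigned.items()}
    return assigned, metrics


def group_metric(metrics, names):
    """Assumed aggregate: mean metric over the columns of a set that have one."""
    scored = [metrics[n] for n in names if n in metrics]
    return sum(scored) / len(scored) if scored else None


SAMPLE_COLUMNS = {  # fabricated illustration data
    "zip": ["10504", "94105", "60601", "73301", "1234"],
    "contact": ["a@x.org", "b@y.org", "c@z.org", "d@w.org", "e@v.org"],
    "part_a": ["AB-123", "CD-456", "EF-789", "GH-012", "XX1"],
    "part_b": ["ZZ-999", "QQ-111", "RR-222", "oops", "TT-333"],
}

if __name__ == "__main__":
    assigned, metrics = assess(SAMPLE_COLUMNS, PREDEFINED_DOMAINS)
    for column, domain in assigned.items():
        print(f"{column}: domain={domain} quality={metrics[column]:.2f}")
    print("address set:", group_metric(metrics, ["zip", "contact"]))
```

In this sketch `part_a` and `part_b` match no pre-defined domain, so they are grouped by their shared dominant shape and receive a newly created domain whose single validity condition is that shape. A real system would likely use richer characteristics and persist the new domain back to the repository; those choices are outside what the record states.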