Title: Data quality assessment
Abstract: According to one embodiment of the present invention, a system assesses the quality of column data. The system assigns a pre-defined domain to one or more columns of the data based on a validity condition for the domain, applies the validity condition for the domain assigned to a column to data values in the column to compute a data quality metric for the column, and computes and displays a metric for a group of columns based on the computed data quality metric of at least one column in the group. Embodiments of the present invention further include a method and computer program product for assessing the quality of column data in substantially the same manner described above.
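The flow summarized in the abstract (assign a domain by its validity condition, score each column against its assigned domain, then roll the scores up for a group of columns) can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the domain names, the regular expressions, the 0.8 assignment threshold, and the use of a simple mean as the group metric are all assumptions introduced here.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical pre-defined domains: name -> validity condition (assumed for illustration).
DOMAINS: Dict[str, Callable[[str], bool]] = {
    "us_zip": lambda v: re.fullmatch(r"\d{5}", v) is not None,
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
}


def valid_fraction(values: List[str], condition: Callable[[str], bool]) -> float:
    """Fraction of values that satisfy a validity condition (0.0 for an empty column)."""
    if not values:
        return 0.0
    return sum(1 for v in values if condition(v)) / len(values)


def assign_domain(values: List[str], threshold: float = 0.8) -> Optional[str]:
    """Pick the best-matching domain whose valid fraction reaches the (assumed) threshold."""
    best_name, best_score = None, 0.0
    for name, condition in DOMAINS.items():
        score = valid_fraction(values, condition)
        if score >= threshold and score > best_score:
            best_name, best_score = name, score
    return best_name


def column_quality(values: List[str]) -> Optional[float]:
    """Data quality metric for one column: valid fraction under its assigned domain."""
    domain = assign_domain(values)
    if domain is None:
        return None  # column stays unassigned, so no metric is computed here
    return valid_fraction(values, DOMAINS[domain])


def group_quality(columns: Dict[str, List[str]]) -> Optional[float]:
    """Metric for a group of columns: mean of the per-column metrics that exist."""
    scores = [q for q in (column_quality(v) for v in columns.values()) if q is not None]
    return sum(scores) / len(scores) if scores else None


table = {
    "zip": ["12345", "54321", "1234x", "90210", "10001"],
    "contact": ["a@b.com", "c@d.org", "e@f.net", "g@h.io", "bad"],
}
print({name: assign_domain(vals) for name, vals in table.items()})
print(group_quality(table))
```

Any other aggregate, such as a minimum or a row-weighted mean, could replace the mean; the abstract only requires that the group metric be based on at least one column's metric.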
Publication number: US9558230(B2)  Publication date: 2017.01.31
Application number: US201313764880  Filing date: 2013.02.12
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION  Inventors: Hollifield Thomas; Saillet Yannick
Classification: G06F17/30  Main classification: G06F17/30
Agency: Edell, Shapiro & Finnan, LLC  Agent: Carroll Terry J.; Edell, Shapiro & Finnan, LLC
Main claim: 1. A system for assessing the quality of data comprising: at least one processor and a metadata repository, wherein the at least one processor is configured to: apply a set of validity conditions for each of a plurality of pre-defined domains in the metadata repository to each of a plurality of columns of the data, wherein at least one pre-defined domain is associated with a plurality of rules specifying the set of validity conditions for data of that at least one pre-defined domain; assign pre-defined domains selected from among the plurality of pre-defined domains to corresponding columns of the data based on satisfaction of the set of validity conditions for the pre-defined domains, wherein each of at least two of the columns of the data remains unassigned to a corresponding domain; generate a group of two or more of the unassigned columns based on characteristics of the unassigned columns; create a new domain for the group of unassigned columns with a corresponding set of validity conditions and assign the columns of the group to the new domain; apply the set of validity conditions for the domain assigned to a column to data values in the column to compute a data quality metric for the column; and display a metric for one or more sets of columns that is computed based on the computed data quality metric of at least one column in each set of columns.
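The claim's handling of unassigned columns (group them by shared characteristics, create a new domain with its own validity conditions, then score each column against it) can be illustrated with a minimal sketch. The claim does not say which characteristics are used; the choice here of a character-class "shape" (digits become `9`, letters become `A`) as the grouping characteristic, and of "value has the group's dominant shape" as the new validity condition, is purely an assumption for illustration, as are all function and column names.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def shape(value: str) -> str:
    """Map a value to a format pattern: digits -> '9', letters -> 'A', others kept."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in value)


def dominant_shape(values: List[str]) -> str:
    """Most frequent shape in a non-empty column (ties go to the first shape seen)."""
    return Counter(shape(v) for v in values).most_common(1)[0][0]


def derive_domains(unassigned: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Group unassigned columns by dominant shape; keep groups of two or more columns.

    Each returned key is the pattern that acts as the new domain's validity condition.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, values in unassigned.items():
        if values:  # empty columns carry no characteristic to group on
            groups[dominant_shape(values)].append(name)
    return {pattern: cols for pattern, cols in groups.items() if len(cols) >= 2}


def quality(values: List[str], pattern: str) -> float:
    """Data quality metric: fraction of values matching the new domain's pattern."""
    return sum(shape(v) == pattern for v in values) / len(values)


unassigned = {
    "part_a": ["AB-123", "CD-456", "EF-78"],
    "part_b": ["ZZ-999", "YY-000", "XX-111"],
    "notes": ["hello", "ok", "n/a"],
}
new_domains = derive_domains(unassigned)
for pattern, cols in new_domains.items():
    for col in cols:
        print(pattern, col, round(quality(unassigned[col], pattern), 3))
```

In this sketch `notes` stays unassigned because no other column shares its dominant shape, while `part_a` and `part_b` form one group and are scored against the derived pattern.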
Address: Armonk, NY, US