Title of invention: AUTOMATED DYNAMIC DATA QUALITY ASSESSMENT
Abstract: In general, embodiments of the present invention provide systems, methods and computer readable media for automated dynamic data quality assessment. One aspect of the subject matter described in this specification includes the actions of receiving a data quality job including a new data sample; and, if the new data sample is determined to be added to a reservoir of data samples, sending a quality verification request to an oracle; receiving a new data sample quality estimate from the oracle; and adding the new data sample and estimate to the reservoir. A second aspect of the subject matter includes the actions of receiving, from a predictive model, a judgment associated with a new data sample; analyzing the new data sample based in part on the judgment to determine whether to send a new data sample quality verification request to an oracle; and, if a new data sample quality estimate is received from the oracle, determining whether to add the new data sample and the judgment to the reservoir.
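The second aspect described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method disclosed in the specification: the `Judgment.confidence` field, the `threshold` value, the rule "verify when confidence is low", and the capacity-based admission rule are all hypothetical, since the abstract does not say how the sample is analyzed or how the add decision is made.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Sample = Dict[str, object]
# Hypothetical oracle interface: returns a quality estimate, or None if no estimate arrives.
Oracle = Callable[[Sample], Optional[float]]


@dataclass
class Judgment:
    label: str
    confidence: float  # assumed to lie in [0, 1]; not specified by the source


def should_verify(judgment: Judgment, threshold: float = 0.8) -> bool:
    """Assumed rule: ask the oracle only when the model is not confident."""
    return judgment.confidence < threshold


def handle_judgment(
    sample: Sample,
    judgment: Judgment,
    oracle: Oracle,
    reservoir: List[Tuple[Sample, Judgment, float]],
    capacity: int,
    threshold: float = 0.8,
) -> bool:
    """Return True if the sample and judgment were added to the reservoir."""
    if not should_verify(judgment, threshold):
        return False  # no verification request is sent
    estimate = oracle(sample)
    if estimate is None:
        return False  # no quality estimate was received
    if len(reservoir) >= capacity:
        return False  # assumed admission rule: add only while there is room
    reservoir.append((sample, judgment, estimate))
    return True
```

In this sketch a confident judgment never reaches the oracle, so oracle cost is spent only on samples the model is unsure about; a real system would likely replace both decision rules with the ones given in the full specification.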
Publication number: US2017024427(A1)   Publication date: 2017.01.26
Application number: US201615161495   Filing date: 2016.05.23
Applicant: Groupon, Inc.   Inventors: Daly Mark Thomas; Jeffery Shawn Ryan; DeLand Matthew; Pendar Nick; James Andrew; Johnston David
Classification: G06F17/30   Main classification: G06F17/30
Agency:   Agent:
Main claim: 1. A computer-implemented method, comprising: receiving a data quality job, the data quality job including configuration data and a new data sample having a particular data type, wherein the configuration data comprises an oracle identifier, the oracle identifier indicating a particular oracle to provide a verified quality measure for the new data sample, the particular oracle associated with an attribute of the new data sample; determining, by a processor, whether to add the new data sample to a reservoir of data samples, the reservoir of data samples identified based at least in part on the particular data type, the determining based at least in part on whether the new data sample statistically belongs in the reservoir of data samples; and in an instance in which the new data sample is to be added to the reservoir of data samples, sending, to the particular oracle selected based on the oracle identifier, a quality verification request including the new data sample; receiving a data quality estimate associated with the new data sample from the oracle in response to the quality verification request, wherein the data quality estimate comprises a quality score calculated based on one or more of a percentage of correctness of the data sample and a percentage of completeness of the data sample; and adding the new data sample and the associated data quality estimate to the reservoir of data samples in response to receiving the data quality estimate.
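The steps of claim 1 can be illustrated with a minimal sketch. Several pieces are assumptions, not claim language: classic reservoir sampling (Algorithm R) stands in for the "statistically belongs" test, the quality score is taken as the plain average of the two percentages, and the job layout (`data_type`, `sample`, `config["oracle_id"]`) and all function names are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Sample = Dict[str, object]
Oracle = Callable[[Sample], float]  # hypothetical: returns a quality estimate


@dataclass
class Reservoir:
    capacity: int
    seen: int = 0
    items: List[Tuple[Sample, float]] = field(default_factory=list)

    def admit(self, rng: random.Random) -> Optional[int]:
        """Algorithm R (assumed stand-in): slot index to fill, or None to reject."""
        self.seen += 1
        if len(self.items) < self.capacity:
            return len(self.items)
        j = rng.randrange(self.seen)
        return j if j < self.capacity else None


def quality_score(pct_correct: float, pct_complete: float) -> float:
    """Assumed combination: average of correctness and completeness fractions."""
    return (pct_correct + pct_complete) / 2.0


def process_job(
    job: Dict[str, object],
    reservoirs: Dict[str, Reservoir],
    oracles: Dict[str, Oracle],
    rng: random.Random,
) -> bool:
    """Return True if the sample and its estimate were added to the reservoir."""
    sample = job["sample"]
    reservoir = reservoirs[job["data_type"]]   # reservoir identified by data type
    slot = reservoir.admit(rng)                # decide whether to add the sample
    if slot is None:
        return False                           # rejected: the oracle is never contacted
    oracle = oracles[job["config"]["oracle_id"]]  # oracle selected by identifier
    estimate = oracle(sample)                  # quality verification request
    entry = (sample, estimate)
    if slot == len(reservoir.items):
        reservoir.items.append(entry)
    else:
        reservoir.items[slot] = entry          # replace an older reservoir entry
    return True
```

Note the ordering in the sketch: the admission decision comes before the oracle call, so a verification request is sent only for samples that will actually enter the reservoir, which matches the sequence recited in the claim.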
Address: Chicago IL US