Title of invention DATA CLEANING METHODS AND SYSTEMS
Abstract <p>An end-to-end system for annotating data instances of unknown type using a knowledge base and crowdsourcing. A computer-implemented method for cleaning a database instance using a plurality of holistic patterns, the database instance comprising a plurality of dirty tuples with unknown attribute data types, the method comprising: generating a plurality of candidate holistic patterns using the database instance and a knowledge base, the knowledge base comprising data-types and data-type relationships; determining a valid holistic pattern from the plurality of candidate holistic patterns using at least one of: the knowledge base; and a crowd of users who validate the data-types and the data-type relationships; annotating tuples in the database instance using the valid holistic pattern, wherein the method annotates the tuples with annotations indicating at least one of: knowledge base validated; jointly validated, wherein the crowd of users at least partially validates the holistic pattern; or erroneous; and repairing the erroneous annotated tuples to generate a clean database instance.</p>
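The four steps named in the abstract (generate candidate patterns, pick a valid one, annotate tuples, repair the erroneous ones) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the toy knowledge base, the two-column table shape, the support threshold, the label strings, and every function name (`candidate_patterns`, `support`, `choose_pattern`, `annotate`, `repair`) are assumptions made for illustration, and the crowd is stood in for by plain callables.

```python
from collections import Counter

# Toy knowledge base (hypothetical data): entity types and one typed relationship.
KB_TYPES = {
    "Italy": "country", "Spain": "country", "France": "country",
    "Rome": "capital", "Madrid": "capital", "Paris": "capital",
}
KB_FACTS = {
    ("Italy", "hasCapital", "Rome"),
    ("Spain", "hasCapital", "Madrid"),
    ("France", "hasCapital", "Paris"),
}


def candidate_patterns(rows):
    """Propose (type_a, relation, type_b) patterns from KB types seen in the two columns."""
    types_a = Counter(KB_TYPES[a] for a, _ in rows if a in KB_TYPES)
    types_b = Counter(KB_TYPES[b] for _, b in rows if b in KB_TYPES)
    relations = sorted({rel for _, rel, _ in KB_FACTS})
    return [(ta, rel, tb) for ta in types_a for tb in types_b for rel in relations]


def support(pattern, rows):
    """Fraction of rows whose values match the pattern's types and relationship in the KB."""
    ta, rel, tb = pattern
    hits = sum(
        1 for a, b in rows
        if KB_TYPES.get(a) == ta and KB_TYPES.get(b) == tb and (a, rel, b) in KB_FACTS
    )
    return hits / len(rows) if rows else 0.0


def choose_pattern(rows, ask_crowd_pattern, threshold=0.5):
    """Take the best-supported candidate; defer to the crowd when KB support is weak."""
    ranked = sorted(candidate_patterns(rows), key=lambda p: support(p, rows), reverse=True)
    for pattern in ranked:
        if support(pattern, rows) >= threshold or ask_crowd_pattern(pattern):
            return pattern
    return None


def annotate(rows, pattern, ask_crowd_fact):
    """Label each tuple as KB validated, jointly (crowd) validated, or erroneous."""
    _, rel, _ = pattern
    labelled = []
    for a, b in rows:
        if (a, rel, b) in KB_FACTS:
            labelled.append((a, b, "kb_validated"))
        elif ask_crowd_fact(a, rel, b):
            labelled.append((a, b, "jointly_validated"))
        else:
            labelled.append((a, b, "erroneous"))
    return labelled


def repair(annotated, pattern):
    """Replace the second value of erroneous tuples with the KB's value, when one exists."""
    _, rel, _ = pattern
    lookup = {s: o for s, r, o in KB_FACTS if r == rel}
    return [
        (a, lookup.get(a, b)) if label == "erroneous" else (a, b)
        for a, b, label in annotated
    ]
```

In this sketch a tuple the knowledge base cannot confirm is sent to the crowd, and only tuples rejected by both are repaired. A real system would need to handle more than two columns, ambiguous types, and noisy crowd answers, none of which is modelled here.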
Publication number WO2015181511(A1) Publication date 2015.12.03
Application number WO2014GB51670 Filing date 2014.05.30
Applicant QATAR FOUNDATION;HOARTON, LLOYD Inventor TANG, NAN;OUZZANI, MOURAD;PAPOTTI, PAOLO;KALDAS, IHAB FRANCIS ILYAS;CHU, XU
Classification G06F17/30 Main classification G06F17/30
Agency Agent
Principal claim
Address