Title of Invention |
DATA CLEANING METHODS AND SYSTEMS |
Abstract |
<p>An end-to-end system to annotate data instances of unknown type using a knowledge base and crowdsourcing. A computer-implemented method for cleaning a database instance using a plurality of holistic patterns, the database instance comprising a plurality of dirty tuples with unknown attribute data types, the method comprising: generating a plurality of candidate holistic patterns using the database instance and a knowledge base, the knowledge base comprising data-types and data-type relationships; determining a valid holistic pattern from the plurality of candidate holistic patterns using at least one of the knowledge base and a crowd of users who validate the data-types and the data-type relationships; annotating tuples in the database instance using the valid holistic pattern, wherein the method annotates the tuples with annotations indicating at least one of: knowledge base validated; jointly validated, wherein the crowd of users at least partially validates the holistic pattern; or erroneous; and repairing the tuples annotated as erroneous to generate a clean database instance.</p> |
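The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. Its steps are: generate candidate holistic patterns, validate one against the knowledge base and the crowd, annotate tuples, and repair the erroneous ones. This is not the patented implementation. Everything below is a hypothetical assumption made only for illustration: the toy knowledge base (`KB_TYPES`, `KB_RELS`), every function name, the ranking of candidates by knowledge-base support, the crowd modelled as two callback functions, and the "repair only when the knowledge base offers exactly one fix" rule.

```python
from itertools import product

# Hypothetical toy knowledge base (illustration only):
# entity -> set of data-types, relation name -> set of (subject, object) pairs.
KB_TYPES = {
    "Italy": {"country"}, "Spain": {"country"},
    "Rome": {"capital", "city"}, "Madrid": {"capital", "city"},
    "Barcelona": {"city"},
}
KB_RELS = {"hasCapital": {("Italy", "Rome"), ("Spain", "Madrid")}}


def candidate_patterns(rows):
    """Candidate holistic patterns: one KB type per column, plus every
    (column i, column j, relation) edge the KB supports for at least one row."""
    if not rows:
        return []
    ncols = len(rows[0])
    col_types = [sorted({t for r in rows for t in KB_TYPES.get(r[c], set())})
                 for c in range(ncols)]
    edges = tuple(sorted(
        (i, j, name)
        for name, pairs in KB_RELS.items()
        for i in range(ncols) for j in range(ncols)
        if i != j and any((r[i], r[j]) in pairs for r in rows)))
    return [(types, edges) for types in product(*col_types)]


def matches(row, pattern):
    """True if every value has its column's type and every edge holds in the KB."""
    types, edges = pattern
    return (all(t in KB_TYPES.get(v, set()) for v, t in zip(row, types))
            and all((row[i], row[j]) in KB_RELS[name] for i, j, name in edges))


def select_pattern(rows, candidates, crowd_confirms):
    """Rank candidates by KB support (rows fully matched), then return the
    first candidate the (simulated) crowd accepts, or None."""
    ranked = sorted(candidates, key=lambda p: sum(matches(r, p) for r in rows),
                    reverse=True)
    return next((p for p in ranked if crowd_confirms(p)), None)


def annotate(row, pattern, crowd_confirms_row):
    """KB-validated if the KB alone covers the row; jointly validated if the
    crowd confirms what the KB cannot; otherwise erroneous."""
    if matches(row, pattern):
        return "kb_validated"
    if crowd_confirms_row(row):
        return "jointly_validated"
    return "erroneous"


def repair(row, pattern):
    """For each violated edge, replace the object value when the KB offers
    exactly one object for that subject and relation."""
    fixed = list(row)
    for i, j, name in pattern[1]:
        if (fixed[i], fixed[j]) not in KB_RELS[name]:
            fixes = [o for s, o in KB_RELS[name] if s == fixed[i]]
            if len(fixes) == 1:
                fixed[j] = fixes[0]
    return tuple(fixed)


def clean(rows, crowd_confirms, crowd_confirms_row):
    """Return (cleaned rows, per-row annotations)."""
    pattern = select_pattern(rows, candidate_patterns(rows), crowd_confirms)
    if pattern is None:
        # Hypothetical fallback: no valid pattern, so leave the data untouched.
        return list(rows), ["unannotated"] * len(rows)
    labels = [annotate(r, pattern, crowd_confirms_row) for r in rows]
    cleaned = [repair(r, pattern) if lab == "erroneous" else r
               for r, lab in zip(rows, labels)]
    return cleaned, labels
```

Consider `clean([("Italy", "Rome"), ("Spain", "Barcelona")], lambda p: True, lambda r: False)`. It annotates the first tuple as `kb_validated` and the second as `erroneous`. It then repairs the second tuple to `("Spain", "Madrid")`, because the knowledge base lists exactly one capital for Spain.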
Publication Number |
WO2015181511(A1) |
Publication Date |
2015.12.03 |
Application Number |
WO2014GB51670 |
Filing Date |
2014.05.30 |
Applicant(s) |
QATAR FOUNDATION;HOARTON, LLOYD |
Inventor(s) |
TANG, NAN;OUZZANI, MOURAD;PAPOTTI, PAOLO;KALDAS, IHAB FRANCIS ILYAS;CHU, XU |
Classification |
G06F17/30 |
Main Classification |
G06F17/30 |
Agency |
|
Agent |
|
Main Claim |
|
Address |
|