Title of Invention |
RESOLUTION OF DATA INCONSISTENCIES |
Abstract |
Examples disclosed herein enable identifying a feature that is common to a first dataset and a second dataset, wherein a first value of the feature in the first dataset is different from a second value of the feature in the second dataset; determining a first predicted value of the feature in the first dataset based on a second dataset classifier trained on the second dataset; determining a second predicted value of the feature in the second dataset based on a first dataset classifier trained on the first dataset; determining a first similarity score between the first value and the first predicted value; determining a second similarity score between the second value and the second predicted value; and generating a bipartite graph that comprises a first node indicating the first value, a second node indicating the second value, and an edge indicating the first or second similarity score. |
Application Publication Number |
US2016147799(A1) |
Application Publication Date |
2016.05.26 |
Application Number |
US201414554418 |
Filing Date |
2014.11.26 |
Applicant |
Hewlett-Packard Development Company, L.P. |
Inventors |
Cohen Ira;Gelberg Mor;Egozi Levi Efrat |
Classification Numbers |
G06F17/30;G06N99/00 |
Main Classification Number |
G06F17/30 |
Agency |
|
Agent |
|
Main Claim |
1. A method for execution by a computing device for resolving data inconsistencies, the method comprising:
identifying a feature that is common to a first dataset and a second dataset, wherein a first value of the feature in the first dataset is different from a second value of the feature in the second dataset;
determining a first predicted value of the feature in the first dataset based on a second dataset classifier trained on the second dataset;
determining a second predicted value of the feature in the second dataset based on a first dataset classifier trained on the first dataset;
determining a first similarity score between the first value and the first predicted value;
determining a second similarity score between the second value and the second predicted value; and
generating a bipartite graph that comprises a first node indicating the first value, a second node indicating the second value, and an edge indicating the first or second similarity score. |
Address |
Houston TX US |
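
The steps recited in the main claim can be illustrated with a minimal Python sketch. It is not the patented implementation; everything beyond the claim wording is an assumption made for illustration. In particular, the "classifier" is a hypothetical majority-vote lookup over the other shared features, the similarity score is assumed to be the fraction of rows holding a given value that the other dataset's classifier maps to a given predicted value, and each edge is assumed to carry the larger of the two scores. The names `train_classifier`, `cross_scores`, and `build_bipartite_graph`, and the sample rows, are invented here. The inconsistent feature (`os`) is passed in by the caller rather than detected automatically.

```python
from collections import Counter, defaultdict


def train_classifier(rows, target, inputs):
    """Hypothetical stand-in classifier: majority vote of `target`
    for each combination of values of the `inputs` features."""
    votes = defaultdict(Counter)
    for row in rows:
        votes[tuple(row[f] for f in inputs)][row[target]] += 1
    # Fallback for unseen input combinations: most common target overall.
    default = Counter(row[target] for row in rows).most_common(1)[0][0]

    def predict(row):
        key = tuple(row[f] for f in inputs)
        return votes[key].most_common(1)[0][0] if key in votes else default

    return predict


def cross_scores(rows, predict, target):
    """Assumed similarity score: for each actual value, the fraction of
    its rows that `predict` maps to each predicted value."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[target]][predict(row)] += 1
    return {
        (actual, predicted): n / sum(c.values())
        for actual, c in counts.items()
        for predicted, n in c.items()
    }


def build_bipartite_graph(ds1, ds2, target, inputs):
    clf1 = train_classifier(ds1, target, inputs)  # trained on the first dataset
    clf2 = train_classifier(ds2, target, inputs)  # trained on the second dataset
    first = cross_scores(ds1, clf2, target)   # keys: (value in ds1, predicted ds2 value)
    second = cross_scores(ds2, clf1, target)  # keys: (value in ds2, predicted ds1 value)
    edges = dict(first)
    for (v2, v1), score in second.items():
        # Assumption: an edge keeps the larger of the two similarity scores.
        edges[(v1, v2)] = max(edges.get((v1, v2), 0.0), score)
    return {
        "left": sorted({r[target] for r in ds1}),   # nodes for first-dataset values
        "right": sorted({r[target] for r in ds2}),  # nodes for second-dataset values
        "edges": edges,
    }


# Invented sample data: the shared feature "os" is spelled differently per dataset.
ds1 = [
    {"vendor": "microsoft", "kernel": "nt", "os": "Win7"},
    {"vendor": "microsoft", "kernel": "nt", "os": "Win7"},
    {"vendor": "redhat", "kernel": "linux", "os": "RHEL"},
]
ds2 = [
    {"vendor": "microsoft", "kernel": "nt", "os": "Windows 7"},
    {"vendor": "redhat", "kernel": "linux", "os": "Red Hat"},
    {"vendor": "redhat", "kernel": "linux", "os": "Red Hat"},
]

graph = build_bipartite_graph(ds1, ds2, target="os", inputs=["vendor", "kernel"])
for (v1, v2), score in sorted(graph["edges"].items()):
    print(v1, "<->", v2, score)
```

In this sketch, a high-weight edge between a first-dataset value and a second-dataset value suggests that the two spellings refer to the same thing. How the graph is then used to resolve the inconsistency is outside what the claim text above specifies.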