Title of Invention |
Efficient development of a rule-based system using crowd-sourcing |
Abstract |
Described herein are methods, systems, apparatuses and products for efficient development of a rule-based system. An aspect provides a method including accessing data records; converting said data records to an intermediate form; utilizing intermediate forms to compute similarity scores for said data records; and selecting as an example to be provided for rule making at least one record of said data records having a maximum dissimilarity score indicative of dissimilarity to already considered examples. |
Publication Number |
US8949204(B2) |
Publication Date |
2015.02.03 |
Application Number |
US201213597589 |
Filing Date |
2012.08.29 |
Applicant |
International Business Machines Corporation |
Inventors |
Chaturvedi Snigdha; Faruquie Tanveer Afzal; Subramaniam L. Venkata |
Classification |
G06F17/00; G06F17/30 |
Main Classification |
G06F17/00 |
Agency |
Ference & Associates LLC |
Agent |
Ference & Associates LLC |
Principal Claim |
1. A method of data cleansing, said method comprising:
utilizing at least one processor to execute computer code configured to perform the steps of:
accessing data records;
converting said data records to an intermediate form;
utilizing intermediate forms of said data records to compute similarity scores of individual ones of said data records with respect to one another;
from among said data records, providing at least one example record for rule making; and
thereafter selecting from among said data records at least one additional example record for rule making; the additional example record comprising at least one record presenting at least one similarity score which indicates a least similarity with respect to the at least one example record already provided; the at least one example record and the at least one additional example record comprising a rule set; and
employing a difficulty method to select from among said data records at least one training instance for updating the rule set; the selected at least one training instance comprising at least one example record presenting at least one similarity score which indicates a least similarity with respect to at least one example record in the rule set. |
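The selection step recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation: the record does not specify the intermediate form or the similarity measure, so the sketch assumes a token-set intermediate form and Jaccard similarity, and the names `to_intermediate`, `jaccard`, and `select_examples` are hypothetical. Starting from one example record, it repeatedly picks the record whose similarity to the already chosen examples is lowest.

```python
from typing import FrozenSet, List


def to_intermediate(record: str) -> FrozenSet[str]:
    """Assumed intermediate form: the set of lowercase tokens in the record."""
    return frozenset(record.lower().split())


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Assumed similarity score in [0, 1]; 1.0 means identical token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_examples(records: List[str], k: int, seed_index: int = 0) -> List[int]:
    """Greedily pick up to k record indices, each least similar to those already picked."""
    if not records or k <= 0:
        return []
    forms = [to_intermediate(r) for r in records]
    chosen = [seed_index]  # the example record provided first
    while len(chosen) < min(k, len(records)):
        best_index, best_score = -1, 2.0  # 2.0 exceeds any Jaccard score
        for i, form in enumerate(forms):
            if i in chosen:
                continue
            # Similarity to the closest example already chosen.
            score = max(jaccard(form, forms[j]) for j in chosen)
            if score < best_score:
                best_index, best_score = i, score
        chosen.append(best_index)
    return chosen
```

With made-up address records, the near-duplicate of the seed record is picked last, which is the intended effect: examples shown for rule making cover dissimilar records first.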
地址 |
Armonk NY US |