Title of Invention AUTOMATIC RULE COACHING
Abstract A method of validating rules configured to be utilized in an information extraction application, including: receiving a plurality of labeled samples in a training database; for each of the rules in the rule database: (a) determining, for each of the data points of the plurality of labeled samples in the training database to which the rule applies, whether applying the rule to the data point has a positive or negative impact on matching an output for the data point based on the rule to the assured output of the labeled sample corresponding to the data point; (b) generating positive impact information for the rule based on the positive voters; (c) generating negative impact information for the rule based on the negative voters; and (d) determining a metric for the rule based on the quantity of the negative voters and the quantity of the positive voters; ranking the rules based on the metrics corresponding to the rules; and sending to a user for refinement one or more flagged rules of the rules that have a lowest ranking of the metric. Other embodiments are provided.
Publication Number US2016063386(A1) Publication Date 2016.03.03
Application Number US201414475470 Filing Date 2014.09.02
Applicant Wal-Mart Stores, Inc. Inventors Xie Jun;Sun Chong;Yang Fan;Rampalli Narasimhan
Classification G06N5/04;G06N99/00 Main Classification G06N5/04
Agency Agent
Principal Claim 1. A method of validating rules configured to be utilized in an information extraction application, the rules being stored in a rules database, the method being implemented via execution of computer instructions configured to run at one or more processing modules and configured to be stored at one or more non-transitory memory storage modules, the method comprising: receiving a plurality of labeled samples in a training database, each of the plurality of labeled samples comprising a different data point and an assured output, the assured output corresponding to the data point for the information extraction application; for each of the rules in the rule database: determining, for each of the data points of the plurality of labeled samples in the training database to which the rule applies, whether applying the rule to the data point has a positive impact on matching an output for the data point based on the rule to the assured output of the labeled sample corresponding to the data point, such that the data point is a positive voter, or whether applying the rule to the data point has a negative impact on matching the output for the data point based on the rule to the assured output of the labeled sample corresponding to the data point, such that the data point is a negative voter; generating positive impact information for the rule based on the positive voters, wherein the positive impact information comprises a quantity of the positive voters; generating negative impact information for the rule based on the negative voters, wherein the negative impact information comprises a quantity of the negative voters; and determining a metric for the rule based on the quantity of the negative voters and the quantity of the positive voters; ranking the rules based on the metrics corresponding to the rules; sending to a user for refinement one or more flagged rules of the rules that have a lowest ranking of the metric; receiving from the user one or more refined rules; generating a first output for a first data point in an information database based on the rules in the rules database, the rules in the rules database comprising the one or more refined rules, the plurality of labeled samples in the training database being devoid of the first data point; receiving a request for information from a second user; and presenting the first output to the second user in response to the request.
Address Bentonville AR US
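
Illustrative sketch (not part of the patent record). The voting, metric, and ranking steps recited in claim 1 might look roughly like the Python below. Several things here are assumptions made only for illustration: the names `LabeledSample`, `Rule`, `score_rule`, `rank_rules`, and `flag_rules` are hypothetical; a "positive impact" is simplified to an exact match between the rule's output and the assured output; and the metric is taken to be the fraction of positive voters, since the claim only requires that the metric depend on the two quantities.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class LabeledSample:
    data_point: str       # e.g. a product title
    assured_output: str   # the trusted label for that data point


@dataclass
class Rule:
    name: str
    # Returns the rule's output, or None when the rule does not apply.
    apply: Callable[[str], Optional[str]]


def score_rule(rule: Rule, samples: List[LabeledSample]) -> Tuple[int, int, Optional[float]]:
    """Count positive and negative voters for one rule and derive a metric."""
    positive = negative = 0
    for sample in samples:
        output = rule.apply(sample.data_point)
        if output is None:
            continue  # the rule does not apply, so this sample does not vote
        if output == sample.assured_output:
            positive += 1  # positive voter (assumed: exact match)
        else:
            negative += 1  # negative voter
    total = positive + negative
    # Assumed metric: share of positive voters; None if nothing voted.
    metric = positive / total if total else None
    return positive, negative, metric


def rank_rules(rules: List[Rule], samples: List[LabeledSample]):
    """Return (name, positive, negative, metric) tuples, lowest metric first."""
    scored = []
    for rule in rules:
        positive, negative, metric = score_rule(rule, samples)
        if metric is not None:  # assumption: rules with no voters are not ranked
            scored.append((rule.name, positive, negative, metric))
    scored.sort(key=lambda row: row[3])
    return scored


def flag_rules(rules: List[Rule], samples: List[LabeledSample], k: int = 1) -> List[str]:
    """Names of the k lowest-ranked rules, i.e. those to send for refinement."""
    return [name for name, _, _, _ in rank_rules(rules, samples)[:k]]


# Made-up training data and rules, purely for demonstration.
samples = [
    LabeledSample("gold ring", "ring"),
    LabeledSample("key ring holder", "holder"),
    LabeledSample("silver earring", "earring"),
    LabeledSample("blue denim jeans", "jeans"),
]
ring_rule = Rule("ring_rule", lambda text: "ring" if "ring" in text else None)
jeans_rule = Rule("jeans_rule", lambda text: "jeans" if "jeans" in text else None)

flagged = flag_rules([jeans_rule, ring_rule], samples, k=1)
```

With this toy data, `ring_rule` collects one positive voter and two negative voters, so it ranks below `jeans_rule` and is the one flagged for a user to refine. The remaining steps of the claim (receiving refined rules and serving outputs for unseen data points) are outside this sketch.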