Title: Dynamic outlier bias reduction system and method
Abstract: A system and method are described herein for data filtering to reduce functional and trend line outlier bias. Outliers are removed from the data set through an objective statistical method. Bias is determined based on absolute error, relative error, or both. Error values are computed from the data, model coefficients, or trend line calculations. Outlier data records are removed when the error values are greater than or equal to the user-supplied criteria. For optimization methods or other iterative calculations, the removed data are re-applied each iteration to the model computing new results. Using model values for the complete dataset, new error values are computed and the outlier bias reduction procedure is re-applied. Overall error is minimized for model coefficients and outlier-removed data in an iterative fashion until user-defined error improvement limits are reached. The filtered data may be used for validation, outlier bias reduction and data quality operations.
Publication number: US9069725(B2)  Publication date: 2015.06.30
Application number: US201113213780  Filing date: 2011.08.19
Applicant: HARTFORD STEAM BOILER INSPECTION & INSURANCE COMPANY  Inventor: Jones Richard Bradley
Classification: G06F17/18  Main classification: G06F17/18
Agency: Greenberg Traurig, LLP  Agent: Greenberg Traurig, LLP
Principal claim: 1. A computer-implemented method comprising the step of: reducing outlier bias, wherein reducing outlier bias comprises the steps of:
selecting a bias criteria used to determine one or more outliers;
providing a complete data set, wherein the complete data set comprises all actual values collected for at least one variable;
providing a set of model coefficients associated with a mathematical model;
(1) generating, by a processor, a set of predicted values for the complete data set based on applying the mathematical model to the complete data set;
(2) generating, by the processor, an error set by comparing the set of predicted values to corresponding actual values of the complete data set;
(3) generating, by the processor, a set of error threshold values based on the error set and the bias criteria;
(4) generating, by the processor, a removed data set comprising elements of the complete data set with corresponding error set values outside the set of error threshold values;
(5) generating, by the processor, a censored data set comprising all elements of the complete data set that are not within the removed data set;
(6) generating by the processor, a set of updated model coefficients associated with the mathematical model based on the censored data set; and
(7) repeating steps (1)-(6) as an iteration unless a censoring performance termination criteria is satisfied, whereby at the iteration the set of predicted values, the error set, the set of error threshold values, the removed data set, and the censored data set are generated using the set of updated model coefficients.
Address: Hartford CT US
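
Steps (1)-(7) of the principal claim can be illustrated with a minimal Python sketch. It is not the patented implementation, and several choices in it are assumptions made only for illustration: the mathematical model is a straight line fitted with NumPy's `polyfit`, the bias criterion is a percentile of the absolute error, the termination criterion is a small change in the censored data set's squared error, and the function name `dynamic_outlier_bias_reduction` and its parameters are hypothetical. The sketch also uses a strict "greater than" comparison so that records tied at the threshold are kept, whereas the abstract describes removal at "greater than or equal to" the criteria.

```python
import numpy as np


def dynamic_outlier_bias_reduction(x, y, percentile=90.0, tol=1e-6, max_iter=50):
    """Iteratively censor high-error records and refit a straight line.

    Illustrative sketch only: the linear model, the percentile criterion
    and the stopping rule are assumptions, not taken from the patent.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Initial model coefficients, here fitted on the complete data set.
    coeffs = np.polyfit(x, y, deg=1)
    keep = np.ones(x.size, dtype=bool)
    prev_err = np.inf

    for _ in range(max_iter):
        predicted = np.polyval(coeffs, x)                 # (1) predictions for ALL records
        abs_err = np.abs(predicted - y)                   # (2) error set
        threshold = np.percentile(abs_err, percentile)    # (3) error threshold
        removed = abs_err > threshold                     # (4) removed data set
        keep = ~removed                                   # (5) censored data set
        coeffs = np.polyfit(x[keep], y[keep], deg=1)      # (6) updated coefficients

        # (7) assumed termination rule: censored squared error stops changing.
        total_err = np.sum((np.polyval(coeffs, x[keep]) - y[keep]) ** 2)
        if abs(prev_err - total_err) <= tol * max(total_err, 1.0):
            break
        prev_err = total_err

    return coeffs, keep


# Usage: an exact line y = 2x + 1 with one corrupted record at index 10.
x = np.arange(20, dtype=float)
y = 2.0 * x + 1.0
y[10] += 50.0
coeffs, keep = dynamic_outlier_bias_reduction(x, y)
```

Because the thresholds are recomputed against the complete data set at every pass, a record removed in one iteration can re-enter the censored set in a later one, which matches the abstract's statement that removed data are re-applied each iteration.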