Title of invention |
Date updating in support of data analysis |
Abstract |
A computing device updates date values in a read dataset to support data analytics. Outlier and non-outlier date values are identified by, for each date value as a respective date value, reading a predefined number of neighboring date values relative to the respective date value; computing a median value and a median absolute deviation value of the predefined number of neighboring date values; computing a difference between the respective date value and the median value; dividing an absolute value of the difference by the median absolute deviation value to define a deviation value; comparing the deviation value to a threshold deviation value; and, based on the comparison, identifying the respective date value as an outlier or a non-outlier date value. Each identified non-outlier date value is updated with a new date computed using a date offset value. Each updated, identified non-outlier date value is replaced in a date updated dataset. |
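The deviation test described in the abstract can be written compactly. The symbols are notation introduced here for illustration and do not appear in the patent: $d_i$ is the date value under test (as a number, for example a day count), $N_i$ is its set of neighboring date values, and $\tau$ is the threshold deviation value.

```latex
m_i = \operatorname{median}(N_i), \qquad
\mathrm{MAD}_i = \operatorname{median}_{x \in N_i} \lvert x - m_i \rvert, \qquad
z_i = \frac{\lvert d_i - m_i \rvert}{\mathrm{MAD}_i}
```

The abstract says only that $z_i$ is compared to $\tau$; treating $z_i > \tau$ as the outlier condition is an assumption, as is any handling of the case $\mathrm{MAD}_i = 0$.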
Publication number |
US9524315(B1) |
Publication date |
2016.12.20 |
Application number |
US201615222329 |
Filing date |
2016.07.28 |
Applicant |
SAS Institute Inc. |
Inventors |
Bonham Robert N.; Holzworth Steven C.; Hayes Keefe |
Classification |
G06F11/00; G06F17/30 |
Main classification |
G06F11/00 |
Agency |
Bell & Manning, LLC |
Agent |
Bell & Manning, LLC |
Main claim |
1. A non-transitory computer-readable medium having stored thereon computer-readable instructions that when executed by a computing device cause the computing device to:
read a dataset;
identify date values in the read dataset;
identify outlier and non-outlier date values included in the identified date values by, for each date value of the identified date values as a respective date value,
  reading a predefined number of neighboring date values relative to the respective date value;
  computing a median value and a median absolute deviation value of the read predefined number of neighboring date values;
  computing a difference between the respective date value and the computed median value;
  dividing an absolute value of the computed difference by the computed median absolute deviation value to define a deviation value of the respective date value;
  comparing the defined deviation value to a threshold deviation value; and
  based on the comparison, identifying the respective date value as either an outlier date value or a non-outlier date value;
determine a date offset value;
update each identified non-outlier date value read from the dataset with a new date computed using the determined date offset value; and
store the read dataset to a date updated dataset by replacing each identified non-outlier date value with a corresponding updated date value. |
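The claimed steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The function names, the default `window` and `threshold` values, and the example data are hypothetical. The sketch also assumes that the neighbors exclude the value under test, that a deviation above the threshold marks an outlier, that the offset is given in days, and that a zero median absolute deviation makes any nonzero difference an outlier. The claim leaves all of these open.

```python
from datetime import date, timedelta
from statistics import median


def is_outlier(ordinals, i, window=4, threshold=3.0):
    """Return True if ordinals[i] deviates too far from its neighbors."""
    n = len(ordinals)
    # Slide the window inward at the edges so that `window` neighbors are
    # read whenever the dataset is long enough (assumption).
    lo = max(0, min(i - window // 2, n - (window + 1)))
    hi = min(n, lo + window + 1)
    neighbors = [ordinals[j] for j in range(lo, hi) if j != i]
    if not neighbors:
        return False  # a lone value has nothing to be compared with
    med = median(neighbors)
    mad = median([abs(v - med) for v in neighbors])
    diff = abs(ordinals[i] - med)
    if mad == 0:
        # Assumption: with identical neighbors, any nonzero difference
        # counts as an outlier (the deviation would be unbounded).
        return diff != 0
    return diff / mad > threshold


def update_dates(dates, offset_days, window=4, threshold=3.0):
    """Shift non-outlier dates by offset_days; leave outliers unchanged."""
    # Dates are compared as day counts (proleptic Gregorian ordinals).
    ordinals = [d.toordinal() for d in dates]
    shift = timedelta(days=offset_days)
    return [
        d if is_outlier(ordinals, i, window, threshold) else d + shift
        for i, d in enumerate(dates)
    ]


# Hypothetical data: one placeholder date among consecutive days.
dates = [
    date(2016, 1, 1), date(2016, 1, 2), date(2016, 1, 3),
    date(1900, 1, 1),
    date(2016, 1, 5), date(2016, 1, 6), date(2016, 1, 7),
]
updated = update_dates(dates, offset_days=30)
```

In this example the 1900 placeholder is identified as an outlier and left in place, while the surrounding dates are each moved forward by 30 days.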
Address |
Cary NC US |