Abstract |
<p>The method of this invention is a novel anomaly-detection-based imputation technique for handling local corruptions in a data set. For a given data instance, the method first localizes the corruption by running several statistical checks with an appropriate anomaly detection algorithm from the machine learning literature. The corrupted attributes, which can be treated as "missing" once the corruption is localized, are then imputed using average statistics extracted from the data set. In machine learning applications such as data classification and clustering, data imputation is an important technique for improving algorithm performance (e.g., the empirical error rate in binary classification) when a fraction of the data is corrupted under severe noise conditions. For instance, the corruption might be due to a scratch on the compact disc on which the data are stored, or to occlusion of a visual object in a computer vision task. In this regard, data imputation techniques, including this invention, aim to replace the corrupted or missing parts with statistically meaningful substitute values so that, after imputation, the corrupted parts become statistically consistent with the intact parts.</p> |
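
The two steps described in the abstract (localize the corrupted attributes, then impute them from average statistics) can be illustrated with a minimal sketch. The abstract does not name a specific anomaly detection algorithm, so this example assumes a simple per-attribute z-score check as a stand-in. The function names `fit_statistics`, `localize_corruption`, and `impute`, and the threshold value of 3.0, are hypothetical choices made for illustration and are not taken from the invention.

```python
import numpy as np


def fit_statistics(data):
    """Per-attribute mean and standard deviation of a reference data set."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0)
    # Guard against division by zero for constant attributes.
    std = np.where(std == 0.0, 1.0, std)
    return mean, std


def localize_corruption(instance, mean, std, z_threshold=3.0):
    """Flag attributes whose z-score exceeds the threshold.

    Assumption: a z-score test stands in for the unspecified
    anomaly detection algorithm mentioned in the abstract.
    """
    z = np.abs((np.asarray(instance, dtype=float) - mean) / std)
    return z > z_threshold


def impute(instance, mean, std, z_threshold=3.0):
    """Replace flagged attributes with the data set mean."""
    instance = np.asarray(instance, dtype=float)
    mask = localize_corruption(instance, mean, std, z_threshold)
    return np.where(mask, mean, instance), mask


# Synthetic reference data: 500 instances, 4 attributes, standard normal.
rng = np.random.default_rng(0)
clean = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
mean, std = fit_statistics(clean)

# One instance whose third attribute is corrupted by a large value.
sample = np.array([0.1, -0.3, 50.0, 0.2])
repaired, mask = impute(sample, mean, std)
```

Here `mask` marks only the third attribute as corrupted, and `repaired` keeps the other three values unchanged while replacing the third with the attribute mean. A real system would likely use a stronger detector and richer statistics than a per-attribute mean.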