Abstract |
A data classification method and apparatus are disclosed for labeling unknown objects. The disclosed data classification system employs a model selection technique that characterizes domains and identifies the degree of match between the domain meta-features and the learning bias of the algorithm under analysis. In addition to conventional meta-features, an improved concept variation meta-feature, an average weighted distance meta-feature, or both are used to fully discriminate learning performance. The "concept variation" meta-feature measures the amount of concept variation, that is, the degree to which a concept lacks structure. The present invention extends conventional notions of concept variation to allow for numeric and categorical features, and estimates the variation of the whole example population from a training sample. The "average weighted distance" meta-feature of the present invention measures the density of the distribution in the training set. While concept variation is high for a training set composed of only two examples having different class labels, the average weighted distance can distinguish whether those examples are too far apart or too close to one another.
|
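The two meta-features described in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so everything below is an assumption made for illustration: the mixed numeric/categorical distance, the exponentially decaying neighbor weight `2 ** (-alpha * d / (d_max - d))`, the `alpha` parameter, and the function names are all hypothetical and are not taken from the disclosure. Numeric features are assumed to be pre-scaled to [0, 1].

```python
import math

def distance(x, y):
    """Assumed mixed-type distance: squared difference for numeric features
    (expected in [0, 1]), 0/1 mismatch for categorical features."""
    total = 0.0
    for a, b in zip(x, y):
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            total += (a - b) ** 2
        else:
            total += 0.0 if a == b else 1.0
    return math.sqrt(total)

def weight(d, n_features, alpha=2.0):
    """Assumed weighting: closer neighbors count more; the weight decays
    to 0 as d approaches the maximum possible distance sqrt(n_features)."""
    d_max = math.sqrt(n_features)
    if d >= d_max:
        return 0.0
    return 2.0 ** (-alpha * d / (d_max - d))

def _weighted_neighbor_average(X, value, alpha):
    """Average over examples of the weight-normalized sum of value(i, j, d)
    taken over all other examples j."""
    n = len(X)
    total = 0.0
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            if i == j:
                continue
            d = distance(X[i], X[j])
            w = weight(d, len(X[i]), alpha)
            num += w * value(i, j, d)
            den += w
        if den > 0:
            total += num / den
    return total / n

def concept_variation(X, y, alpha=2.0):
    """High when nearby examples tend to carry different class labels."""
    return _weighted_neighbor_average(
        X, lambda i, j, d: 1.0 if y[i] != y[j] else 0.0, alpha)

def average_weighted_distance(X, alpha=2.0):
    """Small when the training sample is dense, large when it is sparse."""
    return _weighted_neighbor_average(X, lambda i, j, d: d, alpha)

# Two training sets of two examples each, both with differing class labels.
close_X = [(0.1, "a"), (0.2, "a")]
far_X = [(0.0, "a"), (0.9, "a")]
labels = [0, 1]

cv_close = concept_variation(close_X, labels)
cv_far = concept_variation(far_X, labels)
awd_close = average_weighted_distance(close_X)
awd_far = average_weighted_distance(far_X)
```

Under these assumptions, both two-example sets yield the same maximal concept variation because the only neighbor always has a different label, while the average weighted distance differs between the close pair and the distant pair. This mirrors the contrast drawn in the abstract, though the actual formulas in the disclosure may differ.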