Abstract |
The present invention provides a data selection apparatus that augments a set of training examples with the desired output data. The resulting augmented data set is normalized so that its values range between -1 and +1 and its mean is zero. The data selection apparatus then groups the augmented and normalized data set into related clusters using a clusterizer. Preferably, the clusterizer is a neural network such as a Kohonen self-organizing map (SOM). The data selection apparatus further applies an extractor to cull a working set of data from the clusterized data set. The present invention thus picks, or filters, a set of data that is more nearly uniformly distributed across the portion of the input space of interest, so as to minimize the maximum absolute error over the entire input space. The output of the data selection apparatus is used to train the analyzer with important subsets of the training data rather than with all available training data. A smaller training data set significantly reduces the complexity of the model building or analyzer construction process.
|
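The four stages named in the abstract (augment, normalize, clusterize, extract) can be illustrated with a minimal Python sketch. It is not the patented implementation. All function names and parameters (`augment`, `normalize`, `train_som`, `best_unit`, `extract`, `select_working_set`, `n_units`, `per_cluster`) are hypothetical. The abstract does not say how normalization is performed, so the sketch assumes per-column centering followed by scaling by the largest absolute deviation. It also assumes a small one-dimensional Kohonen map as the clusterizer and an extractor that keeps the first few examples found in each cluster.

```python
import random

def augment(inputs, targets):
    """Append each desired output vector to its input vector."""
    return [list(x) + list(y) for x, y in zip(inputs, targets)]

def normalize(rows):
    """Center each column on zero mean, then scale it into [-1, +1] (assumed scheme)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Largest absolute deviation per column; 1.0 guards constant columns.
    scales = [max(abs(v - m) for v in c) or 1.0 for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, scales)] for r in rows]

def best_unit(units, row):
    """Index of the map unit closest to row in squared Euclidean distance."""
    return min(range(len(units)),
               key=lambda j: sum((w - v) ** 2 for w, v in zip(units[j], row)))

def train_som(rows, n_units=4, epochs=50, lr0=0.5, seed=0):
    """Train a small 1-D Kohonen map and return its unit weight vectors."""
    rng = random.Random(seed)
    units = [list(rng.choice(rows)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs           # decays linearly toward 0
        lr = lr0 * frac                       # shrinking learning rate
        radius = int(n_units / 2 * frac)      # shrinking neighborhood
        for row in rng.sample(rows, len(rows)):
            bmu = best_unit(units, row)
            lo, hi = max(0, bmu - radius), min(n_units, bmu + radius + 1)
            for j in range(lo, hi):
                # Pull the winning unit and its neighbors toward the example.
                units[j] = [w + lr * (v - w) for w, v in zip(units[j], row)]
    return units

def extract(rows, units, per_cluster=2):
    """Keep at most per_cluster example indices from each cluster (assumed rule)."""
    kept, counts = [], {}
    for i, row in enumerate(rows):
        c = best_unit(units, row)
        if counts.get(c, 0) < per_cluster:
            counts[c] = counts.get(c, 0) + 1
            kept.append(i)
    return kept

def select_working_set(inputs, targets, n_units=4, per_cluster=2):
    """Run the whole pipeline and return indices of the selected examples."""
    data = normalize(augment(inputs, targets))
    units = train_som(data, n_units=n_units)
    return extract(data, units, per_cluster)
```

Calling `select_working_set(inputs, targets)` on a data set with many near-duplicate examples in one region returns at most `n_units * per_cluster` indices. Capping the count per cluster is one simple way to thin out dense regions so that the working set is spread more evenly over the input space.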