Abstract |
The apparatus computes classification scores based on parameters that have been determined from documents. Each score is compared with a first and a second threshold. A definite classification is assigned when the score is above the higher threshold or below the lower threshold, and the document is processed accordingly. If the score lies between the thresholds, the document is singled out for further inspection, for example by a human arbitrator, to assign a class. The first and second thresholds are adapted automatically based on a specified minimum accuracy level for the classification and a training set. The apparatus uses this specified accuracy in a search for a combination of threshold values that optimizes classifier yield, that is, maximizes the fraction of patterns in the training set that receive a definite classification and need not be turned over for further inspection. The search is subject to the condition that the combination of thresholds results in at least the specified accuracy over the training set.
|