Abstract |
A method, system, and computer-readable medium are presented for performing multiple-category classification of digital documents using a non-binary classification approach that is less computationally intensive and does not require the generation of extra parameters during execution. The method comprises calculating a category score for each category into which a digital document may be classified. The category score is based on the relevance of the text in the document to that category. Threshold scores are determined for each of the categories to define a number of candidate relevance types. A candidate relevance type is determined for each of the categories based upon the category scores. One or more of the categories are assigned to the document by applying a multiple-category selection rule to each of the categories. The candidate relevance type is then used to determine whether the categories assigned to the digital document need further validation. If one or more of the assigned categories need further validation, the validation is performed.
|
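The flow described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how category scores are computed, how many thresholds or relevance types there are, or what the selection rule is, so everything concrete here is an assumption made for illustration: the keyword weights in `KEYWORDS`, the two thresholds per category in `THRESHOLDS`, the three relevance type names, and the rule "assign every category that is not `not_relevant`".

```python
# Hypothetical keyword weights per category (assumption: the abstract does not
# specify how text relevance is scored).
KEYWORDS = {
    "finance": {"loan": 2.0, "bank": 1.5, "interest": 1.0},
    "legal": {"contract": 2.0, "court": 1.5, "liability": 1.0},
    "sports": {"match": 1.0, "league": 1.5},
}

# Hypothetical (low, high) threshold scores per category. Two thresholds
# define three candidate relevance types.
THRESHOLDS = {
    "finance": (1.0, 3.0),
    "legal": (1.0, 3.0),
    "sports": (1.0, 2.0),
}


def category_score(text, weights):
    """Sum the weights of the category keywords that occur in the text."""
    return sum(weights.get(token, 0.0) for token in text.lower().split())


def relevance_type(score, low, high):
    """Map a category score to a candidate relevance type via the thresholds."""
    if score >= high:
        return "relevant"
    if score >= low:
        return "possibly_relevant"
    return "not_relevant"


def classify(text):
    """Assign categories using an assumed rule: keep all but 'not_relevant'."""
    assigned = {}
    for category, weights in KEYWORDS.items():
        low, high = THRESHOLDS[category]
        rtype = relevance_type(category_score(text, weights), low, high)
        if rtype != "not_relevant":
            assigned[category] = rtype
    return assigned


def needs_validation(assigned):
    """Assigned categories whose relevance type calls for further validation."""
    return [c for c, rtype in assigned.items() if rtype == "possibly_relevant"]


doc = "the bank approved the loan under a contract"
assigned = classify(doc)
to_validate = needs_validation(assigned)
```

In this sketch a single pass over the categories yields several labels at once, rather than running one binary classifier per category. The categories returned by `needs_validation` would then be passed to whatever validation step the system uses, which the abstract leaves unspecified.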