Abstract
Systems and methods for classifying documents into categories based on text associated with the documents are disclosed. Embodiments of the present invention further provide methods for establishing a database of hierarchical classes and a system for classifying text-related content into those hierarchical classes. Text relating to documents is parsed into features, at least one of which has a plurality of terms. Vocabulary is determined from the features based on feature frequency. For each class into which a document is classified, the vocabulary that occurs in the text associated with the document is stored.
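The following is a minimal Python sketch of the parse, select, and store steps. It rests on several assumptions. Features are taken to be word unigrams and bigrams, so a bigram is a feature with a plurality of terms. Vocabulary is chosen with a hypothetical corpus-wide frequency threshold `min_freq`. The classes a document belongs to are supplied as a list of labels, with hierarchical classes written as path strings. The function names are illustrative and do not come from the disclosure.

```python
import re
from collections import Counter, defaultdict


def parse_features(text, max_n=2):
    # Assumption: features are lowercase word n-grams (n = 1..max_n);
    # n >= 2 gives features containing a plurality of terms.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    features = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features.append(" ".join(tokens[i:i + n]))
    return features


def build_vocabulary(texts, min_freq=2, max_n=2):
    # Keep features whose total frequency across the corpus reaches
    # the (hypothetical) threshold min_freq.
    counts = Counter()
    for text in texts:
        counts.update(parse_features(text, max_n))
    return {f for f, c in counts.items() if c >= min_freq}


def store_class_vocabulary(labeled_docs, vocabulary, max_n=2):
    # For each class a document is assigned to, record the vocabulary
    # features that occur in that document's text.
    class_vocab = defaultdict(Counter)
    for text, classes in labeled_docs:
        present = [f for f in parse_features(text, max_n) if f in vocabulary]
        for cls in classes:
            class_vocab[cls].update(present)
    return class_vocab


# Usage: hierarchical classes written as path strings (an assumption).
docs = [
    ("Neural networks learn features", ["Science", "Science/AI"]),
    ("Neural networks for image classification", ["Science", "Science/AI"]),
    ("Stock markets fell on interest rate news", ["Business"]),
]
vocab = build_vocabulary([text for text, _ in docs], min_freq=2)
class_vocab = store_class_vocabulary(docs, vocab)
print(sorted(vocab))
print(dict(class_vocab["Science/AI"]))
```

With this toy corpus, only "neural", "networks", and the two-term feature "neural networks" reach the threshold, so they are the only entries stored under the "Science" and "Science/AI" classes.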