Abstract
A method for prediction analysis using text categorization is provided. The method includes the steps of: grouping a plurality of text documents into a plurality of classes; selecting the top m most discriminatory terms for each class of documents using statistically based measures; determining, for each document, the presence or absence of each of the discriminatory terms; learning rule-based models of each class of documents using a rule learning algorithm; determining, for at least a portion of the plurality of documents, whether a given learned rule is satisfied by each respective document; creating a database of the rules associated with the documents that satisfy them; and performing distributed data mining to form a predictive result based on at least a portion of the plurality of documents.
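The steps from term selection through the rule database can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation. The abstract names neither the statistical measure nor the rule learning algorithm, so the sketch assumes the chi-square statistic for term selection. In place of a full rule learner, it uses a hypothetical one-condition learner that keeps `IF term present THEN class` when the rule's training precision reaches `min_precision`. All function names are hypothetical, and the distributed data mining step is not covered.

```python
def tokenize(text):
    # Lowercased whitespace tokens; a real system would use a proper tokenizer.
    return set(text.lower().split())


def chi_square(n11, n10, n01, n00):
    # 2x2 contingency table for one term and one class:
    # n11 = in class, has term;      n10 = outside class, has term;
    # n01 = in class, lacks term;    n00 = outside class, lacks term.
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def top_m_terms(docs, labels, cls, m):
    # Rank terms by chi-square with respect to one class and keep the m best.
    token_sets = [tokenize(d) for d in docs]
    vocab = set().union(*token_sets)
    in_class = [lab == cls for lab in labels]
    n_pos = sum(in_class)
    n_neg = len(labels) - n_pos
    scores = {}
    for term in vocab:
        n11 = sum(1 for toks, pos in zip(token_sets, in_class) if pos and term in toks)
        n10 = sum(1 for toks, pos in zip(token_sets, in_class) if not pos and term in toks)
        n01, n00 = n_pos - n11, n_neg - n10
        # Keep only terms over-represented in the class, so they can act as
        # positive rule conditions (chi-square alone also rewards absence).
        if n11 * n00 > n10 * n01:
            scores[term] = chi_square(n11, n10, n01, n00)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:m]


def presence_vector(doc, terms):
    # Binary presence/absence of each discriminatory term in one document.
    toks = tokenize(doc)
    return {t: t in toks for t in terms}


def learn_rules(docs, labels, m, min_precision=0.8):
    # Hypothetical one-condition rule learner: (term, cls) means
    # "IF term present THEN cls", kept if its training precision is high enough.
    token_sets = [tokenize(d) for d in docs]
    rules = []
    for cls in sorted(set(labels)):
        for term in top_m_terms(docs, labels, cls, m):
            covered = [lab for toks, lab in zip(token_sets, labels) if term in toks]
            if covered and covered.count(cls) / len(covered) >= min_precision:
                rules.append((term, cls))
    return rules


def rule_database(docs, rules):
    # Map each rule to the indices of the documents that satisfy its condition.
    return {
        (term, cls): [i for i, d in enumerate(docs) if term in tokenize(d)]
        for term, cls in rules
    }


if __name__ == "__main__":
    docs = ["stock price rises", "stock market falls",
            "team wins match", "team loses match"]
    labels = ["finance", "finance", "sports", "sports"]
    rules = learn_rules(docs, labels, m=2)
    print(rules)
    print(rule_database(docs, rules))
```

On the four sample documents, the rule `("stock", "finance")` is learned and the database maps it to documents 0 and 1. A production system would swap in a real rule learner and feed the rule database to the distributed mining stage.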