Abstract
A system generates rules for classifying documents by building a vocabulary of features (e.g., words, phrases, acronyms) that are related to the classification concepts. The system includes a security document reader that receives a security document defining the security concepts for a particular project and parses that document to separate the individual security concepts. A vocabulary builder receives user-provided samples that contain information related to the project. For each security concept, the vocabulary builder uses statistical analysis techniques to find features in the samples that are related to that concept. For each security concept, a rule generation assistant then generates rules based on the built vocabulary and the samples, applying statistical analysis techniques to the vocabulary and samples to determine the features that optimally predict that concept. A downgrader can use the rules to process information before it is distributed.