Independent claim |
1. A model-based content classification system, comprising:
a user interface that receives from a user one or more user-configurable bounds on searching content;
a pattern matching engine that receives the one or more user-configurable bounds from the user interface, wherein the pattern matching engine is stored in memory and execution of the pattern matching engine by a processor:
    searches a string having known content for a predetermined pattern, the search limited according to the received user-configurable bounds,
    computes a plurality of scores indicating a likelihood that the content of the string corresponds to one or more predetermined content categories, and
    updates the scores using a plurality of weights associated with the predetermined pattern in response to detecting the predetermined pattern;
a content classification model generator that receives the updated scores from the pattern matching engine, wherein the model generator is stored in memory and execution of the model generator by a processor:
    generates a content classification model based on the updated scores, and
    transmits the content classification model to a model repository stored in memory; and
a content classification engine stored in memory, wherein execution of the content classification engine by a processor:
    accesses the content classification model stored in the model repository, and
    classifies subsequent content based on the content classification model. |
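
The data flow recited in the claim can be illustrated with a minimal Python sketch. It is not the claimed implementation, and everything concrete in it is an assumption made for illustration: the `Bounds` fields (`max_chars`, `max_matches`), the example patterns and weights in `PATTERNS`, the in-memory dictionary standing in for the model repository, and the choice to represent the model as per-category score thresholds learned from strings with known content. The claim itself does not specify any of these details.

```python
import re
from dataclasses import dataclass
from itertools import islice


@dataclass
class Bounds:
    """Hypothetical user-configurable bounds on searching content."""
    max_chars: int = 1000   # search only the first max_chars characters
    max_matches: int = 10   # count at most this many matches per pattern


# Hypothetical predetermined patterns, each with weights per content category.
PATTERNS = {
    r"\bfree\b": {"spam": 2.0},
    r"\bwinner\b": {"spam": 3.0},
    r"\bmeeting\b": {"business": 2.0},
    r"\binvoice\b": {"business": 3.0},
}


def score(text, patterns, bounds):
    """Pattern matching step: bounded search, then weighted score updates."""
    window = text[:bounds.max_chars].lower()
    scores = {cat: 0.0 for weights in patterns.values() for cat in weights}
    for pattern, weights in patterns.items():
        matches = islice(re.finditer(pattern, window), bounds.max_matches)
        hits = sum(1 for _ in matches)
        for category, weight in weights.items():
            scores[category] += hits * weight  # update on detected pattern
    return scores


def generate_model(labeled, patterns, bounds):
    """Model generation step (assumed form): one threshold per category,
    set to the lowest score seen on a string known to be in that category."""
    thresholds = {}
    for text, category in labeled:
        s = score(text, patterns, bounds)[category]
        thresholds[category] = min(s, thresholds.get(category, s))
    return {"patterns": patterns, "bounds": bounds, "thresholds": thresholds}


def classify(text, model):
    """Classification step: return the best-scoring category if it reaches
    that category's learned threshold, otherwise None."""
    scores = score(text, model["patterns"], model["bounds"])
    best = max(scores, key=scores.get)
    threshold = model["thresholds"].get(best)
    if threshold is not None and 0 < scores[best] >= threshold:
        return best
    return None


# A dictionary stands in for the model repository stored in memory.
repository = {}
training = [
    ("You are a winner, claim your free prize", "spam"),
    ("free free free offer", "spam"),
    ("Invoice attached for our meeting", "business"),
]
user_bounds = Bounds(max_chars=200, max_matches=2)
repository["default"] = generate_model(training, PATTERNS, user_bounds)

print(classify("Free entry: you may be a winner", repository["default"]))
```

In this sketch the bounds limit both how much of each string is searched and how many matches of a pattern are counted, so tightening them changes the scores the model is built from. Other bound types (for example, a time limit or a restriction to certain fields) would fit the same structure.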