Main claim |
1. A computer implemented self-learning system for categorizing input data, said system comprising:
a generator configured to generate an initial training set comprising a plurality of words, wherein each of said words is linked to a corresponding sentiment, said generator still further configured to store each of said words and corresponding sentiment in the form of database entries;
a rule based classifier cooperating with said generator, said rule based classifier configured to receive the input data and extract a plurality of words therefrom, said rule based classifier still further configured to compare each of said plurality of words with the database entries and select, amongst the plurality of words, the words being semantically similar to the database entries, said rule based classifier still further configured to assign a first score to only those words that exactly match the database entries, said rule based classifier further configured to aggregate the first score assigned to each of said words and generate an aggregated first score, said rule based classifier still further configured to generate a data classification based on at least the words semantically similar to the database entries;
a machine-learning based classifier cooperating with said generator, said machine-learning based classifier configured to receive and process the input data, said machine-learning based classifier further configured to generate a plurality of features corresponding to the input data based on the processing thereof, and generate a second score corresponding to the input data by processing the features thereof;
an ensemble classifier configured to combine the aggregated first score and the second score, and generate a classification score;
a comparator having access to a predefined threshold value, said comparator configured to compare said aggregated first score with the predefined threshold value and determine whether the aggregated first score is less than the predefined threshold value, said comparator still further configured to determine whether the classification score is less than the predefined threshold value, only in the event that the aggregated first score is less than the predefined threshold value; and
a processor cooperating with the comparator, said processor configured to generate a second training set based on only the data classification generated by the rule based classifier only in the event that the aggregated first score is greater than the predefined threshold value, said processor further configured to generate the second training set based on only the input data processed by the machine-learning based classifier, in the event that the classification score is greater than the predefined threshold value. |
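
The decision flow recited in the claim (rule based score first, ensemble score only as a fallback, and a second training set fed from whichever path clears the threshold) can be illustrated with a minimal Python sketch. Everything named here is a hypothetical stand-in rather than the claimed implementation: `SEED_LEXICON` plays the role of the initial training set, `toy_ml` replaces a real machine-learning based classifier, the ensemble is assumed to be an equal-weight average, `THRESHOLD` is an arbitrary value, and semantic similarity is simplified to exact lexicon matching. The claim does not say what happens when a score equals the threshold; the sketch adds nothing to the training set in that case.

```python
# Hypothetical seed lexicon standing in for the "initial training set":
# each word is stored with a signed sentiment value.
SEED_LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "awful": -1.0}

THRESHOLD = 1.5  # assumed predefined threshold value


def rule_based(text, lexicon):
    """Score only exact lexicon matches; return (aggregated first score, label)."""
    words = text.lower().split()
    signed = sum(lexicon[w] for w in words if w in lexicon)
    if signed > 0:
        label = "positive"
    elif signed < 0:
        label = "negative"
    else:
        label = "neutral"
    # Assumption: the magnitude of the signed sum serves as the aggregated score.
    return abs(signed), label


def toy_ml(text):
    """Toy stand-in for the machine-learning based classifier (not a trained model)."""
    features = {"exclaim": text.count("!"), "smile": text.count(":)")}
    score = 1.0 * features["exclaim"] + 2.0 * features["smile"]
    label = "positive" if features["smile"] else "neutral"
    return score, label


def ensemble(first, second):
    """Assumed combination rule: equal-weight average of the two scores."""
    return 0.5 * first + 0.5 * second


def self_learn_step(text, lexicon, training_set, threshold=THRESHOLD):
    """Apply the comparator logic and extend the second training set if warranted."""
    first, rule_label = rule_based(text, lexicon)
    if first > threshold:
        # Rule based result is trusted on its own.
        training_set.append((text, rule_label, "rule"))
        return "rule"
    if first < threshold:
        # Only now is the ensemble classification score consulted.
        second, ml_label = toy_ml(text)
        if ensemble(first, second) > threshold:
            training_set.append((text, ml_label, "ml"))
            return "ml"
    return None  # neither path cleared the threshold; nothing is learned
```

Calling `self_learn_step` repeatedly with the same list as `training_set` accumulates the second training set; the return value records which path, if any, contributed the new entry.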