Title: COMPUTER IMPLEMENTED SYSTEM AND METHOD FOR CATEGORIZING DATA
Abstract: A self-learning system and a method for categorizing input data are disclosed. The system includes a generator that generates an initial training set comprising a plurality of words, each linked to a score (rating) based on the sentiment the word conveys. The words, their ratings, and their sentiments are interlinked and stored in a repository. A rule-based classifier segregates the input data into individual words, compares the words with the entries in the repository, and determines a first score corresponding to the input data. The input data is also provided to a machine-learning based classifier, which generates a plurality of features corresponding to the input data and, from those features, a second score. An ensemble classifier aggregates the first score and the second score into a classification score, which enables the input data to be classified into predetermined categories.
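The pipeline described in the abstract can be sketched in Python as follows. This is a minimal sketch, not the applicant's implementation: the lexicon entries, the linear model standing in for the machine-learning classifier, the equal-weight average used as the ensemble, and the three category names and cut-offs are all hypothetical. Semantic-similarity matching is omitted; the rule-based score uses exact matches only.

```python
import math
import re

# Hypothetical initial training set: word -> sentiment score in [-1.0, 1.0].
# The patent links words to ratings/sentiments; these values are invented.
LEXICON = {"good": 0.8, "great": 1.0, "bad": -0.8, "terrible": -1.0, "slow": -0.4}


def tokenize(text):
    """Segregate the input data into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rule_based_score(text, lexicon=LEXICON):
    """First score: mean lexicon score of the words that exactly match an entry."""
    matched = [lexicon[w] for w in tokenize(text) if w in lexicon]
    return sum(matched) / len(matched) if matched else 0.0


# Hypothetical stand-in for the ML classifier: a linear model over bag-of-words
# count features, squashed to [-1, 1] with tanh. Real weights would be learned.
WEIGHTS = {"good": 0.9, "great": 1.2, "bad": -1.0, "terrible": -1.5, "not": -0.5}
BIAS = 0.0


def extract_features(text):
    """Count features: how often each token occurs in the input."""
    feats = {}
    for w in tokenize(text):
        feats[w] = feats.get(w, 0) + 1
    return feats


def ml_score(text):
    """Second score: tanh of the weighted sum of the count features."""
    feats = extract_features(text)
    z = BIAS + sum(WEIGHTS.get(w, 0.0) * c for w, c in feats.items())
    return math.tanh(z)


def ensemble_score(first, second, alpha=0.5):
    """Classification score: weighted average of the two scores (alpha is assumed)."""
    return alpha * first + (1 - alpha) * second


def categorize(score, low=-0.2, high=0.2):
    """Map the classification score to one of three assumed categories."""
    if score < low:
        return "negative"
    if score > high:
        return "positive"
    return "neutral"
```

For the input "The service was great but the app is slow", the rule-based score averages the exact matches "great" (1.0) and "slow" (-0.4) to 0.3, the stand-in model returns tanh(1.2), and their average falls in the assumed "positive" band. A weighted average is only one possible ensemble; the abstract does not specify how the two scores are combined.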
Publication number: US2016189057(A1)  Publication date: 2016.06.30
Application number: US201514875705  Filing date: 2015.10.06
Applicant: XURMO TECHNOLOGIES PVT. LTD.  Inventors: RAO VINAY GURURAJA;GOPALAKRISHNAN SRIDHAR;SANTHOSH SAURABH;AYAPPA POOVIAH BALLACHANDA
Classification: G06N99/00;G06N5/02;G06F17/30;G06F17/27  Main classification: G06N99/00
Main claim: 1. A computer-implemented self-learning system for categorizing input data, said system comprising:
a generator configured to generate an initial training set comprising a plurality of words, wherein each of said words is linked to a corresponding sentiment, said generator further configured to store each of said words and its corresponding sentiment in the form of database entries;
a rule-based classifier cooperating with said generator, said rule-based classifier configured to receive the input data and extract a plurality of words therefrom, said rule-based classifier further configured to compare each of said plurality of words with the database entries and to select, from amongst the plurality of words, the words that are semantically similar to the database entries, said rule-based classifier further configured to assign a first score only to those words that exactly match the database entries, to aggregate the first scores assigned to said words into an aggregated first score, and to generate a data classification based on at least the words semantically similar to the database entries;
a machine-learning based classifier cooperating with said generator, said machine-learning based classifier configured to receive and process the input data, to generate a plurality of features corresponding to the input data based on said processing, and to generate a second score corresponding to the input data by processing said features;
an ensemble classifier configured to combine the aggregated first score and the second score and to generate a classification score;
a comparator having access to a predefined threshold value, said comparator configured to compare the aggregated first score with the predefined threshold value and determine whether the aggregated first score is less than the predefined threshold value, said comparator further configured to determine whether the classification score is less than the predefined threshold value only in the event that the aggregated first score is less than the predefined threshold value; and
a processor cooperating with the comparator, said processor configured to generate a second training set based only on the data classification generated by the rule-based classifier, only in the event that the aggregated first score is greater than the predefined threshold value, said processor further configured to generate the second training set based only on the input data processed by the machine-learning based classifier in the event that the classification score is greater than the predefined threshold value.
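The comparator and processor elements of the claim decide which classifier's output feeds the second training set. The following is a minimal sketch, assuming both scores are confidence values comparable against one threshold; the `Sample` record, its field names, the use of the machine-learning classifier's predicted label, and the 0.6 default threshold are hypothetical. The claim does not say what happens when a score equals the threshold, so such inputs are skipped here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical record for one classified input; field names are illustrative.
@dataclass
class Sample:
    text: str
    first_score: float           # aggregated first score from the rule-based classifier
    classification_score: float  # output of the ensemble classifier
    rule_label: str              # data classification from the rule-based classifier
    ml_label: str                # label predicted by the ML classifier (assumed)


def select_example(s: Sample, threshold: float) -> Optional[Tuple[str, str, str]]:
    """Return (text, label, source) for the second training set, or None."""
    if s.first_score > threshold:
        # Rule-based result is confident: use only its data classification.
        return (s.text, s.rule_label, "rule_based")
    if s.first_score < threshold and s.classification_score > threshold:
        # The classification score is consulted only when the first score is below the threshold.
        return (s.text, s.ml_label, "machine_learning")
    # Remaining cases (including exact equality) are not covered by the claim; skip.
    return None


def build_second_training_set(samples: List[Sample], threshold: float = 0.6) -> List[Tuple[str, str, str]]:
    """Collect the examples that pass the comparator checks."""
    selected = []
    for s in samples:
        example = select_example(s, threshold)
        if example is not None:
            selected.append(example)
    return selected
```

Checking the rule-based score first mirrors the claim's ordering: the ensemble's classification score acts only as a fallback when the rule-based classifier alone is not confident enough.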
Address: BENGALURU IN