Title: Classifier tuning based on data similarities
Abstract: A probabilistic classifier is used to classify data items in a data stream. The probabilistic classifier is trained, and an initial classification threshold is set, using unique training and evaluation data sets (i.e., data sets that do not contain duplicate data items). Unique data sets are used for training and in setting the initial classification threshold so as to prevent the classifier from being improperly biased as a result of similarity rates in the training and evaluation data sets that do not reflect similarity rates encountered during operation. During operation, information regarding the actual similarity rates of data items in the data stream is obtained and used to adjust the classification threshold such that misclassification costs are minimized given the actual similarity rates.
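The threshold adjustment described in the abstract can be illustrated with a minimal sketch. It is not the patented method, whose details the abstract does not give. It assumes a de-duplicated evaluation set with classifier scores and binary labels, and it stands in for the observed similarity rates with a per-item weight (roughly, how many near-duplicates of that item appear in the live stream). The function name `pick_threshold`, the weights, and the cost values are all hypothetical.

```python
from typing import List, Sequence, Tuple


def pick_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    weights: Sequence[float],
    cost_fp: float,
    cost_fn: float,
) -> Tuple[float, float]:
    """Return (threshold, cost) minimizing the weighted misclassification cost.

    An item is classified positive when score >= threshold.
    weights[i] is an assumed stand-in for how often items similar to
    evaluation item i occur in the operational data stream.
    """
    # Candidate thresholds: every observed score, plus +inf (classify nothing positive).
    candidates: List[float] = sorted(set(scores)) + [float("inf")]
    best_t, best_cost = candidates[0], float("inf")
    for t in candidates:
        cost = 0.0
        for s, y, w in zip(scores, labels, weights):
            predicted_positive = s >= t
            if predicted_positive and y == 0:
                cost += cost_fp * w  # false positive, scaled by similarity weight
            elif not predicted_positive and y == 1:
                cost += cost_fn * w  # false negative, scaled by similarity weight
        if cost < best_cost:  # strict comparison: ties keep the lowest threshold
            best_t, best_cost = t, cost
    return best_t, best_cost


# Hypothetical unique evaluation set: label 1 = positive class (e.g. spam).
scores = [0.2, 0.4, 0.6, 0.9]
labels = [0, 1, 0, 1]

# Initial threshold: every unique item counts once.
initial = pick_threshold(scores, labels, [1, 1, 1, 1], cost_fp=10, cost_fn=1)

# Adjusted threshold: positives are assumed to be heavily duplicated in operation.
adjusted = pick_threshold(scores, labels, [1, 50, 1, 50], cost_fp=10, cost_fn=1)
```

With these made-up numbers the initial threshold is 0.9, and it drops to 0.4 once the positive items are weighted by their assumed duplication rate, because missing a heavily duplicated item becomes more expensive than the occasional false positive.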
Publication number: US7089241(B1)
Publication date: 2006.08.08
Application number: US20030740821
Filing date: 2003.12.22
Applicant: AMERICA ONLINE, INC.
Inventors: ALSPECTOR JOSHUA; KOLCZ ALEKSANDER; CHOWDHURY ABDUR
Classification: G06F7/00; G06Q10/00; H04L12/58
Main classification: G06F7/00