Title of invention: Count estimation via machine learning
Abstract: One or more machine learning classifiers are trained to classify cases in one or more categories using one or more sets of labeled training data. A first distribution of scores for positive cases in the training set is determined for each category, and a second distribution of scores for negative cases in the training set is determined for each category. A third distribution of scores, generated by each classifier classifying cases in a set of target data, is also determined. A proportion of cases in the target set that are positive cases for a category is estimated by fitting the first distribution and the second distribution for the category to the third distribution.
Publication number: US8744987(B1) Publication date: 2014.06.03
Application number: US20060406689 Filing date: 2006.04.19
Applicant: Hewlett-Packard Development Company, L.P. Inventors: Forman George Henry; Suermondt Henri Jacques; Kirshenbaum Evan Randy
Classification: G06F15/18 Main classification: G06F15/18
Agency: Agent:
Principal claim: 1. At least one computer program provided on at least one non-transitory computer readable storage medium and comprising code that when executed causes at least one computer to perform a method comprising: training a machine learning classifier with a set of labeled training data, such that the trained classifier is operable to determine a score with respect to the classification of cases for a category; determining at least one first distribution of scores for positive cases in the training set, wherein a positive case is a case that belongs to the category; determining at least one second distribution of scores for negative cases in the training set, wherein a negative case is a case that does not belong to the category; determining a third distribution of scores generated by the classifier classifying cases in a set of target data; and estimating a proportion of cases in the target set that are positive cases by fitting the at least one first distribution and the at least one second distribution to the third distribution.
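The fitting step in the claim can be illustrated with a minimal sketch. It is one possible reading, not the method the patent specifies. It assumes the three score distributions are represented as normalized histograms over shared bins, and that "fitting" means choosing the mixing weight p that makes p × (first distribution) + (1 − p) × (second distribution) closest to the third distribution in squared error. The function names, the bin count, and the grid search over p are all hypothetical choices made for illustration, and the classifier scores are taken as given rather than produced by a trained model.

```python
from bisect import bisect_right


def histogram(scores, edges):
    """Return the normalized histogram of `scores` over the bins in `edges`."""
    counts = [0] * (len(edges) - 1)
    for s in scores:
        # Locate the bin and clamp so boundary scores fall in a valid bin.
        i = min(max(bisect_right(edges, s) - 1, 0), len(counts) - 1)
        counts[i] += 1
    return [c / len(scores) for c in counts]


def estimate_positive_proportion(pos_scores, neg_scores, target_scores,
                                 bins=10, steps=1000):
    """Estimate the share of positives in the target set (illustrative only).

    Assumption: the target score histogram is modeled as a two-component
    mixture of the training positive and negative score histograms.
    """
    all_scores = list(pos_scores) + list(neg_scores) + list(target_scores)
    lo, hi = min(all_scores), max(all_scores)
    width = (hi - lo) / bins or 1.0  # avoid zero-width bins
    edges = [lo + i * width for i in range(bins + 1)]

    first = histogram(pos_scores, edges)      # positives in training set
    second = histogram(neg_scores, edges)     # negatives in training set
    third = histogram(target_scores, edges)   # unlabeled target set

    best_p, best_err = 0.0, float("inf")
    for k in range(steps + 1):
        p = k / steps
        # Squared error between the mixture and the target histogram.
        err = sum((p * f + (1 - p) * s - t) ** 2
                  for f, s, t in zip(first, second, third))
        if err < best_err:
            best_p, best_err = p, err
    return best_p


# Toy scores (made up for illustration): 3 of the 10 target cases score
# like training positives.
pos = [0.8, 0.9, 0.85, 0.95]
neg = [0.1, 0.2, 0.15, 0.05]
target = [0.9, 0.85, 0.8, 0.1, 0.2, 0.15, 0.05, 0.1, 0.2, 0.15]
print(estimate_positive_proportion(pos, neg, target))  # 0.3 for this toy data
```

Multiplying the estimated proportion by the number of target cases gives a count estimate. The sketch estimates that count from the shape of the score distribution and does not classify individual target cases.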
Address: Houston TX US