Title of invention: System for estimating a distribution of message content categories in source data
Abstract: A method of computerized content analysis that gives "approximately unbiased and statistically consistent estimates" of a distribution of elements of structured, unstructured, and partially structured source data among a set of categories. In one embodiment, this is done by analyzing a distribution of a small set of individually-classified elements in a plurality of categories and then using the information determined from the analysis to extrapolate a distribution in a larger population set. This extrapolation is performed without constraining the distribution of the unlabeled elements to be equal to the distribution of labeled elements, and without constraining a content distribution of elements in the labeled set (e.g., a distribution of words used by elements in the labeled set) to be equal to a content distribution of elements in the unlabeled set. Not being constrained in these ways allows the estimation techniques described herein to provide distinct advantages over conventional aggregation techniques.
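The extrapolation described in the abstract can be illustrated with a small sketch. This record does not state the estimator, so the following is only one plausible reading, under these assumptions: each element is reduced to a "word profile" (a tuple of word-presence flags), there are exactly two categories, and the profile distribution within each category is the same in the labeled and unlabeled sets. The function names `profile_distribution` and `estimate_share`, and the toy data, are hypothetical.

```python
from collections import Counter


def profile_distribution(profiles):
    """Empirical distribution over word-profile tuples."""
    if not profiles:
        raise ValueError("need at least one profile")
    counts = Counter(profiles)
    total = len(profiles)
    return {s: c / total for s, c in counts.items()}


def estimate_share(labeled, unlabeled_profiles, cat_a, cat_b):
    """Estimate the share of cat_a among unlabeled elements (two categories).

    Assumed model (not taken from the record):
        P(S) = p * P(S | cat_a) + (1 - p) * P(S | cat_b)
    P(S | category) comes from the labeled set, P(S) from the unlabeled set,
    and p is the least-squares solution, clipped to [0, 1].
    """
    p_s_a = profile_distribution([s for s, d in labeled if d == cat_a])
    p_s_b = profile_distribution([s for s, d in labeled if d == cat_b])
    p_s = profile_distribution(unlabeled_profiles)

    num = den = 0.0
    for s in set(p_s_a) | set(p_s_b) | set(p_s):
        diff = p_s_a.get(s, 0.0) - p_s_b.get(s, 0.0)
        num += (p_s.get(s, 0.0) - p_s_b.get(s, 0.0)) * diff
        den += diff * diff
    if den == 0.0:
        raise ValueError("categories have identical profile distributions")
    return min(1.0, max(0.0, num / den))


# Toy data: the labeled set is split 50/50 between the two categories.
labeled = (
    [((1, 0), "pos")] * 3 + [((1, 1), "pos")]
    + [((0, 1), "neg")] * 3 + [((1, 1), "neg")]
)
# The unlabeled set was built with 16 "pos" and 4 "neg" elements.
unlabeled = [(1, 0)] * 12 + [(1, 1)] * 5 + [(0, 1)] * 3

share_pos = estimate_share(labeled, unlabeled, "pos", "neg")
print(share_pos)
```

In this toy case the estimate lands close to 0.8 even though the labeled set is split evenly, which is the point of not forcing the unlabeled category distribution to match the labeled one. No individual unlabeled element is classified along the way; only the aggregate proportion is estimated.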
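Publication number: US2009030862(A1)  Publication date: 2009.01.29
Application number: US20080077534  Filing date: 2008.03.19
Applicant: KING GARY;HOPKINS DANIEL;LU YING  Inventor: KING GARY;HOPKINS DANIEL;LU YING
Classification: G06F17/00;G06N5/00  Main classification: G06F17/00
Agency:  Agent:
Main claim:
Address: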