Title: System and method for automatically classifying text
Abstract: A method is provided for automatically classifying text into categories. In operation, a plurality of tokens or features is manually or automatically associated with each category. A weight is then coupled to each feature, indicating the degree of association between the feature and the category. Next, a document is parsed into a plurality of unique tokens with associated counts, each count indicating the number of times the token appears in the document. For each document, a category score is then computed as the sum, over the features, of each feature's count in the document multiplied by the corresponding feature weight in the category. The category scores are then sorted by perspective, and the document is classified into a particular category provided that its category score exceeds a predetermined threshold.
Publication number: US7028250(B2)  Publication date: 2006.04.11
Application number: US20010864156  Filing date: 2001.05.25
Applicant: KANISA, INC.  Inventors: UKRAINCZYK IGOR; COPPERMAN MAX; HUFFMAN SCOTT B.
Classification: G06F15/00; G06F17/21; G06F17/27; G06F17/30  Main classification: G06F15/00
Agency:  Agent:
Principal claim:
Address:
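
The scoring procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names (`tokenize`, `category_scores`, `classify`), the regular-expression tokenizer, the example weights, and the threshold value are all assumptions chosen for illustration. The sketch also ranks all categories in a single list and does not model the grouping of categories by perspective.

```python
import re
from collections import Counter


def tokenize(text):
    """Parse a document into unique tokens with occurrence counts.

    Assumption: tokens are lowercase alphanumeric runs; the record
    does not specify a tokenizer.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def category_scores(counts, category_weights):
    """Score each category as sum(feature count * feature weight)."""
    scores = {}
    for category, weights in category_weights.items():
        scores[category] = sum(
            counts[feature] * weight
            for feature, weight in weights.items()
            if feature in counts
        )
    return scores


def classify(text, category_weights, threshold):
    """Return (category, score) pairs whose score exceeds the threshold,
    sorted from highest to lowest score."""
    scores = category_scores(tokenize(text), category_weights)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(category, score) for category, score in ranked if score > threshold]


# Hypothetical categories, features, and weights, for illustration only.
weights = {
    "printing": {"printer": 2.0, "toner": 1.5},
    "networking": {"router": 2.0, "printer": 0.5},
}
document = "The printer is out of toner. Replace the printer toner."
result = classify(document, weights, threshold=2.0)
print(result)  # [('printing', 7.0)]
```

In this example "printer" and "toner" each occur twice, so the "printing" score is 2 × 2.0 + 2 × 1.5 = 7.0 and the "networking" score is 2 × 0.5 = 1.0. Only "printing" exceeds the assumed threshold of 2.0.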