Title of invention: Text categorizers based on regularizing adaptations of the problem of computing linear separators
Abstract: A method for automatically categorizing messages or documents containing text. The method fits in the general framework of supervised learning, in which a rule or rules for categorizing data are automatically constructed by a computer on the basis of training data that has been labeled beforehand. More specifically, the method involves the construction of a linear separator: training data is used to construct, for each category, a weight vector w and a threshold t, and the decision of whether a hitherto unseen document d is in the category depends on the outcome of the test w^T x >= t, where x is a vector derived from the document d. The method also uses a set L of features selected from the training data in order to construct the numerical vector representation x of a document. The preferred method uses an algorithm based on Gauss-Seidel iteration to determine the weight vector w, which is defined as the solution of a regularized convex optimization problem derived from the principle of minimizing modified training error.
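The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented procedure: the abstract does not state the exact "modified training error", so this sketch substitutes ridge-regularized squared error as an assumed stand-in for the regularized convex problem. The function names (`vectorize`, `train_gauss_seidel`, `in_category`), the token-count features, the toy documents, and the parameters `lam` and `sweeps` are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def vectorize(text: str, features: Sequence[str]) -> List[float]:
    """Build x from document d: counts of each feature in the selected set L.
    Assumption: features are lowercase whitespace-separated tokens."""
    tokens = text.lower().split()
    return [float(tokens.count(f)) for f in features]


def train_gauss_seidel(
    xs: List[List[float]], ys: List[float], lam: float = 0.1, sweeps: int = 200
) -> Tuple[List[float], float]:
    """Minimize sum_i (v.r_i - y_i)^2 + n*lam*|v|^2 by Gauss-Seidel sweeps,
    where r_i is x_i with a constant 1.0 appended (a bias coordinate).
    Assumed stand-in loss; the bias is regularized too, for simplicity.
    Returns (w, t) with t = -bias, so w.x >= t means 'in the category'."""
    n, d = len(xs), len(xs[0]) + 1
    rows = [x + [1.0] for x in xs]
    v = [0.0] * d
    resid = [-y for y in ys]  # resid[i] = v.r_i - y_i, starting from v = 0
    for _ in range(sweeps):
        for j in range(d):
            col = [r[j] for r in rows]
            denom = sum(c * c for c in col) + n * lam
            # Exact minimizer along coordinate j, using the newest values
            # of all other coordinates (the Gauss-Seidel update).
            new_vj = -sum(c * (res - c * v[j]) for c, res in zip(col, resid)) / denom
            delta = new_vj - v[j]
            resid = [res + delta * c for res, c in zip(resid, col)]
            v[j] = new_vj
    return v[:-1], -v[-1]


def in_category(w: Sequence[float], t: float, x: Sequence[float]) -> bool:
    """The linear-separator test from the abstract: w^T x >= t."""
    return sum(wi * xi for wi, xi in zip(w, x)) >= t


# Toy labeled training data (hypothetical): +1 = in category, -1 = not.
features = ["price", "offer", "meeting"]
train = [
    ("special offer price", 1.0),
    ("best price offer today", 1.0),
    ("team meeting agenda", -1.0),
    ("meeting notes attached", -1.0),
]
w, t = train_gauss_seidel(
    [vectorize(text, features) for text, _ in train],
    [label for _, label in train],
)
print(in_category(w, t, vectorize("limited offer", features)))
print(in_category(w, t, vectorize("meeting tomorrow", features)))
```

In a multi-category setting, one such pair (w, t) would be trained per category, as the abstract describes. Folding the threshold into a constant bias coordinate is a convenience of this sketch, not something the abstract specifies.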
Publication number: US6571225(B1) Publication date: 2003.05.27
Application number: US20000502578 Filing date: 2000.02.11
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors: OLES FRANK J.; ZHANG TONG
Classification: G06F15/18;(IPC1-7):G06F15/18 Main classification: G06F15/18
Agency: Agent:
Principal claim:
Address: