Title Document classification system with user-defined rules
Abstract Machines, systems and methods for classifying documents, the method comprising: classifying a document from among a plurality of documents in a first class, in response to applying statistical analysis to data associated with the document; classifying the document in a second class, in response to determining that a rule from among a plurality of rules applies to the document, wherein a proposed rule is added to the plurality of rules, in response to determining that application of the proposed rule to one or more of the plurality of documents to which the rule is applicable does not diminish accuracy of overall classification for the plurality of documents.
Publication No. US9275331(B2) Publication Date 2016.03.01
Application No. US201313899974 Filing Date 2013.05.22
Applicant International Business Machines Corporation Inventors Dayan Yigal S.; Fuchs Gil; Magdalen Josemina M.; Paikowsky Oren
Classification G06F7/00; G06F17/30; G06N3/08 Main Classification G06F7/00
Agency North Shore Patents, P.C. Agent North Shore Patents, P.C.; Baillie Michele Liu
Principal Claim 1. A system for classifying documents, the system comprising: a processor; and a non-transitory computer readable storage medium having a computer readable program, the computer readable program executable by the processor to: before a new rule is added to a plurality of rules: applying the new rule to a plurality of previously classified documents; determining an application of the new rule results in misclassification of a given number of the plurality of previously classified documents; assigning to the new rule a damping factor based on the given number of misclassifications; adding the new rule to the plurality of rules with the assigned damping factor; classifying a document from among a plurality of documents in a first class, in response to applying statistical analysis to data associated with the document; classifying the document in a second class, in response to applying the new rule to the document according to the damping factor assigned to the new rule; and recalculating damping factors assigned to the plurality of rules, comprising: classifying the plurality of documents by applying the plurality of rules to the plurality of documents according to the damping factors assigned to the plurality of rules; inserting into a status matrix statuses of the classification of the plurality of documents; tallying the statuses in a column of the status matrix corresponding to a given rule; and recalculating the damping factor for the given rule based on the tallied statuses for the given rule, wherein each cell (Cij) of the status matrix comprises a status of the classification of a given document (i) of the plurality of documents with respect to a given rule (j) of the plurality of rules, wherein the statuses comprise: X = the classification of the given document (i) improved by the given rule (j); Y = the classification of the given document (i) spoiled by the given rule (j); and Z = the classification of the given document (i) unaffected by the given rule (j).
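The recalculation step recited in claim 1 (fill a status matrix, tally each rule's column, derive a new damping factor) can be illustrated with a minimal Python sketch. The claim does not state how a cell status is decided or how the tallies map to a damping factor, so both are assumptions here: a cell is X when the rule corrects a wrong statistical label, Y when it overturns a correct one, and Z otherwise, and the damping factor is taken as X / (X + Y). All function and variable names are hypothetical.

```python
from collections import Counter

IMPROVED, SPOILED, UNAFFECTED = "X", "Y", "Z"


def cell_status(true_label, baseline_label, rule_label):
    """Status C_ij of one document with respect to one rule.

    rule_label is None when the rule does not apply to the document.
    Assumption: "improved" means the rule fixes a wrong baseline label,
    "spoiled" means it overturns a correct one.
    """
    if rule_label is None or rule_label == baseline_label:
        return UNAFFECTED
    if rule_label == true_label:
        return IMPROVED
    if baseline_label == true_label:
        return SPOILED
    return UNAFFECTED  # wrong before the rule and still wrong after it


def build_status_matrix(true_labels, baseline_labels, rule_outputs):
    """Rows are documents (i), columns are rules (j).

    rule_outputs[i][j] is the label rule j assigns to document i, or None.
    """
    return [
        [cell_status(t, b, r) for r in row]
        for t, b, row in zip(true_labels, baseline_labels, rule_outputs)
    ]


def recalculate_damping(status_matrix, num_rules):
    """Tally each rule's column and derive a damping factor in [0, 1].

    Assumed formula: improved / (improved + spoiled); a rule that never
    changed an outcome keeps a neutral factor of 1.0.
    """
    factors = []
    for j in range(num_rules):
        tally = Counter(row[j] for row in status_matrix)
        improved, spoiled = tally[IMPROVED], tally[SPOILED]
        decided = improved + spoiled
        factors.append(improved / decided if decided else 1.0)
    return factors
```

With four documents whose true labels are `["spam", "ham", "spam", "ham"]`, statistical labels `["ham", "ham", "spam", "spam"]`, and rule outputs `[["spam", None], [None, "spam"], ["spam", "ham"], ["ham", "ham"]]`, the first rule improves two documents and spoils none, so its factor is 1.0 under the assumed formula; the second improves one and spoils two, so its factor is 1/3.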
Address Armonk NY US