Title of invention: Systems and Methods for Classifying Electronic Information Using Advanced Active Learning Techniques
Abstract: Systems and methods for classifying electronic information or documents into a number of classes and subclasses are provided through an active learning algorithm. In certain embodiments, seed sets may be eliminated by merging relevance feedback and machine learning phases. In certain embodiments, the active learning algorithm forks a number of classification paths corresponding to predicted user coding decisions for a selected document. The active learning algorithm determines an order in which the documents of the collection may be processed and scored by the forked classification paths. Such document classification systems are easily scalable for large document collections, require less manpower and can be employed on a single computer, thus requiring fewer resources. Furthermore, the classification systems and methods described can be used for any pattern recognition or classification effort in a wide variety of fields.
Publication number: US2015324451(A1)
Publication date: 2015.11.12
Application number: US201514806029
Filing date: 2015.07.22
Applicant: Cormack Gordon Villy; Grossman Maura Robin
Inventor: Cormack Gordon Villy; Grossman Maura Robin
Classification: G06F17/30; G06N99/00
Main classification: G06F17/30
Agency:
Agent:
Main claim: 1. An active learning system for classifying documents in a document collection as a member of one or more classes or subclasses, the system comprising: a processor being adapted to: select a document from the document collection; calculate at least two predicted classifiers, for at least one of the one or more classes or subclasses, each predicted classifier being calculated using a document information profile for the selected document, a current classifier associated with at least one of the one or more classes or subclasses, and a different coding decision selected from a set of possible user coding decisions that may be received from a user; determine a processing order for a subset of documents in the document collection that indicates an order in which the documents of the subset are to be scored; for each one of the predicted classifiers, calculate a set of scores for one or more documents in the document collection, at least in part, according to the processing order, wherein each score is generated for a document by utilizing the corresponding predicted classifier and a document information profile of the document to be scored; and in response to receiving a user coding decision, classify a set of documents in the document collection into one or more of the one or more classes or subclasses using a subset of the set of scores based on the predicted classifier that corresponds to the received user coding decision.
Address: Waterloo CA
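
Illustrative sketch (not part of the patent record): the steps recited in claim 1 can be pictured roughly as below. This is a minimal sketch under stated assumptions, not the applicants' implementation. It assumes a binary set of coding decisions, a document information profile represented as a term-to-weight dictionary, a linear classifier with a simple additive update, and a processing order given by the current classifier's score. All names (`predict_classifier`, `fork_and_score`, `resolve`) are hypothetical.

```python
from typing import Dict, List, Tuple

Profile = Dict[str, float]     # assumed form of a "document information profile"
Classifier = Dict[str, float]  # assumed linear classifier: term -> weight

CODING_DECISIONS = (True, False)  # assumption: relevant / not relevant


def score(clf: Classifier, profile: Profile) -> float:
    """Dot product of classifier weights and a document profile."""
    return sum(clf.get(term, 0.0) * w for term, w in profile.items())


def predict_classifier(current: Classifier, profile: Profile,
                       decision: bool, lr: float = 1.0) -> Classifier:
    """Copy of `current`, updated as if the user had coded the selected
    document with `decision` (additive update, chosen for illustration)."""
    sign = 1.0 if decision else -1.0
    updated = dict(current)
    for term, w in profile.items():
        updated[term] = updated.get(term, 0.0) + sign * lr * w
    return updated


def fork_and_score(current: Classifier, selected_id: str,
                   profiles: Dict[str, Profile]
                   ) -> Tuple[Dict[bool, Classifier], List[str],
                              Dict[bool, Dict[str, float]]]:
    """Fork one predicted classifier per possible coding decision and
    score the remaining documents under each fork, in processing order."""
    selected = profiles[selected_id]
    forks = {d: predict_classifier(current, selected, d)
             for d in CODING_DECISIONS}
    others = [doc_id for doc_id in profiles if doc_id != selected_id]
    # Assumed processing order: highest score under the current classifier first.
    order = sorted(others, key=lambda i: score(current, profiles[i]),
                   reverse=True)
    scores = {d: {i: score(clf, profiles[i]) for i in order}
              for d, clf in forks.items()}
    return forks, order, scores


def resolve(decision: bool, forks: Dict[bool, Classifier],
            scores: Dict[bool, Dict[str, float]], threshold: float = 0.0
            ) -> Tuple[Classifier, Dict[str, bool]]:
    """Once the real coding decision arrives, keep the matching fork and
    classify documents from its precomputed scores."""
    labels = {i: s > threshold for i, s in scores[decision].items()}
    return forks[decision], labels
```

In this sketch the scoring for both possible decisions is done before the user responds, so `resolve` only has to look up the precomputed scores for the decision actually received; the scores of the other fork are discarded.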