Title of invention: CLASSIFICATION SYSTEM WITH METHODOLOGY FOR EFFICIENT VERIFICATION
Abstract: Techniques for a classification system with methodology for enhanced verification are described. In one approach, a classification computer trains a classifier based on a set of training documents. After training is complete, the classification computer iterates over a collection of unlabeled documents and uses the trained classifier to predict a label for each unlabeled document. A verification computer retrieves one of the documents assigned a label by the classification computer. The verification computer then generates a user interface that displays select information from the document and provides an option to verify the label predicted by the classification computer or provide an alternative label. The document and the verified label are then fed back into the set of training documents and are used to retrain the classifier to improve subsequent classifications. In addition, the document is indexed by a query computer based on the verified label and made available for search and display.
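The train, predict, verify, and retrain cycle described in the abstract can be illustrated with a minimal sketch. The application does not specify a classifier type, so the small naive Bayes model below is an assumption, as are all function names, the sample documents, and the `verify` stand-in for the verification user interface.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def train(examples):
    """Build per-label token counts from (text, label) pairs (hypothetical helper)."""
    token_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        token_counts[label].update(tokenize(text))
    return token_counts, label_counts

def predict(model, text):
    """Return the label with the highest smoothed naive Bayes log score."""
    token_counts, label_counts = model
    vocab = {tok for counts in token_counts.values() for tok in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        total = sum(token_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for tok in tokenize(text):
            # Add-one smoothing so unseen tokens do not zero out the score.
            score += math.log((token_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def verify(candidate, reviewer_label=None):
    """Stand-in for the verification UI: accept the candidate or override it."""
    return candidate if reviewer_label is None else reviewer_label

# Assumed toy training set; real training documents are not given in the record.
training = [("invoice payment due", "finance"), ("court ruling appeal", "legal")]
model = train(training)

doc = "payment overdue notice"
candidate = predict(model, doc)      # label predicted by the classifier
verified = verify(candidate)         # reviewer accepts the candidate label

# Feed the verified document back into the training set and retrain.
training.append((doc, verified))
model = train(training)
```

Passing `reviewer_label="legal"` to `verify` would model the reviewer choosing an alternative label instead of accepting the prediction.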
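Publication number: US2016078022(A1) Publication date: 2016.03.17
Application number: US201414483527 Filing date: 2014.09.11
Applicant: PALANTIR TECHNOLOGIES INC. Inventors: LISUK DAVID; HOLTZEN STEVEN
Classification: G06F17/28; G06F3/0484; G06N99/00 Main classification: G06F17/28
Agency: Agent: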
Principal claim: 1. A method comprising: obtaining a document; determining, using a trained classifier, a candidate label for the document from a plurality of labels; selecting one or more linguistic structures from the document; displaying a user interface that presents data from the document, including at least a portion of the one or more linguistic structures, and the candidate label, wherein the portion of the one or more linguistic structures are displayed by the user interface, wherein the user interface includes one or more user interface controls which present a first option to accept the candidate label for the document and a second option to select a different label for the document, the one or more user interface controls further presenting an element for highlighting the one or more linguistic structures within the document; receiving, via the one or more user interface controls, input representing selection of the first option or the second option, and further input comprising a highlighted section of the one or more linguistic structures that was important to the selection of the first option or the second option; associating the document with a verified label; changing one or more weights assigned to the highlighted section relative to a non-highlighted section during retraining of the trained classifier; wherein the method is performed by one or more computing devices.
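The final step of the claim, changing the weights of a highlighted section relative to a non-highlighted section during retraining, could take many forms. The sketch below shows one possible reading, assuming a bag-of-words feature representation in which tokens inside the highlighted character span are counted more heavily. The function name, the span format, and the weight of 3.0 are all hypothetical and do not come from the claim.

```python
import re
from collections import Counter

def weighted_token_counts(text, highlight_span, highlight_weight=3.0):
    """Count tokens, giving extra weight to tokens inside the highlighted span.

    highlight_span is an assumed (start, end) pair of character offsets
    marking the section the reviewer highlighted.
    """
    start, end = highlight_span
    counts = Counter()
    for match in re.finditer(r"\w+", text.lower()):
        inside = start <= match.start() and match.end() <= end
        counts[match.group()] += highlight_weight if inside else 1.0
    return counts

text = "the court issued a ruling on the appeal"
# Assume the reviewer highlighted "ruling on the appeal".
span = (text.index("ruling"), len(text))
counts = weighted_token_counts(text, span)
```

The resulting weighted counts could then replace plain token counts when the classifier is retrained, so that the highlighted words influence the verified label more than the rest of the document.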
Address: Palo Alto CA US