Title: Method and system for information retrieval effectiveness estimation in e-discovery
Abstract: A server computing system determines a plurality of statistics for a plurality of test documents and, based on one or more of those statistics, determines a number of false negatives for a corpus of documents. A document of the corpus is a false negative if a classification model classifies it as negative and a user classifies it as positive. The server computing system then calculates the effectiveness of an information retrieval system on the corpus based on the number of false negatives for the corpus.
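The false-negative definition in the abstract compares two labels per test document: the classification model's label and the user's (reviewer's) label. A minimal sketch of tallying the resulting per-test-set statistics is shown below. The names `SampleStats`, `is_false_negative`, and `compute_test_stats` are hypothetical illustrations, and representing each label as a boolean (True = positive/responsive) is an assumption; the patent record does not specify a data model.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class SampleStats:
    """Hypothetical container for statistics over a set of test documents."""
    true_positives: int = 0
    false_positives: int = 0
    false_negatives: int = 0
    true_negatives: int = 0

    @property
    def size(self) -> int:
        # Total number of test documents tallied.
        return (self.true_positives + self.false_positives
                + self.false_negatives + self.true_negatives)


def is_false_negative(model_positive: bool, user_positive: bool) -> bool:
    # Per the abstract: the model says negative, but the user says positive.
    return (not model_positive) and user_positive


def compute_test_stats(labels: Iterable[Tuple[bool, bool]]) -> SampleStats:
    """Tally outcomes from (model_label, user_label) pairs (assumed format)."""
    stats = SampleStats()
    for model_positive, user_positive in labels:
        if model_positive and user_positive:
            stats.true_positives += 1
        elif model_positive:
            # Model positive, user negative.
            stats.false_positives += 1
        elif is_false_negative(model_positive, user_positive):
            stats.false_negatives += 1
        else:
            # Both negative.
            stats.true_negatives += 1
    return stats
```

For example, `compute_test_stats([(True, True), (False, True), (False, False), (True, False)])` yields one document in each of the four outcome categories, with `false_negatives` equal to 1.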
Publication number: US9613319 (B1)  Publication date: 2017.04.04
Application number: US201514820416  Filing date: 2015.08.06
Applicant: Veritas Technologies LLC  Inventors: Yu Shengke; Rangan Venkat
Classification: G06F17/30; G06N99/00  Main classification: G06F17/30
Agency: Wilmer Cutler Pickering Hale and Dorr LLP  Agent: Wilmer Cutler Pickering Hale and Dorr LLP
Main claim: 1. A method for estimating the effectiveness of information retrieval for electronic discovery comprising: calculating, by at least one computer processor configured to operate in an information retrieval system, a plurality of statistics for a plurality of test documents, wherein the plurality of statistics for the plurality of test documents comprises a number of documents that are false negatives in the plurality of test documents; calculating, by the at least one computer processor, the number of false negatives for a corpus of documents based on one or more of a number of test documents in the plurality of test documents, a size of the corpus of documents, a predetermined confidence level, and the number of false negatives in the plurality of test documents, wherein classification of a document of the corpus of documents is a false negative if classification of the document by a classification model is negative and classification of the document by a user is positive; and calculating, by the at least one computer processor, an effectiveness of the information retrieval system on the corpus of documents based on the number of false negatives for the corpus of documents.
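The claim lists the inputs to the corpus-level false-negative estimate (test-set size, corpus size, a predetermined confidence level, and the false-negative count in the test set) but not the formula. One way those inputs could be combined is sketched below, assuming the test documents are a simple random sample of the corpus. It uses a normal-approximation (Wald) interval with a finite-population correction, which is a swapped-in textbook technique, not necessarily the patented one. "Effectiveness" is taken here to be recall, and `retrieved_relevant` (the count of documents the model returned that are confirmed relevant) is an assumed extra input; all function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def estimate_corpus_false_negatives(
    n_test: int, corpus_size: int, confidence: float, fn_in_test: int
) -> Tuple[float, float]:
    """Return (point estimate, upper bound) of false negatives in the corpus.

    Assumes the test documents are a simple random sample of the corpus.
    """
    if not 0 < n_test <= corpus_size:
        raise ValueError("n_test must be in (0, corpus_size]")
    if not 0 <= fn_in_test <= n_test:
        raise ValueError("fn_in_test must be in [0, n_test]")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be in (0, 1)")

    p_hat = fn_in_test / n_test                      # sample false-negative rate
    z = NormalDist().inv_cdf((1 + confidence) / 2)   # two-sided critical value
    # Finite-population correction: shrinks the margin as the sample nears the corpus.
    fpc = (corpus_size - n_test) / (corpus_size - 1) if corpus_size > 1 else 0.0
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n_test * fpc)

    point = p_hat * corpus_size
    upper = min(1.0, p_hat + margin) * corpus_size
    return point, upper


def estimate_recall(retrieved_relevant: float, corpus_false_negatives: float) -> float:
    """Recall = relevant retrieved / (relevant retrieved + relevant missed)."""
    denom = retrieved_relevant + corpus_false_negatives
    if denom <= 0:
        raise ValueError("recall is undefined when no relevant documents exist")
    return retrieved_relevant / denom
```

With a 1,000-document sample from a 100,000-document corpus containing 20 false negatives, the point estimate is 2,000 corpus false negatives. Passing the upper bound instead of the point estimate to `estimate_recall` gives a conservative (lower) recall figure. The Wald interval degenerates to zero width when the sample contains no false negatives, so an exact or Wilson interval may be preferable in practice.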
Address: Mountain View, CA, US