Main claim |
1. A computer-implemented system for validating a document classification process for eDiscovery, internal investigations, law enforcement activities, compliance audits, records management, legacy data clean-up, or defensible dispositions, the system comprising:
a document collection of N documents related to eDiscovery, internal investigations, law enforcement activities, compliance audits, records management, legacy data clean-up, or defensible dispositions;
a document classification process performed on the document collection;
a random selection module configured to automatically generate a random validation set S of documents based on a user-selectable percentage P of the N documents from the document collection; and
a manual document review process performed on the random validation set of documents to validate overall results of all of the documents classified by the document classification process,
wherein the system is configured to dynamically and in real time measure and display on a computer display device a best-case estimate of a quality of the results of the overall document classification process based on the documents that are validated, given the size N of a total data set of the document collection, and based on a predetermined quality threshold for an overall classification quality desired for the document classification process, and
wherein the system is configured to employ automatic document classification methods including at least one of Technology Assisted Review (TAR), Predictive Coding, Machine Assisted Review (MAR), or Computer Assisted Review (CAR), support vector machines (SVM), naive Bayes classifiers, k-nearest neighbors, rules-based classification, linear discriminant analysis (LDA), Maximum Entropy Markov Model (MEMM), scatter-gather clustering, and hierarchical agglomerative clustering (HAC). |
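The random selection module and the running quality estimate recited in the claim can be illustrated with a minimal sketch. The claim does not specify how the best-case estimate is computed, so everything below is an assumption made for illustration only: the function names (`select_validation_set`, `best_case_quality`, `meets_threshold`) are hypothetical, and the estimate is taken to be the upper bound of a normal-approximation confidence interval on the reviewer agreement rate, with a finite population correction for the collection size N.

```python
import math
import random


def select_validation_set(doc_ids, percent, seed=None):
    """Draw a simple random sample S of roughly `percent` % of the N documents.

    Assumption: sampling is uniform and without replacement.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ids = list(doc_ids)
    n = len(ids)
    size = min(n, max(1, math.ceil(n * percent / 100)))
    return random.Random(seed).sample(ids, size)


def best_case_quality(correct, reviewed, total, z=1.96):
    """Illustrative best-case estimate of classification quality.

    Assumption: the upper bound of a normal-approximation interval on the
    observed agreement rate, shrunk by a finite population correction so
    that the collection size N (`total`) is taken into account.
    """
    if reviewed == 0:
        return 1.0  # nothing reviewed yet, so no evidence against the classifier
    p = correct / reviewed
    fpc = math.sqrt((total - reviewed) / (total - 1)) if total > 1 else 0.0
    margin = z * math.sqrt(p * (1 - p) / reviewed) * fpc
    return min(1.0, p + margin)


def meets_threshold(correct, reviewed, total, threshold):
    """True while even the best-case estimate still reaches the quality threshold."""
    return best_case_quality(correct, reviewed, total) >= threshold
```

Under these assumptions, calling `best_case_quality` again after each manually reviewed document gives a running figure that a display could refresh, and `meets_threshold` turning false would indicate that the desired quality is unlikely to be reached even in the best case.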