Abstract |
Provided is a method for automatically selecting sample documents or pages from a large collection and, more particularly, an application of the method in a proof presentment environment, where it is used to select and review representative or extreme pages from a large document, such as one scheduled for printing. The method characterizes pages or documents in a multi-dimensional vector space based on a set of characteristics and then uses clustering techniques to group the pages, enabling the selection of typical pages from the groups and/or outlier pages from the extremes lying outside the groups.
|
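
The selection flow summarized in the abstract can be illustrated with a minimal sketch. The abstract does not name a clustering algorithm, a feature set, or an outlier criterion, so everything specific here is an assumption made for illustration: plain k-means with a deterministic farthest-point start, two hypothetical page features (text fraction and image fraction), and a hypothetical outlier rule that flags pages whose distance to their group centroid exceeds a multiple of the median distance. The function names `kmeans` and `select_samples` are likewise invented for this example.

```python
import math
import statistics


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(vectors, centroids):
    """Label each vector with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: distance(v, centroids[j]))
            for v in vectors]


def kmeans(vectors, k, max_iter=50):
    """Plain k-means (an assumed choice; the abstract only says 'clustering')."""
    # Deterministic start: first vector, then repeatedly the vector that is
    # farthest from every centroid chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(distance(v, c) for c in centroids)))
    for _ in range(max_iter):
        labels = assign(vectors, centroids)
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                # New centroid is the per-dimension mean of the members.
                updated.append(tuple(sum(col) / len(members)
                                     for col in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return assign(vectors, centroids), centroids


def select_samples(vectors, k, outlier_factor=2.0):
    """Return (typical page indices, outlier page indices).

    Typical: the page nearest each group centroid.
    Outlier (hypothetical rule): distance to own centroid exceeds
    outlier_factor times the median of all such distances.
    """
    labels, centroids = kmeans(vectors, k)
    dists = [distance(v, centroids[lab]) for v, lab in zip(vectors, labels)]
    typical = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            typical.append(min(members, key=lambda i: dists[i]))
    threshold = outlier_factor * statistics.median(dists)
    outliers = [i for i, d in enumerate(dists) if d > threshold]
    return typical, outliers


# Hypothetical page features: (text fraction, image fraction).
pages = [
    (0.90, 0.10), (0.85, 0.15), (0.88, 0.12),  # text-heavy pages
    (0.20, 0.80), (0.15, 0.85), (0.18, 0.82),  # image-heavy pages
    (0.50, 0.00),                              # an unusual page
]
typical, outliers = select_samples(pages, k=2)
print("typical pages:", typical)
print("outlier pages:", outliers)
```

In this sketch a reviewer would be shown one page per group plus any flagged pages, rather than the full document. The number of groups `k` and the `outlier_factor` are tuning parameters of the example, not values taken from the source.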