Abstract |
PROBLEM TO BE SOLVED: To provide a method for evaluating a clustering generated for a document page collection. SOLUTION: The method includes the steps of: obtaining a document page collection, wherein each document page in the collection has one or more features and the one or more features define a paper layout attribute; extracting information from the one or more features on each document page; constructing a feature vector for the one or more features on each document page; assigning a feature weight to each feature; computing a metric based on the feature weights and the feature vectors; and clustering the document page collection using the metric. COPYRIGHT: (C)2007,JPO&INPIT
|
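The steps recited in the abstract (feature vector, per-feature weight, metric, clustering) can be illustrated with a minimal Python sketch. The abstract does not specify the features, the form of the metric, or the clustering procedure, so everything concrete here is an assumption made for illustration: the feature names in `FEATURES`, the weighted Euclidean distance, the greedy threshold-based grouping, and the sample values are all hypothetical.

```python
import math

# Hypothetical layout features; the abstract does not name specific ones.
FEATURES = ["left_margin", "column_count", "text_block_height"]


def feature_vector(page):
    """Build a numeric vector from a page's extracted layout features."""
    return [float(page[name]) for name in FEATURES]


def weighted_distance(u, v, weights):
    """Weighted Euclidean distance (an assumed form of the metric)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v)))


def cluster_pages(vectors, weights, threshold):
    """Greedy grouping (an assumed procedure): a page joins the first
    cluster whose first member is within `threshold`, otherwise it
    starts a new cluster. Returns one cluster label per page."""
    leaders = []
    labels = []
    for vec in vectors:
        for idx, leader in enumerate(leaders):
            if weighted_distance(vec, leader, weights) <= threshold:
                labels.append(idx)
                break
        else:
            leaders.append(vec)
            labels.append(len(leaders) - 1)
    return labels


# Illustrative pages with made-up feature values.
pages = [
    {"left_margin": 1.0, "column_count": 1, "text_block_height": 9.0},
    {"left_margin": 1.1, "column_count": 1, "text_block_height": 9.2},
    {"left_margin": 0.5, "column_count": 2, "text_block_height": 10.0},
]
weights = [1.0, 4.0, 0.5]  # one weight per feature, chosen arbitrarily
vectors = [feature_vector(p) for p in pages]
labels = cluster_pages(vectors, weights, threshold=1.0)
print(labels)
```

In this sketch a larger weight makes a difference in that feature count more toward the distance, so the column count dominates the grouping of the sample pages.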