Title of invention METHOD FOR DOCUMENT CLUSTERING BASED ON PAGE LAYOUT ATTRIBUTES
Abstract PROBLEM TO BE SOLVED: To provide a method for evaluating a clustering generated for a collection of document pages. SOLUTION: The method includes the steps of: obtaining a document page collection, in which each document page has one or more features and the one or more features define a page layout attribute; extracting information from the one or more features on each document page; constructing a feature vector for the one or more features on each document page; assigning a feature weight to each feature; computing a metric based on the feature weights and the feature vectors; and clustering the document page collection using the metric. COPYRIGHT: (C)2007,JPO&INPIT
Publication number JP2007080263(A) Publication date 2007.03.29
Application number JP20060242650 Filing date 2006.09.07
Applicant XEROX CORP Inventor BERGHOLZ ANDRE
Classification G06F17/30;G06K9/20 Main classification G06F17/30
Agency Agent
Main claim
Address
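
The steps listed in the abstract (feature vectors, per-feature weights, a metric, clustering) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the three layout features, the weight values, the choice of a weighted Euclidean distance as the metric, and the threshold-based single-link grouping. The abstract names neither a particular metric nor a particular clustering algorithm.

```python
from math import sqrt

# Hypothetical layout features per page (assumed for illustration):
# [number of text blocks, number of images, fraction of page covered by text]
pages = {
    "p1": [3.0, 0.0, 0.80],
    "p2": [3.0, 0.0, 0.75],
    "p3": [1.0, 4.0, 0.20],
    "p4": [1.0, 5.0, 0.25],
}

# Hypothetical feature weights, one per feature.
weights = [1.0, 1.0, 2.0]


def weighted_distance(u, v, w):
    """Weighted Euclidean distance, used here as an assumed example metric."""
    return sqrt(sum(wi * (ui - vi) ** 2 for ui, vi, wi in zip(u, v, w)))


def cluster(pages, weights, threshold):
    """Single-link grouping: a page joins every cluster that contains a
    member within `threshold`, and those clusters are merged into one."""
    clusters = []
    for name, vec in pages.items():
        linked = [
            c for c in clusters
            if any(weighted_distance(vec, pages[m], weights) <= threshold for m in c)
        ]
        merged = {name}
        for c in linked:
            merged |= c
            clusters.remove(c)
        clusters.append(merged)
    return clusters


if __name__ == "__main__":
    for group in cluster(pages, weights, threshold=1.5):
        print(sorted(group))
```

With these made-up values, the text-heavy pages and the image-heavy pages fall into separate groups. Raising the weight of a feature makes differences in that feature count more toward the distance, which is the role the abstract assigns to feature weights.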