Title of invention: ENHANCED IDENTIFICATION OF DOCUMENT TYPES
Abstract: A method for document management includes automatically extracting respective features from each of a set of documents. The features are processed in a computer so as to generate respective vectors for the documents, each vector including elements having respective values that represent properties of a respective document. A similarity between the documents is assessed by computing a measure of distance between the respective vectors. The documents are automatically clustered responsively to the similarity so as to identify a cluster of the documents belonging to a common document type. Similar methods may be used in supervised categorization, wherein documents are compared and categorized based on a training set that is prepared for each document type.
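The pipeline the abstract describes (features, vectors, distance, clustering) can be illustrated with a minimal sketch. The abstract does not say which features, distance measure, or clustering procedure the method uses, so everything concrete here is an assumption chosen for illustration: term counts as features, cosine distance as the measure, a greedy seed-based grouping as the clustering step, and the hypothetical names `to_vector`, `cosine_distance`, `cluster`, and `threshold`.

```python
import math
from collections import Counter


def to_vector(text):
    """Map a document to a sparse term-count vector (assumed feature choice)."""
    return Counter(text.lower().split())


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # an empty document is treated as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def cluster(docs, threshold=0.5):
    """Greedy grouping: a document joins the first cluster whose seed
    (its first member) lies within `threshold`; otherwise it starts a new one."""
    vectors = [to_vector(d) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine_distance(vec, vectors[members[0]]) <= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


docs = [
    "invoice number total amount due",
    "invoice number amount due date",
    "meeting agenda notes action items",
]
print(cluster(docs))  # prints [[0, 1], [2]]
```

The two invoice-like texts share four of five terms, so their cosine distance is about 0.2 and they fall into one cluster, while the third text shares no terms with them and forms its own. The supervised variant mentioned in the abstract could reuse the same distance function by comparing a new document against vectors from a labeled training set, though the abstract gives no detail on how that comparison is done.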
Publication number: US2012041955(A1) Publication date: 2012.02.16
Application number: US20100853310 Filing date: 2010.08.10
Applicant: REGEV YIZHAR;WEISS GILAD;NOGACOM LTD. Inventor: REGEV YIZHAR;WEISS GILAD
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim:
Address: