Title of Invention: System and method for classifying electronically posted documents
Abstract: A method for classifying electronically posted documents includes receiving two posted documents and generating a corresponding metadata summary for each, wherein each metadata summary includes at least one sub-tree structure. The structures of the two summary sub-trees within the respective metadata summaries are then compared. If the two summary sub-trees differ, the two documents are deemed distinct. If the two summary sub-trees are the same, the attribute values and text content of the metadata summaries are compared over a portion of the metadata summaries. If the compared attribute values and text content are the same, the documents are deemed duplicates.
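The two-stage test in the abstract (structure first, then attribute values and text) can be sketched as below. This is a minimal illustration, not the patented implementation. It assumes each metadata summary is an XML tree whose summary sub-tree is a child element with the hypothetical tag `summary`. The "portion" compared in the second stage is modeled as a hypothetical optional set of tag names. Reporting "distinct" when the values differ is also an assumption, because the abstract only states the duplicate case.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional


def same_structure(a: ET.Element, b: ET.Element) -> bool:
    """True if both sub-trees have the same tags arranged in the same shape.

    Attribute values and text content are deliberately ignored at this stage.
    """
    if a.tag != b.tag or len(a) != len(b):
        return False
    return all(same_structure(ca, cb) for ca, cb in zip(a, b))


def _text(e: ET.Element) -> str:
    # Treat missing text as empty and ignore surrounding whitespace.
    return (e.text or "").strip()


def same_values(a: ET.Element, b: ET.Element,
                portion: Optional[Iterable[str]] = None) -> bool:
    """Compare attributes and text node by node over a portion of the sub-tree.

    `portion` (hypothetical) restricts the comparison to nodes with these tags;
    None compares every node. Assumes same_structure(a, b) already holds, so a
    pre-order walk of both trees visits corresponding nodes in lockstep.
    """
    wanted = set(portion) if portion is not None else None
    for na, nb in zip(a.iter(), b.iter()):
        if wanted is not None and na.tag not in wanted:
            continue
        if na.attrib != nb.attrib or _text(na) != _text(nb):
            return False
    return True


def classify(summary_a: ET.Element, summary_b: ET.Element,
             subtree_tag: str = "summary",
             portion: Optional[Iterable[str]] = None) -> str:
    """Return "duplicate" or "distinct" for two metadata summaries.

    `subtree_tag` is a hypothetical name for the summary sub-tree element.
    """
    sub_a = summary_a.find(subtree_tag)
    sub_b = summary_b.find(subtree_tag)
    # Stage 1: a missing or structurally different sub-tree means distinct.
    if sub_a is None or sub_b is None or not same_structure(sub_a, sub_b):
        return "distinct"
    # Stage 2: identical structure, so compare values over the chosen portion.
    # Returning "distinct" on a mismatch is an assumption of this sketch.
    return "duplicate" if same_values(sub_a, sub_b, portion) else "distinct"


if __name__ == "__main__":
    a = ET.fromstring('<doc><summary><title lang="en">Bike</title>'
                      '<price>100</price></summary></doc>')
    b = ET.fromstring('<doc><summary><title lang="en">Bike</title>'
                      '<price>100</price></summary></doc>')
    c = ET.fromstring('<doc><summary><title lang="en">Bike</title>'
                      '</summary></doc>')
    print(classify(a, b))  # duplicate
    print(classify(a, c))  # distinct (sub-tree shapes differ)
```

Because the cheap structural check runs first, most non-duplicate pairs are rejected before any attribute or text comparison happens. The `portion` parameter shows one way of limiting the second stage to selected fields, for example comparing only titles.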
Application Publication No.: US2007022374(A1)  Publication Date: 2007.01.25
Application No.: US20060526470  Filing Date: 2006.09.23
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION  Inventors: HUANG ANITA W.; SUNDARESAN NEELAKANTAN
Classification: G06F17/00  Main Classification: G06F17/00