SYSTEM AND METHOD FOR NEAR AND EXACT DE-DUPLICATION OF DOCUMENTS, Application No. US201113075792

Title of invention	SYSTEM AND METHOD FOR NEAR AND EXACT DE-DUPLICATION OF DOCUMENTS
Abstract	A system, method and computer program product for identifying near- and exact-duplicate documents in a document collection, including, for each document in the collection: reading textual content from the document; filtering the textual content based on user settings; determining the N most frequent words from the filtered textual content of the document; performing a quorum search of the N most frequent words in the document with a threshold M; and sorting the results of the quorum search by relevancy. Based on the values of N and M, near- and exact-duplicate documents are identified in the document collection.
Publication number	US2011191354(A1)	Publication date	2011.08.04
Application number	US201113075792	Filing date	2011.03.30
Applicant	MSC INTELLECTUAL PROPERTIES B.V.	Inventors	SCHOLTES JOHANNES C.;BLOEMBERGEN SIEBE
Classification	G06F7/00;G06F17/30	Main classification	G06F7/00
Agency		Agent
Main claim
Address
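
The steps summarized in the abstract (filter the text, take the N most frequent words, run a quorum search with threshold M, sort by relevancy) can be sketched roughly as follows. This is an illustrative reading of the abstract only, not the patented implementation: the function names, the `STOPWORDS` filter standing in for "user settings", the tokenizer, and the use of the hit count as the relevancy score are all assumptions.

```python
import re
from collections import Counter

# Hypothetical stand-in for the "user settings" filter mentioned in the abstract.
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "to", "in", "is"})


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_words(text, n, stopwords=STOPWORDS):
    """Return the set of the n most frequent words left after filtering."""
    counts = Counter(w for w in tokenize(text) if w not in stopwords)
    # Descending count, then alphabetical, so that ties break deterministically.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return set(ranked[:n])


def quorum_search(query_words, collection, m):
    """Return (doc_id, hits) for documents containing at least m query words.

    Results are sorted by hit count, used here as an assumed relevancy score.
    """
    results = []
    for doc_id, text in collection.items():
        hits = len(query_words & set(tokenize(text)))
        if hits >= m:
            results.append((doc_id, hits))
    results.sort(key=lambda r: (-r[1], r[0]))
    return results


def find_duplicates(collection, n, m):
    """For each document, list the other documents that pass the quorum search."""
    candidates = {}
    for doc_id, text in collection.items():
        query = top_words(text, n)
        candidates[doc_id] = [
            (other, hits)
            for other, hits in quorum_search(query, collection, m)
            if other != doc_id
        ]
    return candidates


docs = {
    "a": "invoice payment due invoice payment total invoice",
    "b": "invoice payment due invoice payment total invoice amount",
    "c": "weather forecast rain rain sun",
}
strict = find_duplicates(docs, n=3, m=3)
loose = find_duplicates(docs, n=3, m=2)
```

In this sketch, raising M toward N requires more of a document's top words to appear in a candidate, so fewer documents qualify; lowering M admits looser matches. The abstract does not say how N and M map onto "near" versus "exact" duplicates, so that mapping is left open here.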