Method and apparatus for identifying near-duplicate documents, Application No. US201113284919

Title of invention	Method and apparatus for identifying near-duplicate documents
Abstract	Duplicate or near-duplicate documents can be identified by creating a vector that represents the evaluated document, where the vector values are the serial numbers of the summary vector's coordinates, sorted according to the value in each coordinate. The summary vector is calculated by summing the bits of the hashes of the document's shingles. Vectors representing other documents can be reduced in size to 64-bit fingerprints and stored in permanent memory. Duplicates or near-duplicates can then be identified by comparing these stored fingerprints with the vector representing the evaluated document.
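Publication number	US8370390(B1)	Publication date	2013.02.05
Application number	US201113284919	Filing date	2011.10.30
Applicant	PERMAKOFF VADIM	Inventor	PERMAKOFF VADIM
Classification	G06F7/00;G06F17/30	Main classification	G06F7/00
Agency		Agent
Independent claim
Address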
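
The steps named in the abstract (shingles, per-bit sums of shingle hashes, a vector of coordinate serial numbers sorted by value, and 64-bit fingerprints) can be illustrated with a minimal Python sketch. This is not the patented implementation. The abstract does not specify the shingle size, the hash function, how a vector is reduced to a fingerprint, or how a fingerprint is compared with a vector, so the following are assumptions made only for illustration: word 3-shingles, BLAKE2b truncated to 64 bits, a simhash-style majority threshold for the fingerprint, and a Hamming distance between two fingerprints as a stand-in comparison. All function names are hypothetical.

```python
import hashlib

BITS = 64  # fingerprint width stated in the abstract


def shingles(text, k=3):
    """Set of word k-shingles (k=3 is an assumption, not from the abstract)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def hash64(s):
    """64-bit hash of a shingle (BLAKE2b is an assumed choice)."""
    digest = hashlib.blake2b(s.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def summary_vector(shingle_set):
    """Coordinate i holds how many shingle hashes have bit i set."""
    sums = [0] * BITS
    for sh in shingle_set:
        h = hash64(sh)
        for i in range(BITS):
            sums[i] += (h >> i) & 1
    return sums


def rank_vector(sums):
    """Coordinate serial numbers sorted by coordinate value (ties by index)."""
    return sorted(range(len(sums)), key=lambda i: (sums[i], i))


def fingerprint(sums, n_shingles):
    """Assumed reduction: bit i is 1 when most shingle hashes set bit i."""
    fp = 0
    for i, s in enumerate(sums):
        if 2 * s > n_shingles:
            fp |= 1 << i
    return fp


def hamming(a, b):
    """Number of differing bits between two fingerprints (assumed comparison)."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    doc_a = "the quick brown fox jumps over the lazy dog near the river bank"
    doc_b = "the quick brown fox jumps over the lazy dog near the river shore"
    sh_a, sh_b = shingles(doc_a), shingles(doc_b)
    sums_a, sums_b = summary_vector(sh_a), summary_vector(sh_b)
    print("rank vector (first 8):", rank_vector(sums_a)[:8])
    print("Hamming distance:",
          hamming(fingerprint(sums_a, len(sh_a)), fingerprint(sums_b, len(sh_b))))
```

In this sketch a small Hamming distance suggests near-duplicate documents. The abstract instead describes comparing stored fingerprints directly against the sorted-coordinate vector of the evaluated document; that comparison rule is not given in this record, so it is not reproduced here.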