Title of invention DETECTING DUPLICATE AND NEAR-DUPLICATE FILES
Abstract Improved duplicate and near-duplicate detection techniques may assign a number of fingerprints to a given document by (i) extracting parts from the document, (ii) assigning the extracted parts to one or more of a predetermined number of lists, and (iii) generating a fingerprint from each of the populated lists. Two documents may be considered to be near-duplicates if any one of their fingerprints matches.
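The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the choice of lowercase word tokens as "parts", the use of SHA-1, the value of `NUM_LISTS`, and every function name below are assumptions made only for illustration. Each part is assigned to exactly one list, although the abstract allows "one or more".

```python
import hashlib
import re

NUM_LISTS = 4  # assumed value; the abstract only says "a predetermined number"


def extract_parts(text):
    """Step (i): extract parts. Here a part is a lowercase word token (assumption)."""
    return re.findall(r"\w+", text.lower())


def _bucket(part, num_lists):
    """Map a part to a list index using a stable hash of its content (assumption)."""
    digest = hashlib.sha1(part.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_lists


def assign_to_lists(parts, num_lists=NUM_LISTS):
    """Step (ii): assign each extracted part to one of the predetermined lists."""
    lists = [[] for _ in range(num_lists)]
    for part in parts:
        lists[_bucket(part, num_lists)].append(part)
    return lists


def fingerprints(text, num_lists=NUM_LISTS):
    """Step (iii): generate one fingerprint per populated list."""
    result = set()
    for index, parts in enumerate(assign_to_lists(extract_parts(text), num_lists)):
        if parts:  # empty lists produce no fingerprint
            joined = " ".join(parts)
            result.add((index, hashlib.sha1(joined.encode("utf-8")).hexdigest()))
    return result


def near_duplicates(doc_a, doc_b, num_lists=NUM_LISTS):
    """Two documents are treated as near-duplicates if any fingerprint matches."""
    return bool(fingerprints(doc_a, num_lists) & fingerprints(doc_b, num_lists))


if __name__ == "__main__":
    a = "the quick brown fox jumps over the lazy dog"
    b = "the quick brown fox jumps over the lazy cat"
    print(sorted(fingerprints(a) - fingerprints(b)))  # fingerprints changed by the edit
    print(near_duplicates(a, b))
```

Because a part's list depends only on its content in this sketch, replacing one word can change at most two lists (the one that loses the old word and the one that gains the new word). The fingerprints of all other populated lists are unchanged, which is what lets a lightly edited document still share a fingerprint with the original.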
Publication No. US2008162478(A1) Publication date 2008.07.03
Application No. US20080049278 Filing date 2008.03.15
Applicant Inventors PUGH WILLIAM; HENZINGER MONIKA H.
Classification G06F17/30 Main classification G06F17/30
Agency Agent
Principal claim
Address