Title of invention Detecting duplicate and near-duplicate files
Abstract Improved duplicate and near-duplicate detection techniques may assign a number of fingerprints to a given document by (i) extracting parts from the document, (ii) assigning the extracted parts to one or more of a predetermined number of lists, and (iii) generating a fingerprint from each of the populated lists. Two documents may be considered to be near-duplicates if any one of their fingerprints matches.
Publication number US6658423(B1) Publication date 2003.12.02
Application number US20010768947 Filing date 2001.01.24
Applicant GOOGLE, INC. Inventor PUGH WILLIAM;HENZINGER MONIKA H.
Classification G06F17/30;(IPC1-7):G06F17/30;G06F7/00 Main classification G06F17/30
Agency Agent
Main claim
Address
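
The three steps named in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the choice of words as the extracted "parts", the use of SHA-1 both to pick a list and to fingerprint it, the assignment of each part to exactly one list, and the names `NUM_LISTS`, `fingerprints`, and `near_duplicates` are all assumptions made for illustration.

```python
import hashlib
import re

NUM_LISTS = 4  # assumed value for the "predetermined number of lists"


def _hash64(text: str) -> int:
    """Return a stable 64-bit integer derived from the SHA-1 digest of text."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def fingerprints(document: str, num_lists: int = NUM_LISTS) -> set:
    """Return one (list index, fingerprint) pair per populated list."""
    # (i) Extract parts. Here a part is a lowercase word (an assumption).
    parts = re.findall(r"\w+", document.lower())

    # (ii) Assign each part to one list, chosen by hashing the part itself,
    # so the same word always lands in the same list.
    lists = [[] for _ in range(num_lists)]
    for part in parts:
        lists[_hash64(part) % num_lists].append(part)

    # (iii) Generate a fingerprint from each populated list; empty lists
    # contribute nothing.
    return {
        (index, _hash64(" ".join(members)))
        for index, members in enumerate(lists)
        if members
    }


def near_duplicates(doc_a: str, doc_b: str) -> bool:
    """Two documents are near-duplicates if any one fingerprint matches."""
    return not fingerprints(doc_a).isdisjoint(fingerprints(doc_b))
```

Under these assumptions, replacing a single word changes at most two lists (the one the old word left and the one the new word joined), so the fingerprints of every other populated list still match. A larger `num_lists` tolerates more scattered edits but also makes a match between loosely related documents more likely.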