Title of invention |
Detecting duplicate and near-duplicate files |
Abstract |
Improved duplicate and near-duplicate detection techniques may assign a number of fingerprints to a given document by (i) extracting parts from the document, (ii) assigning the extracted parts to one or more of a predetermined number of lists, and (iii) generating a fingerprint from each of the populated lists. Two documents may be considered to be near-duplicates if any one of their fingerprints matches.
|
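The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the choice of lowercased words as "parts", the value of `NUM_LISTS`, the use of SHA-1, the rule that each part goes to exactly one list, and all function names are assumptions made only for illustration.

```python
import hashlib

NUM_LISTS = 4  # assumed value; the abstract only says "a predetermined number"


def extract_parts(text):
    """Step (i): extract parts. Here a part is a lowercased word (an assumption)."""
    return text.lower().split()


def assign_to_lists(parts, num_lists=NUM_LISTS):
    """Step (ii): place each part in one list, chosen by a hash of the part."""
    lists = [set() for _ in range(num_lists)]
    for part in parts:
        digest = hashlib.sha1(part.encode("utf-8")).digest()
        lists[int.from_bytes(digest[:4], "big") % num_lists].add(part)
    return lists


def fingerprint_lists(lists):
    """Step (iii): one fingerprint per populated list; empty lists are skipped."""
    prints = set()
    for index, parts in enumerate(lists):
        if parts:
            # The list index is part of the hashed payload, so fingerprints
            # only match when they come from the same list position.
            payload = f"{index}:" + "\n".join(sorted(parts))
            prints.add(hashlib.sha1(payload.encode("utf-8")).hexdigest())
    return prints


def fingerprints(text):
    """Run steps (i) to (iii) on one document."""
    return fingerprint_lists(assign_to_lists(extract_parts(text)))


def near_duplicates(text_a, text_b):
    """Two documents are near-duplicates if any one fingerprint matches."""
    return bool(fingerprints(text_a) & fingerprints(text_b))
```

Because each list is fingerprinted separately, a small edit changes only the lists that receive the altered parts, and the fingerprints of the untouched lists can still match. In this sketch the lists are sets, so word order and repetition are ignored; a different part definition would behave differently.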
Publication number |
US6658423(B1) |
Publication date |
2003.12.02 |
Application number |
US20010768947 |
Filing date |
2001.01.24 |
Applicant |
GOOGLE, INC. |
Inventors |
PUGH WILLIAM; HENZINGER MONIKA H. |
Classification |
G06F17/30;(IPC1-7):G06F17/30;G06F7/00 |
Main classification |
G06F17/30 |
Agency |
|
Agent |
|
Main claim |
|
Address |
|