摘要 |
A system for determining that a document B is a candidate for near duplicate to a document A with a given similarity level th. The system includes a storage for providing two different functions on the documents, each function having a numeric function value. The system further includes a processor associated with the storage and configured to determine that the document B is a candidate for near duplicate to the document A, if a condition is met. The condition includes: for any function fi from among the two functions, fi(A)-fi(B)<=deltai(f,A,th).
|