发明名称 Method for determining the resemblance of documents
摘要 A method for facilitating the comparison of two computerized documents. The method includes loading a first document into a random access memory (RAM), loading a second document into the RAM, reducing the first document into a first sequence of tokens, reducing the second document into a second sequence of tokens, converting the first set of tokens to a first (multi)set of shingles, converting the second set of tokens to a second (multi)set of shingles, determining a first sketch of the first (multi)set of shingles, determining a second sketch of the second (multi)set of shingles, and comparing the first sketch and the second sketch. The sketches have a fixed size, independent of the size of the documents. The resemblance of two documents is provided using a sketch of each document. The sketches may be computed fairly fast and given two sketches the resemblance of the corresponding documents can be computed in linear time in the size of the sketches.
申请公布号 US5909677(A) 申请公布日期 1999.06.01
申请号 US19960665709 申请日期 1996.06.18
申请人 DIGITAL EQUIPMENT CORPORATION 发明人 BRODER, ANDREI ZARY;NELSON, CHARLES GREGORY
分类号 G06F17/30;(IPC1-7):G06F17/30 主分类号 G06F17/30
代理机构 代理人
主权项
地址