Title: Estimating document similarity using bit-strings
Abstract: Each of a plurality of documents is divided into samples. Small bit-strings are generated for selected samples from each document and used to create a sketch for that document. Because the bit-strings are small (e.g., only one, two, or three bits long), the resulting sketches are smaller than those produced by previous sketch-generation methods and therefore require less storage space. The sketches are compared to identify documents that are near-duplicates of one another.
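The abstract does not specify how samples are formed, which hash functions select them, or how sketches are compared. The following is a minimal sketch under stated assumptions, in the style of b-bit minwise hashing: samples are k-word shingles, each of `num_hashes` hypothetical hash functions (salted BLAKE2b) selects the sample with the minimum hash, and only the lowest `b` bits of that minimum are stored. The estimator, which discounts the roughly 1/2^b chance that two unrelated b-bit values agree, is a simplified correction assumed here, not taken from the patent. All function names and parameters are hypothetical.

```python
import hashlib
from typing import List, Set


def shingles(text: str, k: int = 3) -> Set[str]:
    """Divide a document into samples: overlapping k-word shingles (assumed scheme)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(sample: str, seed: int) -> int:
    """64-bit hash of a sample under the seed-th (hypothetical) hash function."""
    digest = hashlib.blake2b(
        sample.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(16, "little"),  # blake2b accepts a salt of up to 16 bytes
    ).digest()
    return int.from_bytes(digest, "little")


def sketch(text: str, num_hashes: int = 128, b: int = 2, k: int = 3) -> List[int]:
    """For each hash function, select the minimum-hash sample and keep only its lowest b bits."""
    samples = shingles(text, k)
    mask = (1 << b) - 1
    return [min(_hash(s, seed) for s in samples) & mask for seed in range(num_hashes)]


def estimate_similarity(sk1: List[int], sk2: List[int], b: int = 2) -> float:
    """Estimate resemblance from the fraction of matching b-bit values (simplified correction)."""
    if len(sk1) != len(sk2):
        raise ValueError("sketches must use the same number of hash functions")
    match_rate = sum(x == y for x, y in zip(sk1, sk2)) / len(sk1)
    chance = 2.0 ** -b  # assumed probability that two unrelated b-bit values agree
    return max(0.0, (match_rate - chance) / (1.0 - chance))
```

With `num_hashes=128` and `b=2`, each sketch holds 256 bits, versus 8,192 bits if full 64-bit minimum hashes were kept. The trade-off is extra noise from random b-bit collisions, which may require more hash functions for the same accuracy. Two documents would be flagged as near-duplicates when `estimate_similarity` exceeds some chosen threshold; the threshold is an assumption, not a value from the patent.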
Publication number: US8594239 (B2)   Publication date: 2013.11.26
Application number: US201113031265   Filing date: 2011.02.21
Applicants: MANASSE MARK S.; KOENIG ARND CHRISTIAN; MICROSOFT CORPORATION   Inventors: MANASSE MARK S.; KOENIG ARND CHRISTIAN
Classification: H04L27/00   Main classification: H04L27/00