Title: REDUCING USE OF RANDOMNESS IN CONSISTENT UNIFORM HASHING
Abstract: Near-duplicate documents may be identified using techniques based on consistent uniform hashing. A biased bit may be placed in the leading position of a generated bit sequence, and the sequence may then be used in comparisons that determine whether documents are near-duplicates. Unbiased bits may fill the subsequent positions of the sequence, after the biased bit. Biased bits may be generated from samples collectively rather than individually, so bit sequences are produced for multiple samples at once instead of one sample at a time. This amortizes the cost of generating randomness across the samples, so that less than one bit of randomness per sample may be used.
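For context, the near-duplicate detection that consistent hashing supports can be sketched with standard MinHash. This is a baseline assumption, not the patented method: it does not reproduce the biased-leading-bit construction or the amortization across samples. Documents are split into word 3-gram shingles, and each seeded hash function maps the same shingle to the same value in every document (the "consistent" property). The fraction of hash functions whose minimum agrees estimates the Jaccard similarity of the two shingle sets. The helper names (`shingles`, `minhash_signature`, `estimated_jaccard`, `exact_jaccard`) and the use of seeded `hashlib.blake2b` as a stand-in for a random hash family are illustrative assumptions.

```python
import hashlib


def shingles(text: str, k: int = 3) -> set[str]:
    """Return the set of word k-grams (shingles) of a document (hypothetical helper)."""
    words = text.split()
    # max(..., 1) keeps at least one shingle for documents shorter than k words.
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def _hash(seed: int, item: str) -> int:
    """Seeded 64-bit hash; assumption: blake2b stands in for a random hash family."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items: set[str], num_hashes: int = 128) -> list[int]:
    """For each seeded hash function, keep the minimum hash over the items."""
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of positions where the two signatures agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a: set[str], b: set[str]) -> float:
    """|A intersect B| / |A union B|, for comparison with the estimate."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    doc1 = "the quick brown fox jumps over the lazy dog"
    doc2 = "the quick brown fox jumped over the lazy dog"
    a, b = shingles(doc1), shingles(doc2)
    print("exact Jaccard:", exact_jaccard(a, b))
    print("MinHash estimate:", estimated_jaccard(minhash_signature(a), minhash_signature(b)))
```

In this baseline, every shingle consumes a full 64-bit pseudo-random value per hash function. That per-sample randomness cost is what the abstract describes reducing, to less than one bit per sample.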
Publication No.: US2010070511(A1)  Publication date: 2010.03.18
Application No.: US20080211814  Filing date: 2008.09.17
Inventors: MANASSE MARK STEVEN; MCSHERRY FRANK D.; TALWAR KUNAL
Classification: G06F7/20; G06F7/58; G06F17/30  Main classification: G06F7/20