Title of invention: Consistent weighted sampling of multisets and distributions
Abstract: Techniques are provided that identify near-duplicate items in large collections of items. A list of (value, frequency) pairs is received, and a sample (value, instance) is returned. The value is chosen from the values of the first list, and the instance is a value less than the frequency paired with the chosen value, in such a way that the probability of selecting the same sample from two lists is equal to the similarity of the two lists.
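The property stated in the abstract can be illustrated with a minimal sketch. This is not the patented method. It is a naive min-hash over the expanded multiset, and it assumes integer frequencies and assumes that "similarity" means the generalized Jaccard measure (sum of per-value minimum frequencies divided by sum of per-value maximum frequencies). The function names `consistent_sample`, `weighted_jaccard`, and `estimate_similarity` are hypothetical and do not come from the record.

```python
import hashlib


def _hash(seed, value, instance):
    """Deterministic 64-bit hash of (seed, value, instance)."""
    data = f"{seed}|{value!r}|{instance}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def consistent_sample(pairs, seed=0):
    """Return a (value, instance) sample with 0 <= instance < frequency.

    Naive illustration: every (value, i) for i in range(frequency) is
    hashed and the pair with the smallest hash wins. Cost grows with the
    sum of frequencies, so this only suits small integer frequencies.
    """
    best, best_key = None, None
    for value, frequency in pairs:
        for instance in range(frequency):
            key = _hash(seed, value, instance)
            if best_key is None or key < best_key:
                best_key, best = key, (value, instance)
    return best


def weighted_jaccard(a, b):
    """Assumed similarity: sum of min frequencies over sum of max frequencies."""
    fa, fb = dict(a), dict(b)
    keys = set(fa) | set(fb)
    num = sum(min(fa.get(k, 0), fb.get(k, 0)) for k in keys)
    den = sum(max(fa.get(k, 0), fb.get(k, 0)) for k in keys)
    return num / den if den else 0.0


def estimate_similarity(a, b, trials=2000):
    """Fraction of independent seeds for which both lists yield the same sample."""
    hits = sum(consistent_sample(a, s) == consistent_sample(b, s)
               for s in range(trials))
    return hits / trials


if __name__ == "__main__":
    a = [("x", 3), ("y", 1)]
    b = [("x", 1), ("y", 1), ("z", 2)]
    print("exact similarity:", weighted_jaccard(a, b))
    print("collision rate:  ", estimate_similarity(a, b))
```

Under these assumptions, the fraction of seeds on which two lists return the same sample should approach their generalized Jaccard similarity as the number of trials grows, which is what makes such samples usable as compact near-duplicate fingerprints.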
Publication number: US2008235201(A1)  Publication date: 2008.09.25
Application number: US20070726644  Filing date: 2007.03.22
Applicant: MICROSOFT CORPORATION  Inventors: MCSHERRY FRANK D.; TALWAR KUNAL; MANASSE MARK STEVEN
Classification: G06F17/30  Main classification: G06F17/30
Agency:  Agent:
Main claim:
Address: