Title of Invention METHODS AND SYSTEMS TO INCREMENTALLY COMPUTE SIMILARITY OF DATA SOURCES
Abstract Methods and systems for efficiently determining a similarity between two or more datasets. In one embodiment, the similarity is determined by comparing a subset of sorted frequency-weighted blocks from one dataset to a subset of sorted frequency-weighted blocks from another dataset. Data blocks of a dataset are converted into hash values that are frequency-weighted. These frequency-weighted hash values can be compared to the frequency-weighted hash values of another dataset to determine the similarity of the two datasets. In another embodiment, upon a change of a block in a subset of the dataset, the similarity value is re-determined without re-sorting or re-hashing any blocks of the dataset other than the blocks of that subset, which improves the performance of the similarity comparison. In another embodiment, blocks of a dataset are excluded based on a block-filtering rule to increase the accuracy of the similarity comparison.
Publication No. EP2652649(A1) Publication Date 2013.10.23
Application No. EP20110848750 Filing Date 2011.12.19
Applicant NETAPP, INC. Inventors GAONKAR, SHRAVAN; DIXIT, SAGAR
Classification G06F17/40; G06F12/00; G06F17/30 Main Classification G06F17/40
Agency Agent
Main Claim
Address
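The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the record does not give the block size, the hash function, the weighting scheme, or the similarity formula, so every one of those is an assumption here. The sketch assumes fixed-size blocks, SHA-1 hashes, frequency weighting by pairing each hash with its occurrence index, a subset made of the `k` smallest sorted pairs, and a plain overlap ratio as the similarity value. The names `weighted_signature`, `similarity`, and the `skip` filter are hypothetical. The incremental re-determination after a block change is not covered.

```python
import hashlib
from collections import Counter


def weighted_signature(data, block_size=4, k=8, skip=None):
    """Return the k smallest frequency-weighted block hashes of `data`.

    Assumptions (not from the patent record): fixed-size blocks, SHA-1,
    and weighting by occurrence index, so a block seen 3 times yields
    the pairs (hash, 1), (hash, 2), (hash, 3).
    """
    counts = Counter()
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        # Hypothetical block-filtering rule: drop blocks the caller rejects.
        if skip is not None and skip(block):
            continue
        counts[hashlib.sha1(block).hexdigest()] += 1

    # Expand each hash into one pair per occurrence (the frequency weighting).
    weighted = [(h, n) for h, c in counts.items() for n in range(1, c + 1)]

    # Sort and keep only a subset: the k smallest pairs.
    return sorted(weighted)[:k]


def similarity(sig_a, sig_b):
    """Overlap ratio of two signatures: |A and B| / |A or B|."""
    a, b = set(sig_a), set(sig_b)
    if not a and not b:
        return 1.0  # two empty signatures are treated as identical
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    first = weighted_signature(b"abcdabcdefgh")
    second = weighted_signature(b"abcdefgh")
    print(similarity(first, second))
```

Because each repeated block contributes one pair per occurrence, two datasets that share a block but repeat it a different number of times score below 1.0, which is the effect the frequency weighting is meant to capture in this sketch.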