Title: Detecting duplicative hierarchical sets of files
Abstract: To detect duplicative hierarchically arranged sets of files in a storage system, a method includes generating, for hierarchically arranged plural sets of files, respective collections of values computed based on files in corresponding sets of files. For a further set of files that is an ancestor of at least one of the plural sets of files, a respective collection of values is generated based on the collection of values computed for the at least one set. Duplicative sets are identified according to comparisons of the collections of values.
Publication number: US9063947(B2); Publication date: 2015.06.23
Application number: US200812258316; Filing date: 2008.10.24
Applicant: Hewlett-Packard Development Company, L.P.; Inventors: Forman George H.; Eshghi Kave
Classification: G06F17/30; Main classification: G06F17/30
Agency: Trop, Pruner & Hu, P.C.; Agent: Trop, Pruner & Hu, P.C.
Principal claim: 1. A method comprising: generating, for hierarchically arranged plural sets of files, respective collections of values computed based on files in corresponding sets of files, wherein generating the respective collections of values is performed by a system including a processor, and wherein computing a particular collection of values for a particular set of files of the plural sets of files comprises: computing a first number of values based on the files of the particular set of files, the first number of values computed by applying plural functions on each file of the particular set of files, and selecting a second number of values from the first number of values, where the second number of values is less than the first number of values, and the second number of values are output as the particular collection of values; generating, by the system for a further set of files that is an ancestor of at least a given set of files of the plural sets of files, a respective collection of values that is based on a collection of values computed for at least the given set of files; and identifying, by the system, duplicative sets of files according to comparisons of collections of values including the collections of values for the plural sets of files, and the collection of values for the further set of files.
Address: Houston, TX, US
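
Illustrative sketch (not part of the patent record): the steps recited in claim 1 can be pictured with a short Python example. Everything specific here is an assumption made for illustration only: the use of salted SHA-256 as the "plural functions", the rule of keeping the smallest values as the selection step, exact equality as the comparison, the constants `NUM_FUNCS` and `SKETCH_SIZE`, and the names `file_values`, `select`, `sketch_tree`, and `duplicate_groups`. The claim itself does not prescribe any of these choices.

```python
import hashlib
from collections import defaultdict

NUM_FUNCS = 4    # assumed number of functions applied to each file
SKETCH_SIZE = 4  # assumed number of values kept per set of files


def file_values(content):
    """Apply NUM_FUNCS salted SHA-256 hashes to one file's bytes."""
    return [
        int.from_bytes(hashlib.sha256(bytes([i]) + content).digest()[:8], "big")
        for i in range(NUM_FUNCS)
    ]


def select(values):
    """Keep the SKETCH_SIZE smallest distinct values (assumed selection rule)."""
    return tuple(sorted(set(values))[:SKETCH_SIZE])


def sketch_tree(node, path, sketches):
    """Fill `sketches` with one collection of values per directory.

    `node` maps a name to bytes (a file) or to a nested dict (a subdirectory).
    """
    values = []
    for name, child in node.items():
        if isinstance(child, dict):
            # The ancestor reuses the child's already-selected collection.
            values.extend(sketch_tree(child, f"{path}/{name}", sketches))
        else:
            values.extend(file_values(child))
    sketches[path] = select(values)
    return sketches[path]


def duplicate_groups(sketches):
    """Group directory paths whose collections of values compare equal."""
    groups = defaultdict(list)
    for path, sketch in sketches.items():
        if sketch:  # ignore empty directories
            groups[sketch].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


# Hypothetical in-memory directory tree; "a" and "b" hold identical files.
tree = {
    "a": {"x.txt": b"hello", "y.txt": b"world"},
    "b": {"x.txt": b"hello", "y.txt": b"world"},
    "c": {"z.txt": b"other"},
}
sketches = {}
sketch_tree(tree, "root", sketches)
groups = duplicate_groups(sketches)
```

In this example `root/a` and `root/b` contain the same file contents, so they receive equal collections and fall into the same group. With the smallest-values rule assumed here, an ancestor's collection built from its children's collections matches what it would get from hashing every descendant file directly, which is one reason that rule is convenient for a sketch like this.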