Independent claim |
1. A method comprising:
generating, for hierarchically arranged plural sets of files, respective collections of values computed based on files in corresponding sets of files, wherein generating the respective collections of values is performed by a system including a processor, and wherein computing a particular collection of values for a particular set of files of the plural sets of files comprises:
computing a first number of values based on the files of the particular set of files, the first number of values computed by applying plural functions on each file of the particular set of files, and
selecting a second number of values from the first number of values, where the second number of values is less than the first number of values, and the second number of values are output as the particular collection of values;
generating, by the system for a further set of files that is an ancestor of at least a given set of files of the plural sets of files, a respective collection of values that is based on a collection of values computed for at least the given set of files; and
identifying, by the system, duplicative sets of files according to comparisons of collections of values including the collections of values for the plural sets of files, and the collection of values for the further set of files. |
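
The claimed steps can be illustrated with a minimal sketch. The claim does not name the functions, the selection rule, or the comparison, so everything concrete here is an assumption made for illustration: the "plural functions" are salted SHA-256 hashes, the selection keeps the smallest values (a min-hash-style choice), a set of files is modeled as a directory in a nested dictionary, and collections are compared by Jaccard similarity. The names `file_values`, `select`, `sketch_tree`, and `find_duplicates` are hypothetical.

```python
import hashlib
from typing import Dict, FrozenSet, Iterable, List, Tuple

NUM_FUNCTIONS = 16  # assumed number of functions applied to each file
SKETCH_SIZE = 8     # assumed size of the selected collection (fewer than NUM_FUNCTIONS)


def file_values(content: bytes) -> List[int]:
    """Apply several functions to one file (assumption: salted SHA-256 hashes)."""
    return [
        int.from_bytes(hashlib.sha256(bytes([i]) + content).digest()[:8], "big")
        for i in range(NUM_FUNCTIONS)
    ]


def select(values: Iterable[int]) -> FrozenSet[int]:
    """Select a smaller collection (assumption: the SKETCH_SIZE smallest distinct values)."""
    return frozenset(sorted(set(values))[:SKETCH_SIZE])


def sketch_tree(tree: dict, path: str, out: Dict[str, FrozenSet[int]]) -> FrozenSet[int]:
    """Compute a collection for every directory; `tree` maps name -> bytes or dict.

    A file contributes the values computed from its content; a subdirectory
    contributes its already selected collection, so an ancestor's collection
    is derived from the collections of its descendants.
    """
    values: List[int] = []
    for name, node in tree.items():
        if isinstance(node, dict):
            values.extend(sketch_tree(node, f"{path}/{name}", out))
        else:
            values.extend(file_values(node))
    out[path] = select(values)
    return out[path]


def jaccard(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Similarity of two collections; empty collections never match."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_duplicates(
    sketches: Dict[str, FrozenSet[int]], threshold: float = 1.0
) -> List[Tuple[str, str]]:
    """Return pairs of directories whose collections are at least `threshold` similar."""
    paths = sorted(sketches)
    return [
        (p, q)
        for i, p in enumerate(paths)
        for q in paths[i + 1:]
        if jaccard(sketches[p], sketches[q]) >= threshold
    ]


# Example: "a" and "b" hold the same file contents under different names.
tree = {
    "a": {"x.txt": b"hello", "y.txt": b"world"},
    "b": {"copy_x.txt": b"hello", "copy_y.txt": b"world"},
    "c": {"z.txt": b"something else"},
}
sketches: Dict[str, FrozenSet[int]] = {}
sketch_tree(tree, "root", sketches)
duplicates = find_duplicates(sketches)
```

In this sketch only file contents are hashed, so renamed copies still match. A directory whose only entry is a single subdirectory gets the same collection as that subdirectory, so such parent and child pairs are also reported; a practical implementation would likely filter them.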