Abstract |
The present disclosure provides a method and system for clustering. The method includes: vectorizing a plurality of readable files to obtain a plurality of file vectors, each corresponding to one of the readable files; extracting a total characteristic vector based on the file vectors; and clustering the readable files based on a ranking of the respective similarity degrees between the total characteristic vector and each of the file vectors. The present disclosure also provides a method and system for clustering webpages. Applying the described methods or systems reduces the number of similarity comparisons between file vectors and thereby the burden on system resources. This reduces CPU and memory usage, shortens clustering run time, and improves clustering performance. |
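
The three steps named in the abstract (vectorize, extract a total characteristic vector, cluster by ranked similarity) can be illustrated with a minimal Python sketch. The abstract does not specify the details, so the following are assumptions made only for illustration: term-count vectors, a total characteristic vector taken as the component-wise mean, cosine similarity as the similarity degree, and a cut in the ranked list wherever neighbouring similarities differ by more than a hypothetical `gap` threshold. All function names are hypothetical.

```python
import math
from collections import Counter


def vectorize(texts):
    """Turn each text into a term-count vector over a shared vocabulary."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    vectors = []
    for t in texts:
        counts = Counter(t.lower().split())
        vectors.append([counts[w] for w in vocab])
    return vectors


def total_vector(vectors):
    """Assumption: the total characteristic vector is the component-wise mean."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_by_ranking(texts, gap=0.1):
    """Group text indices by their ranked similarity to the total vector.

    Assumption: a new cluster starts wherever two neighbours in the
    ranking differ in similarity by more than `gap`.
    """
    if not texts:
        return []
    vectors = vectorize(texts)
    total = total_vector(vectors)
    # One comparison per file (n in total), instead of one per pair of files.
    sims = [cosine(total, v) for v in vectors]
    order = sorted(range(len(texts)), key=lambda i: sims[i], reverse=True)
    clusters = []
    for pos, i in enumerate(order):
        if pos == 0 or sims[order[pos - 1]] - sims[i] > gap:
            clusters.append([])
        clusters[-1].append(i)
    return clusters
```

Calling `cluster_by_ranking(["a a a", "a a a", "b"])` groups the two identical texts together and places the third in its own cluster. Each file vector is compared only against the single total vector, which is where the reduction in comparisons comes from. One limitation of this particular sketch is that two unrelated files can have the same similarity to the total vector and so land in the same cluster.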