Abstract |
A method and system for clustering web documents are provided. The method comprises the steps of: inputting a plurality of web documents; collecting information on the links and directory structure of the input web documents; extracting, according to the collected links and directory structure, a hierarchical structure for the plurality of web documents; and generating and outputting, based on the extracted hierarchical structure, one or more clusters of the plurality of web documents. In some embodiments, the hierarchical relations among the generated clusters can be output at the same time. Compared with the prior art, the method and system according to the present invention can substantially improve the accuracy and efficiency of web document clustering.
|
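
The steps in the abstract can be illustrated with a minimal sketch. It is not the claimed method: it assumes that the directory part of each URL alone defines the hierarchy, and it leaves out link information entirely. The function names `directory_of`, `cluster_by_directory`, and `cluster_hierarchy` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict
from posixpath import dirname
from urllib.parse import urlparse


def directory_of(url):
    """Return (host, directory path) for a URL, e.g. ('example.com', '/a/b')."""
    parsed = urlparse(url)
    path = parsed.path or "/"
    # Assumption: a path ending in '/' is a directory index page.
    directory = path.rstrip("/") if path.endswith("/") else dirname(path)
    return parsed.netloc, directory or "/"


def cluster_by_directory(urls):
    """Group URLs into clusters keyed by (host, directory)."""
    clusters = defaultdict(list)
    for url in urls:
        clusters[directory_of(url)].append(url)
    return dict(clusters)


def cluster_hierarchy(clusters):
    """Map each cluster key to its nearest ancestor cluster key, or None."""
    parents = {}
    for host, directory in clusters:
        parent = None
        current = directory
        # Walk up the directory path until an existing cluster is found.
        while current != "/":
            current = dirname(current) or "/"
            if (host, current) in clusters:
                parent = (host, current)
                break
        parents[(host, directory)] = parent
    return parents


urls = [
    "http://example.com/index.html",
    "http://example.com/docs/",
    "http://example.com/docs/a.html",
    "http://example.com/docs/api/x.html",
    "http://other.org/p/q/r.html",
]
clusters = cluster_by_directory(urls)
parents = cluster_hierarchy(clusters)
```

Here `clusters` corresponds to the output clusters and `parents` to the hierarchical relations among them. A fuller version would also weigh the hyperlinks between documents, which the abstract names as a second source of structure.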