Title of invention METHOD AND SYSTEM FOR WEB DOCUMENT CLUSTERING
Abstract A method and system for web document clustering are provided. The method comprises the steps of: inputting a plurality of web documents; collecting information on the links and the directory structure of the inputted web documents; extracting, according to the collected links and directory structure, a hierarchical structure for the plurality of web documents; and generating and outputting, based on the extracted hierarchical structure, one or more clusters of the plurality of web documents. In some embodiments, the hierarchical relations of the generated clusters can also be output at the same time. Compared with the prior art, the method and system according to the present invention can substantially improve the accuracy and efficiency of web document clustering.
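The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how the hierarchy is extracted, so this sketch assumes that documents sharing a URL directory form one cluster and that a cluster's parent is its nearest ancestor directory that also holds documents. Link information is left out entirely. The function names `directory_of`, `cluster_by_directory`, and `cluster_hierarchy` are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse
import posixpath


def directory_of(url):
    """Return (host, directory path) for a URL (hypothetical helper)."""
    parts = urlparse(url)
    path = parts.path or "/"
    # A path ending in "/" is treated as a directory; otherwise use its parent.
    if path.endswith("/"):
        directory = path.rstrip("/")
    else:
        directory = posixpath.dirname(path)
    return parts.netloc, directory or "/"


def cluster_by_directory(urls):
    """Assumption: documents in the same directory belong to one cluster."""
    clusters = defaultdict(list)
    for url in urls:
        clusters[directory_of(url)].append(url)
    return dict(clusters)


def cluster_hierarchy(clusters):
    """Map each cluster key to its nearest ancestor cluster key, or None."""
    parents = {}
    for host, directory in clusters:
        parent = None
        current = directory
        # Walk up the directory tree until an existing cluster or the root.
        while current != "/":
            current = posixpath.dirname(current) or "/"
            if (host, current) in clusters:
                parent = (host, current)
                break
        parents[(host, directory)] = parent
    return parents


if __name__ == "__main__":
    urls = [
        "http://example.com/docs/index.html",
        "http://example.com/docs/api/a.html",
        "http://example.com/docs/api/b.html",
        "http://example.com/blog/post.html",
    ]
    clusters = cluster_by_directory(urls)
    for key, parent in cluster_hierarchy(clusters).items():
        print(key, "-> parent:", parent, "| members:", clusters[key])
```

A fuller version would also need the hyperlink information the abstract mentions, for example to merge or split directory-based clusters; how that is done is not specified here, so it is omitted rather than guessed.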
Publication number US2009070366(A1) Publication date 2009.03.12
Application number US20080208644 Filing date 2008.09.11
Applicant NEC (CHINA) CO., LTD. Inventors ZHAO YU; LI JIANQIANG
Classification G06F17/30 Main classification G06F17/30
Agency Agent
Principal claim
Address