A web crawler collaboration method, Application No. CN201110375264.1 - 传众专利搜索

Title of invention	A web crawler collaboration method
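Abstract	The invention discloses a web crawler collaboration method with the following steps. First, crawler nodes form several collection groups according to their online time periods, such that the groups together provide continuous online coverage over one cycle. Next, the collection groups crawl web pages by exchanging messages. Finally, all collection groups cooperatively store the crawled pages. Each collection group obtains its group ID either by automatic generation or by configuration. Message exchange may work as follows: the collection groups form a routing network, and a node sends signaling or messages to another collection group according to its routing information table; the routing protocol of this network may be a routing protocol used in IP network routing or any of the DHT protocols used in peer-to-peer networks. Alternatively, a centrally controlled message exchange method may be used. The invention addresses the bandwidth problem and the massive web page storage problem faced by centralized crawling equipment, as well as the problem of sustaining P2P crawling over time.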
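Publication No.	CN102480524B	Publication date	2014.09.10
Application No.	CN201110375264.1	Filing date	2011.11.23
Applicant	中国科学院声学研究所 (Institute of Acoustics, Chinese Academy of Sciences)	Inventors	王劲林;王玲芳;邓峰;齐向东
Classification	H04L29/08(2006.01)I;H04L29/06(2006.01)I;G06F17/30(2006.01)I	Main classification	H04L29/08(2006.01)I
Patent agency	北京法思腾知识产权代理有限公司 11318	Agents	杨小蓉;高宇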
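Independent claim	A web crawler collaboration method for coordinating a large number of web crawlers across multiple network environments, comprising the following steps: Step 1, crawler nodes that are online at the same time during some interval of a set cycle are assigned to one collection group, and the online intervals of all collection groups so formed join together to provide continuous online coverage of one cycle; Step 2, web pages are crawled with the collection group as the unit, and the collection groups further cooperate through message exchange to achieve uninterrupted crawling of web content throughout the set cycle; Step 3, the crawler nodes within each collection group cooperatively store the web pages crawled by that group; wherein each collection group contains two or more collection nodes.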
Address	100190 No. 21 Beisihuan West Road, Haidian District, Beijing
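
The three steps of the claim can be illustrated with a small sketch. It is not the patented implementation; it only assumes a 24-hour cycle, nodes described by a fixed online window in whole hours, groups formed from nodes with identical windows, automatically generated integer group IDs, and hash-based selection of the storing node inside a group. The names `Node`, `form_groups`, `covers_period`, `active_group` and `storage_node` are all hypothetical.

```python
from dataclasses import dataclass
import hashlib

PERIOD_HOURS = 24  # assumed cycle length; the patent only says "a set cycle"


@dataclass(frozen=True)
class Node:
    name: str
    start: int  # first online hour (inclusive)
    end: int    # offline hour (exclusive)


def form_groups(nodes):
    """Step 1 (sketch): nodes with the same online window form one group.

    Windows with fewer than two nodes are dropped, since the claim requires
    two or more nodes per group. Group IDs are generated automatically.
    """
    windows = {}
    for node in nodes:
        windows.setdefault((node.start, node.end), []).append(node)
    kept = [(w, windows[w]) for w in sorted(windows) if len(windows[w]) >= 2]
    return {gid: entry for gid, entry in enumerate(kept)}


def covers_period(groups, period=PERIOD_HOURS):
    """Check that the group windows joined together cover [0, period)."""
    reach = 0
    for start, end in sorted(window for window, _ in groups.values()):
        if start > reach:  # gap: nobody is online between reach and start
            return False
        reach = max(reach, end)
    return reach >= period


def active_group(groups, hour):
    """Step 2 (sketch): find the group that should be crawling at `hour`."""
    for gid, ((start, end), _) in groups.items():
        if start <= hour < end:
            return gid
    return None


def storage_node(url, members):
    """Step 3 (sketch): pick the member that stores a page by hashing its URL."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return members[int.from_bytes(digest[:8], "big") % len(members)]


nodes = [
    Node("a1", 0, 8), Node("a2", 0, 8),
    Node("b1", 8, 16), Node("b2", 8, 16),
    Node("c1", 16, 24), Node("c2", 16, 24),
    Node("lone", 3, 5),  # only one node in this window, so no group is formed
]
groups = form_groups(nodes)
```

Here `covers_period(groups)` reports whether the groups leave any hour of the cycle uncovered, and `storage_node(url, groups[gid][1])` deterministically maps a page to one member of the group that crawled it. Message exchange between groups (routing tables, DHT protocols, or a central controller) is deliberately left out of the sketch.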
