Title of invention: METHOD OF AND SYSTEM FOR COLLECTING NETWORK DATA
Abstract: The invention discloses a method of collecting network data. The method applies to collecting data of network documents, published on a website, that relate respectively to M subjects, where M is a positive integer. The method includes three steps. First, the webpage link addresses of the network data to be collected are configured into queues of corresponding types, according to the type of each link address; these link addresses are the addresses of the webpages on which the data of the network documents related to the M subjects are located. Second, the webpage source code corresponding to each link address in the typed queues is obtained. Third, the data of the network documents is extracted for the Uniform Resource Locators (URLs) corresponding to that webpage source code, according to the URL information and the collection depth value of each URL.
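The first step, placing link addresses into queues by type, can be sketched as follows. This is a minimal illustration, not the patented implementation. The record does not say what the link-address types are or how they are detected, so the type names (`list`, `article`, `other`) and the regular-expression rules in `TYPE_RULES` are hypothetical assumptions. The same applies to the helper names `classify_url` and `configure_queues`.

```python
import re
from collections import deque

# Hypothetical URL-type rules: the patent does not define the types or how
# they are recognised, so these patterns are illustrative assumptions only.
TYPE_RULES = [
    ("list", re.compile(r"/(list|index)(_\d+)?\.html?$")),
    ("article", re.compile(r"/\d{4}-\d{2}/\d{2}/")),
]
DEFAULT_TYPE = "other"


def classify_url(url: str) -> str:
    """Return the first type whose pattern matches url, else DEFAULT_TYPE."""
    for type_name, pattern in TYPE_RULES:
        if pattern.search(url):
            return type_name
    return DEFAULT_TYPE


def configure_queues(urls):
    """Place each link address to be collected into the FIFO queue of its type."""
    queues = {}
    for url in urls:
        queues.setdefault(classify_url(url), deque()).append(url)
    return queues


if __name__ == "__main__":
    qs = configure_queues([
        "http://news.example.com/politics/index.html",
        "http://news.example.com/2012-12/13/c_1.htm",
        "http://news.example.com/about.html",
    ])
    for name, q in qs.items():
        print(name, list(q))
```

Keeping one FIFO queue per type lets a collector give each kind of page its own fetch and extraction policy. That is one plausible reading of why the method separates link addresses by type.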
Publication number: US2014289394 (A1)  Publication date: 2014.09.25
Application number: US201214123036  Filing date: 2012.12.13
Applicants: PEKING UNIVERSITY FOUNDER GROUP CO., LTD; BEIJING FOUNDER ELECTRONICS CO., LTD; PEKING UNIVERSITY  Inventors: Wu Xinli; Yang Jianwu
Classification: H04L12/26  Main classification: H04L12/26
Main claim: 1. A method of collecting network data, applicable to collection of data of network documents, published on a website, related respectively to M subjects, wherein M is a positive integer, the method comprising: configuring webpage link addresses, of network data to be collected, into queues of corresponding types according to types corresponding to the webpage link addresses of the network data to be collected, wherein the webpage link addresses of the network data to be collected are link addresses of webpages where the data of the network documents related respectively to the M subjects are located; obtaining webpage source codes corresponding to the webpage link addresses, of the network data to be collected, in the queues of the corresponding types; and extracting the data of the network documents corresponding to Uniform Resource Locators (URLs) corresponding to the webpage source codes according to URL information and collection depth values of the URLs.
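The last two claimed steps, obtaining source code for queued URLs and extracting document data under a collection depth limit, can be sketched as a breadth-first loop. This sketch rests on several assumptions. The record does not define how collection depth is measured, so here depth counts link hops from a seed URL, and a page's links are followed only while its depth is below a hypothetical `max_depth`. The page title stands in for "data of the network documents". `fetch_source` is a hypothetical callable (an in-memory dict in the test) used in place of a real HTTP download. Only standard-library `collections.deque`, `html.parser.HTMLParser`, and `urllib.parse.urljoin` are used.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTitleParser(HTMLParser):
    """Collect <a href> targets and the <title> text from one page's source."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def collect(queue, fetch_source, max_depth):
    """Depth-limited, breadth-first collection over (url, depth) pairs.

    fetch_source(url) returns the page source or None (hypothetical stand-in
    for downloading). Returns {url: {"depth": d, "title": t}}.
    """
    seen = {url for url, _ in queue}
    documents = {}
    while queue:
        url, depth = queue.popleft()
        source = fetch_source(url)
        if source is None:
            continue  # page could not be obtained; skip it
        parser = LinkAndTitleParser()
        parser.feed(source)
        parser.close()  # flush any buffered text
        documents[url] = {"depth": depth, "title": parser.title.strip()}
        if depth >= max_depth:
            continue  # depth limit reached: do not follow this page's links
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links against the page URL
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return documents
```

With pages `/` → `/a` → `/b` and `max_depth=1`, the seed and `/a` are collected, but `/b` is not, because `/a` already sits at the depth limit.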
Address: Beijing CN