Title of Invention |
A SYSTEM FOR CRAWLING THE WEB AND EXTRACTING DESIGNATED DATA AND THE METHOD THEREFOR I.E. WEBHARVESTER |
Abstract |
The present invention discloses a system for crawling the Web and extracting designated data, and a method therefor, named WebHarvester. The system comprises: a computer system; a database configured in the computer system; templates residing in the computer system, each mapping the information in the target pages of one web site; fetch means for fetching web pages from said web sites and transferring the fetched pages to said computer system; filter means for scanning the fetched pages and extracting the necessary information from them according to the template corresponding to each web site; and format and post means for converting the extracted information into a standard format and storing the formatted information in said database. Said computer system is a server connected to the Internet.
|
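The fetch, filter, format and post stages named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the regex-based template format, the site names, the `items` table layout, and all function names below are assumptions introduced only for illustration, and the fetch step is passed in as a function so the sketch runs without network access.

```python
import re
import sqlite3
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical templates, one per web site: each maps a field name to a
# regex whose first group captures that field in the site's target page.
TEMPLATES: Dict[str, Dict[str, str]] = {
    "shop-a.example": {
        "title": r"<h1>(.*?)</h1>",
        "price": r'<span class="price">\$([\d.,]+)</span>',
    },
    "shop-b.example": {
        "title": r"<title>(.*?)</title>",
        "price": r"Price:\s*USD\s*([\d.,]+)",
    },
}


def filter_page(html: str, template: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Filter step: extract every templated field; None if any is missing."""
    record = {}
    for field, pattern in template.items():
        match = re.search(pattern, html, re.S)
        if match is None:
            return None
        record[field] = match.group(1)
    return record


def format_record(site: str, raw: Dict[str, str]) -> Tuple[str, str, int]:
    """Format step: normalize to one standard shape (site, title, cents)."""
    title = " ".join(raw["title"].split())  # collapse stray whitespace
    price_cents = round(float(raw["price"].replace(",", "")) * 100)
    return (site, title, price_cents)


def harvest(
    targets: Iterable[Tuple[str, str]],
    fetch: Callable[[str], str],
    db: sqlite3.Connection,
) -> int:
    """Run fetch -> filter -> format -> post for each (site, url) pair."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS items "
        "(site TEXT, title TEXT, price_cents INTEGER)"
    )
    stored = 0
    for site, url in targets:
        html = fetch(url)                         # fetch step
        raw = filter_page(html, TEMPLATES[site])  # filter step
        if raw is None:
            continue                              # page did not match template
        db.execute("INSERT INTO items VALUES (?, ?, ?)", format_record(site, raw))
        stored += 1                               # post step
    db.commit()
    return stored
```

In a deployment, `fetch` could wrap an HTTP client and `db` could be any server-side database; keeping one template per site means a layout change on one site only requires updating that site's entry.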
Publication Number |
WO0002141(A1) |
Publication Date |
2000.01.13 |
Application Number |
WO1998CN00117 |
Filing Date |
1998.07.03 |
Applicant |
BI, FUJUN;BLISS, SHAUN;YAN, HONG |
Inventor |
BI, FUJUN;BLISS, SHAUN;YAN, HONG |
Classification |
G06F17/30;G06Q30/00;(IPC1-7):G06F17/30 |
Main Classification |
G06F17/30 |
Agency |
|
Agent |
|
Principal Claim |
|
Address |
|