Title of invention A SYSTEM FOR CRAWLING THE WEB AND EXTRACTING DESIGNATED DATA AND THE METHOD THEREFOR I.E. WEBHARVESTER
Abstract The present invention discloses a system, WebHarvester, for crawling the Web and extracting designated data, and a method therefor. The system comprises: a computer system; a database configured in the computer system; templates residing in the computer system, each mapping the information in the target pages of one web site; fetch means for fetching web pages from said web sites and transferring the fetched pages to said computer system; filter means for scanning the fetched pages and extracting the necessary information from each according to the corresponding template; and format and post means for converting the extracted information into a standard format and storing the formatted information in said database. Said computer system is a server connected to the Internet.
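The pipeline named in the abstract (templates, fetch, filter, format and post) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the regex-based template format, the `shop.example` site, the canned page that stands in for a real HTTP fetch, the `items` table, and all function names.

```python
import re
import sqlite3

# Hypothetical per-site templates: each maps a field name to a regex with
# one capture group. The abstract does not specify a template format.
TEMPLATES = {
    "shop.example": {
        "title": r"<h1>(.*?)</h1>",
        "price": r'<span class="price">\$([\d.]+)</span>',
    },
}

# Stand-in for the fetch means: canned pages instead of real HTTP requests.
PAGES = {
    "shop.example": '<h1> Blue Widget </h1><span class="price">$19.50</span>',
}


def fetch(site):
    """Return the page for a site (here, from the canned PAGES dict)."""
    return PAGES[site]


def filter_page(html, template):
    """Extract the fields named in the template; missing fields become None."""
    record = {}
    for field, pattern in template.items():
        match = re.search(pattern, html, re.S)
        record[field] = match.group(1) if match else None
    return record


def format_record(site, record):
    """Normalize a raw record into a standard (site, title, price) tuple."""
    title = record["title"].strip() if record["title"] else None
    price = float(record["price"]) if record["price"] else None
    return (site, title, price)


def post(conn, row):
    """Store one formatted row in the database."""
    conn.execute("INSERT INTO items (site, title, price) VALUES (?, ?, ?)", row)


def harvest(conn, sites):
    """Run fetch, filter, format and post for each site in turn."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (site TEXT, title TEXT, price REAL)"
    )
    for site in sites:
        html = fetch(site)
        row = format_record(site, filter_page(html, TEMPLATES[site]))
        post(conn, row)
    conn.commit()


conn = sqlite3.connect(":memory:")
harvest(conn, ["shop.example"])
rows = conn.execute("SELECT site, title, price FROM items").fetchall()
```

Keeping one template per site means that supporting a new site in this sketch only requires a new `TEMPLATES` entry, which mirrors the abstract's "corresponding one of said templates" wording; a real system would replace `fetch` with a network client.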
Publication number WO0002141(A1) Publication date 2000.01.13
Application number WO1998CN00117 Filing date 1998.07.03
Applicant BI, FUJUN;BLISS, SHAUN;YAN, HONG Inventor BI, FUJUN;BLISS, SHAUN;YAN, HONG
Classification G06F17/30;G06Q30/00;(IPC1-7):G06F17/30 Main classification G06F17/30
Agency Agent
Principal claim
Address