System and method for focused re-crawling of web sites, Application No. US20080054482 - 传众专利搜索

Title of invention	System and method for focused re-crawling of web sites
Abstract	A method (100) of crawling the Web (620) is disclosed. The method (100) crawls (120) Web pages on the Web starting from a given (110) set of seed Universal Resource Locators (URLs). Crawled Web pages are partitioned (140) into sets of relevant and irrelevant pages. A set of exclusion and/or inclusion patterns are discovered (150) from the sets of relevant and irrelevant pages, and subsequent crawling of the Web is restricted through the set of exclusion and/or inclusion patterns.
Publication No.	US7882099(B2)	Publication date	2011.02.01
Application No.	US20080054482	Filing date	2008.03.25
Applicant	INTERNATIONAL BUSINESS MACHINES CORPORATION	Inventors	AGRAWAL NEERAJ;BALAKRISHNAN SREERAM VISWANATH;JOSHI SACHINDRA
Classification No.	G06F17/30	Main classification No.	G06F17/30
Agency		Agent
Main claim
Address
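
The abstract's last two steps (discovering patterns from the relevant and irrelevant sets, then restricting later crawling) can be illustrated with a minimal sketch. It is not the patented method: the abstract does not say what form the patterns take or how they are discovered, so this example assumes that an exclusion pattern is a host-qualified directory prefix seen in at least `min_support` irrelevant URLs and in no relevant URL. The names `path_prefixes`, `discover_exclusion_patterns`, `is_allowed` and `min_support` are hypothetical.

```python
from urllib.parse import urlsplit


def path_prefixes(url):
    """Yield host-qualified directory prefixes of a URL, shortest first."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    prefix = parts.netloc
    # Skip the last segment so that prefixes name directories, not single pages.
    for segment in segments[:-1]:
        prefix = prefix + "/" + segment
        yield prefix + "/"


def discover_exclusion_patterns(relevant, irrelevant, min_support=2):
    """Assumed heuristic: prefixes common among irrelevant URLs, absent from relevant ones."""
    relevant_prefixes = {p for url in relevant for p in path_prefixes(url)}
    counts = {}
    for url in irrelevant:
        for p in path_prefixes(url):
            counts[p] = counts.get(p, 0) + 1
    candidates = {
        p for p, n in counts.items()
        if n >= min_support and p not in relevant_prefixes
    }
    # Keep only the shortest patterns; a longer prefix covered by a shorter one is redundant.
    return {
        p for p in candidates
        if not any(q != p and p.startswith(q) for q in candidates)
    }


def is_allowed(url, exclusions):
    """Return True if a later crawl may fetch this URL under the exclusion patterns."""
    parts = urlsplit(url)
    key = parts.netloc + parts.path
    return not any(key.startswith(p) for p in exclusions)
```

In a re-crawl loop, each URL taken from the frontier would be passed through `is_allowed` before it is fetched. Inclusion patterns could be handled symmetrically, by collecting prefixes that occur only among relevant URLs.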