Title of Invention |
SYSTEM FOR WEB CRAWLING AND METHOD THEREOF |
Abstract |
PURPOSE: A web crawling system and method are provided that substantially reduce the time required for web crawling by simultaneously downloading the external link (outlink) pages linked to the webpage with the highest access probability. CONSTITUTION: A seed page priority assigner (11) sets up the standard seed pages for web crawling, calculates the access probability of each seed page detected through web crawling, and assigns priorities to the seed pages. A downloader (12) downloads the highest-priority seed page together with the outlink pages linked to it in a single batch. An outlink page priority assigner (13) calculates the access probability of the seed page and assigns priorities to its outlink pages.
|
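The priority scheme in the abstract can be illustrated with a small sketch. The abstract does not say how access probabilities are estimated or how pages are fetched, so this sketch assumes precomputed probabilities and an in-memory link graph standing in for real HTTP downloads; the function name `priority_batch_crawl` and its parameters are hypothetical. A max-priority queue plays the role of the seed page priority assigner (11), each batch plays the role of the downloader (12), and re-queuing outlinks by their own probability plays the role of the outlink page priority assigner (13).

```python
import heapq
from typing import Dict, List


def priority_batch_crawl(
    graph: Dict[str, List[str]],
    access_prob: Dict[str, float],
    seeds: List[str],
) -> List[List[str]]:
    """Return the pages fetched in each batch, in crawl order.

    graph: hypothetical in-memory link graph (page -> outlinks) used
        instead of real HTTP fetches.
    access_prob: hypothetical precomputed access probabilities; the
        abstract does not specify how they are estimated.
    """
    # Seed page priority assigner: a max-heap keyed on access probability
    # (heapq is a min-heap, so probabilities are negated).
    frontier = [(-access_prob.get(url, 0.0), url) for url in seeds]
    heapq.heapify(frontier)
    downloaded = set()  # pages already fetched in some batch
    expanded = set()    # pages whose outlinks have already been fetched
    batches: List[List[str]] = []
    while frontier:
        _, page = heapq.heappop(frontier)
        if page in expanded:
            continue  # duplicate queue entry; already processed
        expanded.add(page)
        # Downloader: fetch the highest-priority page together with all of
        # its outlinks in one batch, skipping duplicates and fetched pages.
        candidates = dict.fromkeys([page] + graph.get(page, []))
        batch = [u for u in candidates if u not in downloaded]
        downloaded.update(batch)
        if batch:
            batches.append(batch)
        # Outlink page priority assigner: queue each outlink by its own
        # access probability so that its outlinks are fetched later.
        for link in graph.get(page, []):
            if link not in expanded:
                heapq.heappush(frontier, (-access_prob.get(link, 0.0), link))
    return batches
```

For example, with `graph = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": [], "e": ["a"]}`, probabilities `{"a": 0.5, "b": 0.1, "c": 0.3, "d": 0.05, "e": 0.05}` and seeds `["a", "b"]`, the sketch fetches `["a", "b", "c"]` in the first batch and `["d", "e"]` in the second. Fetching a page's outlinks in the same batch is what lets the crawler avoid a separate round trip per outlink, which is the time saving the abstract claims.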
Publication Number |
KR20100094263(A) |
Publication Date |
2010.08.26 |
Application Number |
KR20090013597 |
Application Date |
2009.02.18 |
Applicant |
KOREA UNIVERSITY RESEARCH AND BUSINESS FOUNDATION |
Inventors |
LEE, SANG KEUN;HIJBUL MD. ALAM;HA, JONG WOO |
Classification Number |
G06F17/30;G06F17/00;G06F17/21 |
Main Classification Number |
G06F17/30 |
Agency |
|
Agent |
|
Principal Claim |
|
Address |
|