Title: Path-based ranking of unvisited web pages
Abstract: Unvisited web pages are ranked by path for WWW crawling. All paths that begin at a "seed" URL and lead to visited relevant web pages are identified as the "good-path set"; for each unvisited web page, the paths that begin at the "seed" URL and lead to it are identified as its "partial-path set". All visited web pages are classified, and each is labeled with the class or classes it belongs to. A statistical model is trained to generalize the patterns common to the paths in the "good-path set". The "partial-path set" is then evaluated with the statistical model, and the unvisited web pages are ranked by the evaluation results.
Publication number: US7979444(B2) Publication date: 2011.07.12
Application number: US20080183751 Filing date: 2008.07.31
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors: MA XIAOCHUAN; PAN YUE; SU HUI
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim:
Address:
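
The steps summarized in the abstract can be illustrated with a minimal sketch. It rests on assumptions the record does not state: each path is represented as the sequence of class labels of the visited pages along it, the "statistical model" is stood in for by a first-order Markov model over labels with add-one smoothing, and an unvisited page takes the best score among its partial paths. The names `train_path_model`, `score_path`, `rank_unvisited`, and `START` are hypothetical.

```python
import math
from collections import defaultdict

START = "<s>"  # hypothetical start-of-path marker (stands for the seed URL)


def train_path_model(good_paths):
    """Count label-to-label transitions over the good-path set."""
    counts = defaultdict(lambda: defaultdict(int))
    labels = set()
    for path in good_paths:
        prev = START
        for label in path:
            counts[prev][label] += 1
            labels.add(label)
            prev = label
    return counts, labels


def score_path(path, counts, labels):
    """Mean log-probability per transition, with add-one smoothing."""
    if not path:
        return float("-inf")
    vocab = len(labels) + 1  # +1 reserves probability mass for unseen labels
    total = 0.0
    prev = START
    for label in path:
        row = counts.get(prev, {})
        total += math.log((row.get(label, 0) + 1) / (sum(row.values()) + vocab))
        prev = label
    return total / len(path)


def rank_unvisited(partial_paths, counts, labels):
    """Order unvisited URLs by the best score among their partial paths.

    Assumes every URL maps to at least one partial path.
    """
    scored = {
        url: max(score_path(p, counts, labels) for p in paths)
        for url, paths in partial_paths.items()
    }
    return sorted(scored, key=scored.get, reverse=True)
```

A crawler using this sketch would call `train_path_model` on the label sequences of paths that reached relevant pages, then pass a mapping from each unvisited URL to its partial paths into `rank_unvisited` to choose which page to fetch next. Normalizing by path length is a design choice made here so that longer partial paths are not penalized merely for having more transitions.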