Title of invention: URL AND ANCHOR TEXT ANALYSIS FOR FOCUSED CRAWLING
Abstract: Systems and methods of URL and anchor text analysis for focused crawling are disclosed. In an exemplary embodiment, a method may include training a focused crawler by: obtaining a training set of at least URLs or anchor text for a website; computing a score for the training set; extracting a plurality of features of the training set; and computing a score for each of the plurality of features. The features identify key information contained in the website. The method may also include executing the trained focused crawler on other websites.
Publication number: US2010293116(A1)  Publication date: 2010.11.18
Application number: US20100680903  Filing date: 2010.03.31
Applicant:  Inventors: FENG SHI CONG; XIONG YUHONG; ZHANG LI
Classification: G06F15/18; G06F17/30  Main classification: G06F15/18
Agency:  Agent:
Main claim:
Address:
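
The training step summarized in the abstract can be illustrated with a minimal sketch. The abstract does not say how features are extracted or scored, so everything below is an assumption made for illustration: features are taken to be lowercase tokens of the URL or anchor text, each feature's score is a smoothed log-odds of appearing in relevant versus irrelevant links, and the names `tokenize`, `train`, `score_link`, and the `example.com` links are hypothetical. The separate step of computing a score for the training set as a whole is not modeled here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a URL or anchor text into lowercase alphanumeric tokens (assumed feature form)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def train(samples):
    """Compute a score per feature from (text, is_relevant) pairs.

    The score is a smoothed log-odds ratio; this scoring rule is an
    assumption, not something the abstract specifies.
    """
    pos, neg = Counter(), Counter()
    for text, is_relevant in samples:
        # Count each token at most once per sample.
        (pos if is_relevant else neg).update(set(tokenize(text)))
    n_pos = sum(1 for _, relevant in samples if relevant)
    n_neg = len(samples) - n_pos
    scores = {}
    for tok in set(pos) | set(neg):
        p = (pos[tok] + 1) / (n_pos + 2)  # add-one smoothing
        q = (neg[tok] + 1) / (n_neg + 2)
        scores[tok] = math.log(p / q)
    return scores


def score_link(text, scores):
    """Score an unseen URL or anchor text; unknown tokens contribute 0."""
    return sum(scores.get(t, 0.0) for t in set(tokenize(text)))


# Hypothetical training set: URLs and anchor texts labeled by whether
# they lead to the key information of interest.
samples = [
    ("http://shop.example.com/product/1234", True),
    ("Product details", True),
    ("http://shop.example.com/product/9876", True),
    ("http://shop.example.com/about", False),
    ("Contact us", False),
    ("http://shop.example.com/help/contact", False),
]
scores = train(samples)
```

A trained crawler of this kind might then follow only links whose score exceeds a chosen threshold when run on other websites, for example `score_link("http://other.example.org/product/55", scores) > 0`.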