摘要 |
Systems and methods of URL and anchor text analysis for focused crawling are disclosed. In an exemplary embodiment, a method may include training a focused crawler by: obtaining a training set of at least URL's or anchor text for a website, computing a score for the training set, and extracting a plurality of features of the training set, and computing a score for each of the plurality of features. The features identify key information contained in the website. The method may also include executing a trained focused crawler on other websites.
|