Title of invention: Calculating a downloading priority for the uniform resource locator in response to the domain density score, the anchor text score, the URL string score, the category need score, and the link proximity score for targeted web crawling
Abstract: A web crawler system as described herein utilizes a targeted approach to increase the likelihood of downloading web pages of a desired type or category. The system employs a plurality of URL scoring metrics that generate individual scores for outlinked URLs contained in a downloaded web page. For each outlinked URL, the individual scores are combined using an appropriate algorithm or formula to generate an overall score that represents a downloading priority for the outlinked URL. The web crawler application can then download subsequent web pages in an order that is influenced by the downloading priorities.
Publication number: US7672943(B2)  Publication date: 2010.03.02
Application number: US20060586779  Filing date: 2006.10.26
Applicant: MICROSOFT CORPORATION  Inventors: WONG SANDY;HUYNH YET L.;NATARAJAN RAMAKRISHNAN;KIM JOON YOUNG;THOGERSEN MICHAEL D.;YAO TONG
Classification: G06F17/30;G06F15/16;G06F17/00  Main classification: G06F17/30
Agency:  Agent:
Main claim:
Address:
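
The scoring scheme summarized in the abstract can be illustrated with a minimal sketch. The record says only that the individual scores are combined "using an appropriate algorithm or formula", so the weighted sum, the equal weights, and every name below (`WEIGHTS`, `download_priority`, `CrawlFrontier`) are assumptions chosen for illustration, not the patented method. The sketch combines the five named scores into one priority per outlinked URL and keeps pending URLs in a priority queue so that higher-priority URLs are downloaded first.

```python
import heapq

# Hypothetical weights: the record does not state how the scores are combined.
# The five keys mirror the five scores named in the title.
WEIGHTS = {
    "domain_density": 0.2,
    "anchor_text": 0.2,
    "url_string": 0.2,
    "category_need": 0.2,
    "link_proximity": 0.2,
}


def download_priority(scores):
    """Combine per-metric scores into one priority (assumed weighted sum).

    Metrics missing from `scores` are treated as 0.0.
    """
    return sum(WEIGHTS[name] * scores.get(name, 0.0) for name in WEIGHTS)


class CrawlFrontier:
    """Pending outlinked URLs, ordered by descending download priority."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker: equal priorities pop in insertion order

    def add(self, url, scores):
        priority = download_priority(scores)
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, self._counter, url))
        self._counter += 1

    def pop(self):
        """Return (url, priority) for the highest-priority pending URL."""
        neg_priority, _, url = heapq.heappop(self._heap)
        return url, -neg_priority

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    frontier = CrawlFrontier()
    frontier.add("http://example.com/a", {"anchor_text": 0.1})
    frontier.add("http://example.com/b", {"anchor_text": 0.9, "url_string": 0.5})
    while len(frontier):
        print(frontier.pop())
```

A real crawler would compute each score from the downloaded page and its outlinks; here the scores are supplied directly so that only the combination and ordering steps are shown.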