Title of invention DYNAMICALLY CONSTRAINED, FORWARD SCHEDULING OVER UNCERTAIN WORKLOADS
Abstract Scheduling searchable items, such as web pages, for crawling involves dynamically scheduling the items for download according to capacity over time. The workload is distributed over time, in advance, by anticipating and accounting for the discovery of new links on a given host. The respective times at which to download items can be determined from the current size of the host's crawl corpus relative to the maximum size of that corpus. The times may additionally depend on respective freshness targets for the searchable items, which characterize how often an item's content should be refreshed by re-downloading it, and on respective politeness factors for the host, which characterize the delay between consecutive download requests to that host. As a result, the system's performance at any point in time can be known precisely, and its future performance can be predicted.
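The scheduling idea in the abstract can be illustrated with a minimal sketch. It is not the claimed method: the names `Host`, `corpus_fill_ratio`, and `schedule_downloads` are hypothetical, and a single freshness target is assumed for all items on the host. The sketch spaces downloads as if the corpus had already reached its maximum size, so that time slots remain free for links discovered later, and it never spaces requests more tightly than the host's politeness delay.

```python
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical container; field names are illustrative assumptions.
    politeness_delay: float  # minimum seconds between consecutive requests
    max_corpus_size: int     # maximum number of items to crawl on this host


def corpus_fill_ratio(host: Host, current_size: int) -> float:
    """Current crawl corpus size relative to the host's maximum corpus size."""
    return current_size / host.max_corpus_size


def schedule_downloads(host: Host, items: list[str],
                       freshness_target: float,
                       start: float = 0.0) -> dict[str, float]:
    """Assign each known item a download time within one freshness window.

    Assumption: one freshness target (seconds between refreshes) applies
    to every item on the host.
    """
    if len(items) > host.max_corpus_size:
        raise ValueError("corpus exceeds the host's maximum size")
    # Space items as if the corpus were full, leaving unused slots for
    # links that have not been discovered yet; never go below the
    # politeness delay.
    spacing = max(host.politeness_delay,
                  freshness_target / host.max_corpus_size)
    return {item: start + i * spacing for i, item in enumerate(items)}


host = Host(politeness_delay=2.0, max_corpus_size=100)
plan = schedule_downloads(host, ["/a", "/b", "/c"], freshness_target=1000.0)
```

With these assumed values, the spacing is the larger of the 2-second politeness delay and 1000 / 100 = 10 seconds, so the three known items are placed 10 seconds apart and 97 slots in the window stay open for newly discovered links.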
Publication number US2009077198(A1) Publication date 2009.03.19
Application number US20080269879 Filing date 2008.11.12
Applicant LARSSON DANIEL MATTIAS;AHLUWALIA ASHWINDER;KRISHNAN SRIDHARAN GOPAL Inventor LARSSON DANIEL MATTIAS;AHLUWALIA ASHWINDER;KRISHNAN SRIDHARAN GOPAL
Classification G06F15/16 Main classification G06F15/16
Agency Agent
Main claim
Address