Title of invention: Adaptive web crawling using a statistical model
Abstract: A computer-based system and method of retrieving information pertaining to documents on a computer network is disclosed. The method includes selecting a set of documents to be accessed during a Web crawl by using a statistical model to determine which previously retrieved documents are most likely to have changed since they were last accessed. The statistical model continuously improves its accuracy by training its internal probability distributions to reflect actual experience with the change-rate patterns of the documents accessed. The decision whether to access a document is based on its probability of change compared against a desired synchronization level, random selection, maximum limits on the time since the document was last accessed, and other criteria. Once the decision to access is made, the document is checked for changes, and this information is used to train the statistical model.
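The crawl loop the abstract outlines (estimate the probability of change, decide whether to re-fetch, then train on what was observed) can be illustrated with a minimal sketch. It assumes a simple Beta-Bernoulli estimate per document in place of the record's unspecified "internal probability distributions", and it detects changes by comparing SHA-256 content hashes. All names (`ChangeModel`, `should_crawl`, `sync_level`, `random_rate`, `max_age`) are hypothetical illustrations, not the method claimed in US7328401.

```python
import hashlib
import random
from dataclasses import dataclass
from typing import Dict

# Hypothetical sketch only: a Beta-Bernoulli change estimator, NOT the patented model.


@dataclass
class DocStats:
    """Hypothetical per-document crawl history."""
    changed: int = 0          # crawls on which the document had changed
    unchanged: int = 0        # crawls on which it had not changed
    last_access: float = 0.0  # timestamp (seconds) of the last crawl
    content_hash: str = ""    # hash of the content seen on the last crawl


class ChangeModel:
    """Per-document posterior-mean estimate of P(changed since last access)."""

    def __init__(self, alpha: float = 1.0, beta: float = 1.0) -> None:
        self.alpha = alpha  # prior pseudo-count of "changed" observations
        self.beta = beta    # prior pseudo-count of "unchanged" observations
        self.docs: Dict[str, DocStats] = {}

    def p_change(self, url: str) -> float:
        s = self.docs.get(url, DocStats())
        return (s.changed + self.alpha) / (s.changed + s.unchanged + self.alpha + self.beta)

    def train(self, url: str, content: bytes, now: float) -> bool:
        """Compare content with the previous crawl, update counts, return whether it changed."""
        digest = hashlib.sha256(content).hexdigest()
        s = self.docs.get(url)
        if s is None:
            # First visit: nothing to compare against; record a baseline and treat it as new.
            self.docs[url] = DocStats(last_access=now, content_hash=digest)
            return True
        changed = digest != s.content_hash
        if changed:
            s.changed += 1
        else:
            s.unchanged += 1
        s.last_access = now
        s.content_hash = digest
        return changed


def should_crawl(model: ChangeModel, url: str, now: float, sync_level: float = 0.5,
                 random_rate: float = 0.05, max_age: float = 30 * 86400.0,
                 rng=random) -> bool:
    """Decide whether to re-fetch a previously seen document (hypothetical criteria)."""
    s = model.docs.get(url)
    if s is None:
        return True                       # never retrieved: must crawl
    if now - s.last_access >= max_age:
        return True                       # too long since last access
    if model.p_change(url) >= sync_level:
        return True                       # likely enough to have changed
    return rng.random() < random_rate     # occasional random re-check keeps training data unbiased
```

The random re-check matters for training: if only high-probability documents were ever re-fetched, the model would never observe that a low-probability document had started changing more often.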
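Publication number: US7328401 (B2); Publication date: 2008.02.05
Application number: US20040022054; Filing date: 2004.12.22
Applicant: MICROSOFT CORPORATION; Inventors: OBATA KENJI C; MEYERZON DMITRIY
Classification: G06F7/00; G06F15/16; G06F17/00; G06F17/30; Main classification: G06F7/00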