Title of invention: System, method, and service for collaborative focused crawling of documents on a network
Abstract: A collaborative focused crawler crawls documents on a network, locating documents that match multiple focus topics. The collaborative crawler comprises a fetcher and a focus engine. The fetcher prioritizes which documents to crawl based on a set of rules, obtains documents from the network, and outputs crawled documents to the focus engine. The focus engine determines whether a fetched document is relevant to any of the multiple focus topics, and also whether it is disallowed. If a fetched document is disallowed, the present system may place the URL for that web document in a blacklist, a list of URLs that may not be crawled. URLs may be disallowed if they match a disallowed topic or if they fail a set of rules designed for a web space focus, for example, domain rules, IP address rules, and prefix rules.
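The focus-engine behavior described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the class names `WebSpaceRules` and `FocusEngine`, the keyword-overlap test for topic relevance (the abstract does not say how relevance is determined), and the simple string-prefix form of the domain, IP address, and prefix rules. Fetching and crawl prioritization are left out.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class WebSpaceRules:
    """Hypothetical web-space rule set; an empty rule places no restriction."""
    allowed_domains: tuple = ()        # domain rules
    blocked_ip_prefixes: tuple = ()    # IP address rules
    allowed_url_prefixes: tuple = ()   # prefix rules

    def permits(self, url: str, ip: str) -> bool:
        host = urlparse(url).hostname or ""
        domain_ok = not self.allowed_domains or any(
            host == d or host.endswith("." + d) for d in self.allowed_domains
        )
        # str.startswith accepts a tuple; an empty tuple never matches.
        ip_ok = not ip.startswith(self.blocked_ip_prefixes)
        prefix_ok = (not self.allowed_url_prefixes
                     or url.startswith(self.allowed_url_prefixes))
        return domain_ok and ip_ok and prefix_ok


@dataclass
class FocusEngine:
    """Assumed structure: topics map a topic name to a set of keywords."""
    focus_topics: dict
    disallowed_topics: dict
    rules: WebSpaceRules
    blacklist: set = field(default_factory=set)

    @staticmethod
    def _matches(text: str, topics: dict) -> list:
        # Keyword overlap is a stand-in for a real relevance classifier.
        words = set(text.lower().split())
        return [name for name, keywords in topics.items() if words & keywords]

    def evaluate(self, url: str, ip: str, text: str) -> list:
        """Return matched focus topics; blacklist the URL if it is disallowed."""
        if url in self.blacklist:
            return []
        disallowed = (self._matches(text, self.disallowed_topics)
                      or not self.rules.permits(url, ip))
        if disallowed:
            self.blacklist.add(url)
            return []
        return self._matches(text, self.focus_topics)


engine = FocusEngine(
    focus_topics={"crawling": {"crawler", "spider"}, "patents": {"patent"}},
    disallowed_topics={"gambling": {"casino"}},
    rules=WebSpaceRules(allowed_domains=("example.org",),
                        blocked_ip_prefixes=("10.",),
                        allowed_url_prefixes=("http://",)),
)
topics = engine.evaluate("http://docs.example.org/a", "192.0.2.1",
                         "A focused crawler patent")
```

In this sketch a single document can match several focus topics at once, and a URL that fails either the topic check or the web-space rules is added to the blacklist so that later calls skip it.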
Publication number: US2005086206(A1)    Publication date: 2005.04.21
Application number: US20030686964    Filing date: 2003.10.15
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION    Inventors: BALASUBRAMANIAN SRINIVASAN; CHAVET LAURENT; QI RUNPING
Classification: G06F17/30; (IPC1-7): G06F17/30    Main classification: G06F17/30
Agency:    Agent:
Main claim:
Address: