Title of invention: OPTIMIZING WEB CRAWLING THROUGH WEB PAGE PRUNING
Abstract: Crawling computer-based documents by performing static analysis on a computer-based document to identify within the computer-based document one or more execution vectors, where each execution vector includes a computer program segment including a call to an entity that is external to the computer-based document, and one or more additional computer program segments whose execution precedes and leads ultimately to execution of the computer program segment that includes the call to the entity, and causing any of the computer program segments in any of the execution vectors to be executed during a crawling of the computer-based document, and any computer program segment within the computer-based document that is excluded from the execution vectors to be excluded from execution during the crawling of the computer-based document.
Publication number: US2015278202(A1) Publication date: 2015.10.01
Application number: US201414227456 Filing date: 2014.03.27
Applicant: International Business Machines Corporation Inventors: Sperling Shahar; Tripp Omer; Weisman Omri
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim: 1. A method for crawling computer-based documents, the method comprising: performing static analysis on a computer-based document to identify within the computer-based document one or more execution vectors, wherein each execution vector includes a computer program segment including a call to an entity that is external to the computer-based document, and one or more additional computer program segments whose execution precedes and leads ultimately to execution of the computer program segment that includes the call to the entity; and causing any of the computer program segments in any of the execution vectors to be executed during a crawling of the computer-based document, and any computer program segment within the computer-based document that is excluded from the execution vectors to be excluded from execution during the crawling of the computer-based document.
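Illustrative sketch (not part of the patent text): the pruning step in the main claim can be read as a backward reachability search over a call graph. The Python below is a minimal sketch under stated assumptions: the document's program segments are modeled as named nodes in a dictionary, the static analysis has already determined which segments contain a call to an external entity, and a segment "leads to" another if it invokes it directly or transitively. The names `segments_to_execute`, `partition_segments`, and the segment names in the usage note are hypothetical and do not come from the record.

```python
from collections import deque


def segments_to_execute(calls, external_callers):
    """Return every segment that lies on some execution vector.

    calls: dict mapping a segment name to the set of segments it invokes
           (an assumed, simplified call graph).
    external_callers: segments known to contain a call to an external entity.
    """
    # Invert the call graph so we can walk from a segment to its callers.
    predecessors = {}
    for caller, callees in calls.items():
        for callee in callees:
            predecessors.setdefault(callee, set()).add(caller)

    # Breadth-first search backwards from each external-calling segment.
    keep = set(external_callers)
    queue = deque(external_callers)
    while queue:
        segment = queue.popleft()
        for pred in predecessors.get(segment, ()):
            if pred not in keep:
                keep.add(pred)
                queue.append(pred)
    return keep


def partition_segments(calls, external_callers):
    """Split all known segments into (executed, pruned) sets."""
    all_segments = set(calls) | set(external_callers)
    for callees in calls.values():
        all_segments |= set(callees)
    keep = segments_to_execute(calls, external_callers)
    return keep, all_segments - keep
```

For a hypothetical page whose `onLoad` segment invokes `initMenu` and `loadFeed`, where `loadFeed` invokes `fetchItems` and only `fetchItems` contacts an external entity, `partition_segments` keeps `onLoad`, `loadFeed`, and `fetchItems` and marks `initMenu` and anything reachable only through it as pruned. Real static analysis of page scripts is considerably more involved; this sketch works at whole-segment granularity and only shows the selection logic.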
Address: Armonk NY US