Title of Invention: Discrepancy detection for web crawling
Abstract: Search engines may use web crawlers to discover desirable content that can be provided to users as search results. However, document providers, such as websites, may return junk web pages or maintenance web pages in response to a fetch, and a search engine may not want to provide such pages as search results. To detect these cases, document providers may be grouped into provider clusters. A profile may be assigned to each provider cluster, where the profile comprises parameters representing the "expected" values historically returned by normal document fetch operations against document providers in that cluster. The parameters of the profile for the provider cluster containing a given document provider may then be compared with the parameters of a current document fetch operation against that provider. If the profile parameters and the current document fetch parameters do not match, an alert may be generated.
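The profile-and-compare step described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented method: the abstract does not say which fetch parameters are profiled or how "do not match" is decided, so the parameters used here (HTTP status code, content type, content length), the profile summary (sets of observed values plus the mean and standard deviation of content length), the k-sigma threshold, and every name (`FetchParams`, `Profile`, `build_profile`, `find_discrepancies`, `check_fetch`) are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class FetchParams:
    # Hypothetical fetch parameters; the source does not specify which are used.
    status_code: int
    content_type: str
    content_length: int


@dataclass
class Profile:
    # "Expected" parameters summarized from historical normal fetches of a cluster.
    status_codes: frozenset
    content_types: frozenset
    mean_length: float
    std_length: float


def build_profile(history):
    """Summarize historical normal fetches for one provider cluster."""
    lengths = [p.content_length for p in history]
    return Profile(
        status_codes=frozenset(p.status_code for p in history),
        content_types=frozenset(p.content_type for p in history),
        mean_length=mean(lengths),
        std_length=pstdev(lengths),
    )


def find_discrepancies(profile, current, k=3.0):
    """Return a list of mismatches; an empty list means the fetch looks normal."""
    issues = []
    if current.status_code not in profile.status_codes:
        issues.append(f"unexpected status {current.status_code}")
    if current.content_type not in profile.content_types:
        issues.append(f"unexpected content type {current.content_type}")
    # Assumed rule: flag lengths more than k standard deviations from the mean
    # (with a floor of 1.0 so a zero-variance history does not flag everything).
    if abs(current.content_length - profile.mean_length) > k * max(profile.std_length, 1.0):
        issues.append(
            f"content length {current.content_length} far from {profile.mean_length:.0f}"
        )
    return issues


def check_fetch(clusters, profiles, provider, current):
    """Look up the provider's cluster profile, compare, and emit an alert on mismatch."""
    cluster = clusters[provider]
    issues = find_discrepancies(profiles[cluster], current)
    if issues:
        print(f"ALERT {provider} (cluster {cluster}): " + "; ".join(issues))
    return issues


if __name__ == "__main__":
    history = [
        FetchParams(200, "text/html", 50000),
        FetchParams(200, "text/html", 52000),
        FetchParams(200, "text/html", 48000),
    ]
    clusters = {"example.com": "news"}
    profiles = {"news": build_profile(history)}
    check_fetch(clusters, profiles, "example.com", FetchParams(200, "text/html", 51000))
    check_fetch(clusters, profiles, "example.com", FetchParams(503, "text/html", 1200))
```

In this sketch a small maintenance page (status 503, about 1 KB) fetched from a provider whose cluster normally returns roughly 50 KB HTML pages with status 200 is flagged on two parameters, while an ordinary fetch produces no alert. Profiling per cluster rather than per provider lets a provider with little fetch history borrow the expectations of similar providers.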
Publication Number: US8639773 (B2)   Publication Date: 2014.01.28
Application Number: US20100817797   Filing Date: 2010.06.17
Applicants: SHYAMKUMAR BALAJI B.; SAHNI PUNEET; VERMA HARSH; MICROSOFT CORPORATION   Inventors: SHYAMKUMAR BALAJI B.; SAHNI PUNEET; VERMA HARSH
Classification: G06F15/16   Main Classification: G06F15/16