Title: Detection of boilerplate content
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating query recommendations. One method includes selecting one or more nodes of an object model that models a resource as a hierarchy of nodes, and determining that the selected nodes exhibit one or more predefined traits that are characteristic of boilerplate content, wherein boilerplate content comprises content that is repeated in multiple resources of a particular web site or content that is not relevant to the main content of a resource. A score associated with the selected nodes is adjusted responsive to determining that the selected nodes exhibit the predefined traits, and information is provided to a query recommendation engine, the information including textual content associated with the selected nodes and identifying the adjusted score associated with the selected nodes.
Publication No.: US8898296(B2)  Publication Date: 2014.11.25
Application No.: US201213564034  Filing Date: 2012.08.01
Applicant: Google Inc.  Inventors: Zeng Jian; Li Youlin; Murphy Brian R.; Shen Yuzhu
Classification: G06F15/173; G06F17/30  Main Classification: G06F15/173
Agency: Fish & Richardson P.C.  Agent: Fish & Richardson P.C.
Principal Claim: 1. A system comprising: a client device; and a computer-readable medium coupled to the client device having instructions stored thereon which, when executed by the client device, cause the client device to perform operations comprising: receiving a resource from a server; selecting one or more nodes of a Document Object Model (DOM) tree for the resource; determining that the selected nodes exhibit one or more predefined traits that are characteristic of boilerplate content, wherein boilerplate content comprises content that is repeated in multiple resources of a particular web site; adjusting a boilerplate content score associated with the selected nodes responsive to determining that the selected nodes exhibit the predefined traits that are characteristic of boilerplate content; and providing information to a query recommendation engine, the information including textual content associated with the selected nodes, and the information identifying the adjusted boilerplate content score associated with the selected nodes.
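The operations recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The record does not enumerate the "predefined traits", so the tag names and class-name hints below are assumptions chosen for illustration, as are the `Node` class, the `score_nodes` function, the `penalty` parameter, and the shape of the report handed to the query recommendation engine.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List

# Hypothetical traits: the claim only says "predefined traits" and does not list them.
BOILERPLATE_TAGS = {"nav", "footer", "header", "aside"}
BOILERPLATE_CLASS_HINTS = ("footer", "sidebar", "menu", "copyright")


@dataclass
class Node:
    """Minimal stand-in for a DOM node (not a real browser DOM API)."""
    tag: str
    text: str = ""
    css_class: str = ""
    children: List["Node"] = field(default_factory=list)
    boilerplate_score: float = 0.0


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants in document order."""
    yield node
    for child in node.children:
        yield from walk(child)


def exhibits_boilerplate_traits(node: Node) -> bool:
    """Check the assumed traits: a structural tag or a telltale class name."""
    if node.tag in BOILERPLATE_TAGS:
        return True
    css = node.css_class.lower()
    return any(hint in css for hint in BOILERPLATE_CLASS_HINTS)


def score_nodes(root: Node, penalty: float = 1.0) -> List[Dict[str, object]]:
    """Adjust each node's score and collect text plus score for the engine."""
    report = []
    for node in walk(root):
        if exhibits_boilerplate_traits(node):
            node.boilerplate_score += penalty  # adjust score in response to the traits
        if node.text:
            report.append({"text": node.text,
                           "boilerplate_score": node.boilerplate_score})
    return report


# Example resource: one article block and one site-wide footer.
page = Node("body", children=[
    Node("div", text="How to bake sourdough bread", css_class="article"),
    Node("footer", text="Copyright Example Site"),
])
report = score_nodes(page)
```

In this sketch a higher score marks text as more likely to be boilerplate, so a downstream recommendation engine could down-weight the footer text when deriving query suggestions. The direction and size of the adjustment are design choices of the example, not details taken from the record.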
Address: Mountain View CA US