Title of invention: System and methods for scalably identifying and characterizing structural differences between document object models
Abstract: A security auditing computer system efficiently evaluates and reports security exposures in a target Web site hosted on a remote Web server system. The auditing system includes a crawler subsystem that constructs a first list of Web page identifiers representing the target Web site. An auditing subsystem selectively retrieves and audits Web pages based on a second list that is derived from the first. Retrieval is sub-selected dependent on a determined uniqueness of Web page identifiers relative to the second list. Auditing is further sub-selected dependent on a determined uniqueness of structural identifiers computed for each retrieved Web page, including structural identifiers of Web page components contained within a Web page. The computed structural identifiers are stored in correspondence with Web page identifiers and Web page component identifiers in the second list. A reporting system produces reports of security exposures identified through the auditing of Web pages and Web page components.
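Publication number: US9305169(B2) Publication date: 2016.04.05
Application number: US201314105038 Filing date: 2013.12.12
Applicant: Tinfoil Security, Inc. Inventors: Borohovski Michael; Braun Ainsley K.; Irizarry Angel; Sedat Benjamin D.
Classification number: G06F21/57 Main classification number: G06F21/57
Agency: Agent: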
Principal claim: 1. A security auditing computer system operative to analyze and identify security exposures reflected in Web pages provided from a target Web site, said security auditing computer system comprising:
a) an analysis computer subsystem coupleable to a network for communicating with a Web server system hosting a target Web site, said analysis computer subsystem including a data store, said analysis computer subsystem being operative to:
  i) selectively retrieve a first Web page from said target Web site;
  ii) construct a document object model representation of said first Web page, wherein said document object model includes a plurality of nodes related in a tree-shaped data structure;
  iii) compute, for a selected set of said plurality of nodes, structural reference identifiers having a defined uniqueness relative to the corresponding ones of said selected set, wherein the nodes of said plurality of nodes include structural, attributed, and content data, and wherein computation of said structural reference identifiers is based on structural and attributed data, whereby differences in content data is not considered in determining comparison matches, and wherein computation of said structural reference identifiers is performed by the execution of a hash function defined by H(S | A (∥ H′)₁ⁿ), wherein H is a hash function, S is data representing structural information, A is data representing attributes, | is a designated separator value, ∥ is a concatenation function, and H′ is the value returned by the hash function for a sub node, relative to a current node, over a range of 1 through n, wherein n represents the total number of child nodes that depend on said sub node;
  iv) compare said structural reference identifiers with a collection of prior computed structural reference identifiers stored in said data store, wherein a comparison match between a first structural reference identifier computed with respect to a first portion of said first Web page and a second structural reference identifier prior computed with respect to a second portion of a second Web page is determined by the scope of said defined uniqueness;
  v) record, in said data store, a correspondence of audit identified security exposures between said first portion of said first Web page and said second portion of said second Web page;
  vi) skip further audit analysis of said first portion of said first Web page;
  vii) identify, with respect to said first Web page, a plurality of first Web components, each of said first Web components having a corresponding root element, wherein a corresponding one of said structural reference identifiers is associated with each said root element;
  viii) select, from said plurality of first Web components, a set of said first Web components not matched by comparison of said root element corresponding structural reference identifiers with structural reference identifiers stored in said data store; and
  ix) audit said set of said Web components not matched for predetermined security exposures, wherein audit identified security exposures are recorded in said data store with respect to corresponding said structural reference identifiers, and wherein said audit identified security exposures are recorded in said data store such that, for an audited Web component, a corresponding set of audit identified security exposures are associated with said one of said structural reference identifiers corresponding to said root element of said audited Web component; and
b) a reporting computer subsystem, coupled to said data store, and operative to provide reports of security exposures identified with respect to said target Web site including with respect to said first portion of said first Web page.
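Illustrative note (not part of the claim): the recursive hash in step iii) and the skip-if-matched selection in steps viii) and ix) can be sketched as below. This is a minimal sketch under stated assumptions, not the patented implementation. The `Node` class, the `structural_id` and `audit_components` names, the choice of SHA-256 for H, the use of sorted attribute names (without values) for A, and the in-memory dict standing in for the data store are all hypothetical choices made for illustration.

```python
import hashlib
from dataclasses import dataclass, field

SEPARATOR = "|"  # assumed value for the claim's "designated separator"


@dataclass
class Node:
    """Hypothetical, simplified DOM node."""
    tag: str                                       # structural data (S)
    attrs: dict = field(default_factory=dict)      # attributed data (A)
    text: str = ""                                 # content data, never hashed
    children: list = field(default_factory=list)   # sub nodes 1..n


def structural_id(node: Node) -> str:
    """Compute H(S | A (|| H')_1^n) for one node.

    Assumptions: H is SHA-256, A is the sorted attribute names only,
    and the child hashes H' are concatenated in document order.
    """
    attr_names = ",".join(sorted(node.attrs))
    child_ids = "".join(structural_id(child) for child in node.children)
    payload = node.tag + SEPARATOR + attr_names + child_ids
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def audit_components(components, store, audit):
    """Audit only components whose root identifier is not yet in `store`.

    `store` maps a structural identifier to the exposures found for it;
    `audit` is a caller-supplied callable (hypothetical) that returns the
    exposures for one component.
    """
    for component in components:
        sid = structural_id(component)
        if sid not in store:        # unmatched: audit and record
            store[sid] = audit(component)
        # matched: skip, the exposures recorded under `sid` already apply
    return store
```

Because the content data and, in this sketch, the attribute values stay out of the hash, two forms that differ only in their text or in the value of `action` get the same identifier, so the second one is not audited again. Whether attribute values belong in A is a design choice that the claim text alone does not settle.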
Address: Mountain View CA US