Title: Automatic detection of item lists within a web page
Abstract: Embodiments of the invention relate to detecting item lists. In one embodiment, a web browsing interaction history of a user associated with a given web page is analyzed. The web browsing interaction history indicates that the user interacted with at least one element of the web page. A document object model (DOM) of the given web page is constructed. A node within the DOM corresponding to the element in the web page is identified based on analyzing the web browsing interaction history. An ancestor node of the node that corresponds to an item list within the web page comprising the element is identified based on at least a distribution of child tags of the ancestor node.
Publication number: US9251287(B2)  Publication date: 2016.02.02
Application number: US201113218686  Filing date: 2011.08.26
Applicant: International Business Machines Corporation  Inventor: Mahmud Jalal U.
Classification: G06F17/00; G06F17/30  Main classification: G06F17/00
Agency: Fleit Gibbons Gutman Bongini & Bianco PL  Agent: Fleit Gibbons Gutman Bongini & Bianco PL; Grzesik Thomas
Main claim: 1. A computer program product comprising a non-transitory computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising computer readable program code configured to: analyze a web browsing interaction history of a user associated with a given web page, the web browsing interaction history indicating that the user interacted with at least one element of the given web page; construct a document object model (DOM) of the given web page; identify, based on the analyzed web browsing interaction history, a node within the DOM corresponding to the at least one element in the given web page; identify an ancestor node of the node; determine a distribution of node tag types for a set of child nodes associated with the identified ancestor node, and determine that the identified ancestor node comprises an item list including the at least one element within the given web page based on at least the distribution satisfying a given threshold, wherein each item in the item list corresponds to a child node in the set of child nodes, wherein determining that the identified ancestor node comprises an item list further comprises: determining, based on the distribution, a number of matching child tags associated with each of the set of child nodes, wherein each of the matching child tags is a structure based tag, and wherein a structure based tag affects a structure of a web page; determining if the number of matching child tags satisfies the given threshold; based on at least the number of matching child tags satisfying the given threshold, identifying the identified ancestor node as a candidate node comprising the item list; and based on at least the number of matching child tags failing to satisfy the given threshold, determining that the identified ancestor node comprises the item list.
Address: Armonk NY US
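
Illustrative sketch (not part of the patent record): the main claim describes locating the DOM node a user interacted with, walking up its ancestors, and testing the distribution of child tag types against a threshold. The Python below is one possible reading of that idea, not the patented implementation. Several things are assumptions made for illustration: the interaction history is reduced to a single element `id`, the `STRUCTURE_TAGS` set is a hypothetical choice of "structure based" tags (the record does not enumerate them), and the threshold is taken as the fraction of an ancestor's children that share the dominant structure tag. The names `Node`, `DomBuilder`, `find_by_id`, and `find_item_list` are all hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumption: a hypothetical set of "structure based" tags.
STRUCTURE_TAGS = {"div", "li", "tr", "td", "p", "ul", "ol", "table", "section", "article"}
# Void elements never receive a closing tag, so the builder must not descend into them.
VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}


class Node:
    """Minimal element-only DOM node."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []


class DomBuilder(HTMLParser):
    """Builds a simplified element tree; text nodes are ignored."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def find_by_id(node, element_id):
    """Depth-first search for the element the user interacted with."""
    if node.attrs.get("id") == element_id:
        return node
    for child in node.children:
        found = find_by_id(child, element_id)
        if found is not None:
            return found
    return None


def find_item_list(dom_root, element_id, threshold=0.8, min_items=2):
    """Return the nearest ancestor whose children are dominated by one structure tag."""
    node = find_by_id(dom_root, element_id)
    if node is None:
        return None
    ancestor = node.parent
    while ancestor is not None and ancestor.parent is not None:
        # Distribution of structure-tag types among this ancestor's children.
        tags = Counter(c.tag for c in ancestor.children if c.tag in STRUCTURE_TAGS)
        if tags:
            _, count = tags.most_common(1)[0]
            if count >= min_items and count / len(ancestor.children) >= threshold:
                return ancestor
        ancestor = ancestor.parent
    return None


PAGE = """
<html><body>
<div id="results">
  <div class="item"><a id="link-1" href="/a">A</a></div>
  <div class="item"><a id="link-2" href="/b">B</a></div>
  <div class="item"><a id="link-3" href="/c">C</a></div>
</div>
</body></html>
"""

builder = DomBuilder()
builder.feed(PAGE)
# Assumption: the interaction history says the user clicked the element with id "link-2".
result = find_item_list(builder.root, "link-2")
```

Here `result` is the `div` with `id="results"`, whose three `div` children would each be treated as one list item. The `threshold` and `min_items` defaults are arbitrary; a real system would presumably tune them and derive the interacted element from recorded browsing actions rather than a hard-coded id.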