Title of Invention: AUTOMATIC VISUAL SEGMENTATION OF WEBPAGES
Abstract: To provide valuable information regarding a webpage, the webpage must be divided into distinct, semantically coherent segments for analysis. A set of heuristics allows a segmentation algorithm to more accurately identify an optimal number of segments for a given webpage or any portion thereof. A first heuristic estimates the optimal number of segments for any given webpage or portion thereof. A second heuristic coalesces segments where the number of segments identified far exceeds the recommended optimal number. A third heuristic coalesces segments corresponding to a portion of a webpage with much unused whitespace and little content. A fourth heuristic coalesces segments of nodes whose recommended number of segments falls below a certain threshold into segments of other nodes. A fifth heuristic recursively analyzes and splits segments that correspond to webpage portions exceeding a certain size threshold.
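The second heuristic can be illustrated with a minimal sketch. The abstract does not say how "far exceeds" is measured or which segments are merged, so everything below is an assumption made for illustration: the `Segment` type, the `slack` factor, and the rule of merging the adjacent pair with the smallest combined area are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A hypothetical segment: a label and the page area it covers, in pixels."""
    label: str
    area: int


def coalesce_to_target(segments, target, slack=2.0):
    """Merge adjacent segments when their count far exceeds `target`.

    Assumptions (not from the patent): "far exceeds" means more than
    `slack * target` segments, and each merge joins the adjacent pair
    with the smallest combined area.
    """
    if target < 1:
        raise ValueError("target must be at least 1")
    merged = list(segments)
    if len(merged) <= slack * target:
        return merged  # count is close enough to the recommendation
    while len(merged) > target:
        # Index of the adjacent pair whose combined area is smallest.
        i = min(range(len(merged) - 1),
                key=lambda k: merged[k].area + merged[k + 1].area)
        a, b = merged[i], merged[i + 1]
        merged[i:i + 2] = [Segment(f"{a.label}+{b.label}", a.area + b.area)]
    return merged
```

Under these assumptions, five segments with a recommended count of two are merged down to two, while the same five segments with a recommended count of three are left unchanged because five does not exceed `slack * target`.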
Publication No.: US2009177959(A1) Publication Date: 2009.07.09
Application No.: US20080971160 Filing Date: 2008.01.08
Applicant: CHAKRABARTI DEEPAYAN;MITAL MANAV RATAN;HAJELA SWAPNIL;VELIPASAOGLU EMRE Inventor: CHAKRABARTI DEEPAYAN;MITAL MANAV RATAN;HAJELA SWAPNIL;VELIPASAOGLU EMRE
Classification: G06F17/21 Main Classification: G06F17/21
Agency: Agent:
Principal Claim:
Address: