Title of Invention: EXTRACTION OF CONTENT FROM A WEB PAGE
Abstract: A system and method are provided for extracting main content from a web page. Web page segmentation is performed on a web page to provide affinity-grouped segments. Descriptive features of at least one of the affinity-grouped segments are computed. At least one of the affinity-grouped segments is classified as a main body segment based on the computed descriptive features. Additional affinity-grouped segments are classified according to their document function based on the computed descriptive features. The classified affinity-grouped segments are assembled according to their classified document functions to provide the main content.
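The abstract describes a four-stage pipeline: segment the page, compute descriptive features, classify one segment as the main body and the others by document function, then assemble. The sketch below illustrates that flow under stated assumptions and is not the claimed method. It assumes segmentation has already produced affinity-grouped segments, and the `Segment` class, the feature set (word count, link density, heading tag), the body-scoring rule, the thresholds, and the function labels (`title`, `navigation`, `body_continuation`, `other`) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical affinity-grouped segment produced by an earlier segmentation step."""
    order: int          # position in reading order (assumed unique per page)
    text: str           # visible text of the segment
    link_text_len: int  # number of characters of `text` that sit inside links
    tag: str            # dominant HTML tag, e.g. "h1", "p", "div"
    features: dict = field(default_factory=dict)


def compute_features(seg: Segment) -> dict:
    """Compute a few illustrative descriptive features for one segment."""
    n_chars = len(seg.text)
    return {
        "n_words": len(seg.text.split()),
        # Share of the text that is anchor text; empty segments count as all-link.
        "link_density": seg.link_text_len / n_chars if n_chars else 1.0,
        "is_heading": seg.tag in {"h1", "h2", "h3"},
    }


def body_score(seg: Segment) -> float:
    """Favor long segments with little link text (an assumed heuristic)."""
    f = seg.features
    return f["n_words"] * (1.0 - f["link_density"])


def classify(segments: list[Segment]) -> dict[int, str]:
    """Pick one main-body segment, then label the rest by document function."""
    for seg in segments:
        seg.features = compute_features(seg)

    main = max(segments, key=body_score)  # raises ValueError on an empty page
    labels = {main.order: "main_body"}

    for seg in segments:
        if seg is main:
            continue
        f = seg.features
        if f["is_heading"] and seg.order < main.order:
            labels[seg.order] = "title"
        elif f["link_density"] > 0.5:
            labels[seg.order] = "navigation"
        elif f["n_words"] >= 20 and f["link_density"] < 0.2:
            labels[seg.order] = "body_continuation"
        else:
            labels[seg.order] = "other"
    return labels


def assemble(segments: list[Segment], labels: dict[int, str],
             keep: tuple[str, ...] = ("title", "main_body", "body_continuation")) -> str:
    """Join the kept segments in their original reading order."""
    ordered = sorted(segments, key=lambda s: s.order)
    return "\n\n".join(s.text for s in ordered if labels[s.order] in keep)


if __name__ == "__main__":
    page = [
        Segment(0, "Home | News | Sports | Contact", 30, "div"),
        Segment(1, "Local Library Extends Hours", 0, "h1"),
        Segment(2, "The library will stay open until 9 pm on weekdays starting "
                   "next month, officials said on Monday after a review of "
                   "visitor numbers.", 0, "p"),
        Segment(3, "Related: Top 10 books | Author interviews", 35, "div"),
    ]
    labels = classify(page)
    print(labels)
    print(assemble(page, labels))
```

On the sample page, the two link-heavy segments are labelled `navigation` and dropped, so only the heading and the paragraph are assembled. The thresholds (link density 0.5 and 0.2, 20 words) are arbitrary placeholders; a trained classifier could replace `classify` without changing the surrounding pipeline.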
Publication Number: WO2012055067(A1)
Publication Date: 2012.05.03
Application Number: WO2010CN01698
Application Date: 2010.10.26
Applicants: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.; LI, SUKHWAN; JIN, JIANMING; ZHENG, LIWEI; FAN, JIAN; O'BRIEN-STRAIN, EAMONN; JOSHI, PARAG
Inventors: LI, SUKHWAN; JIN, JIANMING; ZHENG, LIWEI; FAN, JIAN; O'BRIEN-STRAIN, EAMONN; JOSHI, PARAG
Classification: G06F17/30
Main Classification: G06F17/30