Title of Invention: SELECTIVE CONTENT EXTRACTION
Abstract: A method for extracting web content includes detecting, within a web page, a hierarchical structure that includes a plurality of nodes. Potential article nodes from the plurality of nodes are identified. The identified potential article node with a highest rank in the hierarchical structure is identified as an article node. Content is extracted from the article node.
Publication No.: WO2011002456(A1) Publication Date: 2011.01.06
Application No.: WO2009US49298 Filing Date: 2009.06.30
Applicant: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.;LIU, SAM;JOSHI, PARAG;XIONG, YUHONG;ATKINS, CLAYTON;LIU, JERRY Inventor: LIU, SAM;JOSHI, PARAG;XIONG, YUHONG;ATKINS, CLAYTON;LIU, JERRY
Classification: G06Q50/00;G06F3/048;G06Q30/00 Main Classification: G06Q50/00
Agency: Agent:
Principal Claim:
Address:
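
The four steps named in the abstract (detect the hierarchy, identify potential article nodes, pick the highest-ranked one, extract its content) can be illustrated with a minimal Python sketch. This is not the patented implementation. The abstract does not say how potential article nodes are recognized or how rank is measured, so the sketch makes two labeled assumptions: a node is a potential article node if it has at least two direct `<p>` children, and "highest rank" means closest to the document root. All names (`Node`, `TreeBuilder`, `is_potential_article`, `extract_article`) are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so the builder must not descend into them.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
SKIPPED_TAGS = {"script", "style"}  # their text is not article content


class Node:
    """One element of the page hierarchy."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.items = []  # child Nodes and text strings, in document order
        self.depth = 0 if parent is None else parent.depth + 1

    def child_nodes(self):
        return [i for i in self.items if isinstance(i, Node)]

    def text(self):
        if self.tag in SKIPPED_TAGS:
            return ""
        parts = [i.text() if isinstance(i, Node) else i for i in self.items]
        return " ".join(p for p in parts if p)


class TreeBuilder(HTMLParser):
    """Step 1: detect the hierarchical structure of the page."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.items.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        data = data.strip()
        if data:
            self.current.items.append(data)


def walk(node):
    """Yield nodes in document (pre-order) order."""
    yield node
    for child in node.child_nodes():
        yield from walk(child)


def is_potential_article(node, min_paragraphs=2):
    # ASSUMPTION: the heuristic is ours, not the patent's.
    paragraphs = [c for c in node.child_nodes() if c.tag == "p"]
    return len(paragraphs) >= min_paragraphs


def extract_article(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    # Step 2: identify potential article nodes.
    candidates = [n for n in walk(builder.root) if is_potential_article(n)]
    if not candidates:
        return None
    # Step 3: ASSUMPTION: "highest rank" = smallest depth; ties go to the
    # first candidate in document order.
    article = min(candidates, key=lambda n: n.depth)
    # Step 4: extract the content of the article node.
    return article.text()
```

Calling `extract_article(page_html)` returns the text of the chosen node, or `None` when no node satisfies the heuristic. The sketch expects well-formed markup with explicitly closed `<p>` elements; a production extractor would need a more tolerant parser and a richer candidate test.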