Vision-based web page extraction method, Application No. CN201110171536.6


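Title of invention	Vision-based web page extraction method
Abstract	The invention discloses a vision-based web page extraction method comprising the following steps: (1) confirm that the page at the given web address has finished downloading and that, once fully rendered, a Document Object Model (DOM) tree has been generated; (2) based on the DOM tree, split the web page into block elements that cannot be split any further visually; (3) starting from the root node that corresponds to the main block in the DOM tree, traverse the block nodes that correspond to each visual block in the DOM tree, thereby obtaining the valuable data in the web page. The method makes full use of the visual cues of the web page itself and combines them with the DOM tree to segment the page semantically, which significantly improves the efficiency and quality of web page extraction.
Application publication No.	CN102253979B	Application publication date	2013.07.24
Application No.	CN201110171536.6	Filing date	2011.06.23
Applicant	天津海量信息技术有限公司	Inventor	王东胜
Classification	G06F17/30(2006.01)I	Main classification	G06F17/30(2006.01)I
Agency	北京汲智翼成知识产权代理事务所(普通合伙) 11381	Agent	陈曦
Independent claim	A vision-based web page extraction method, characterized by comprising the following steps: (1) confirm that the page at the given web address has finished downloading and that, once fully rendered, a Document Object Model (DOM) tree has been generated; (2) based on said DOM tree, split the web page into block elements that cannot be split any further visually; (3) starting from the root node that corresponds to the main block in said DOM tree, traverse the block nodes that correspond to each visual block in said DOM tree, thereby obtaining the valuable data in said web page; said visual blocks are generated from said block elements after similar-block merging and logical-block merging.
Address	300384 天津市南开区华苑产业区榕苑路1号B北322-323室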
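
Steps (2) and (3) of the claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the record gives no algorithmic detail, so the `Node` class, the `BLOCK_TAGS` set, the `style` string standing in for rendered visual cues, and the rule "adjacent blocks with the same tag and style are similar" are all assumptions made for illustration. Step (1) (downloading and rendering) and logical-block merging are left out, and the main block is assumed to be chosen by the caller.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: tags treated as block-level. The patent record does not list any.
BLOCK_TAGS = {"body", "div", "ul", "li", "p", "table", "tr", "td", "h1", "h2"}


@dataclass
class Node:
    """Hypothetical stand-in for a node of the rendered DOM tree."""
    tag: str
    text: str = ""
    style: str = ""  # placeholder for visual cues such as font or colour
    children: List["Node"] = field(default_factory=list)


def atomic_blocks(node: Node) -> List[Node]:
    """Step (2): collect block nodes that contain no further block-level child."""
    block_children = [c for c in node.children if c.tag in BLOCK_TAGS]
    if not block_children:
        return [node] if node.tag in BLOCK_TAGS else []
    blocks: List[Node] = []
    for child in block_children:
        blocks.extend(atomic_blocks(child))
    return blocks


def merge_similar(blocks: List[Node]) -> List[List[Node]]:
    """Similar-block merging (assumed rule): group adjacent blocks whose tag and style match."""
    groups: List[List[Node]] = []
    for block in blocks:
        if groups and (groups[-1][0].tag, groups[-1][0].style) == (block.tag, block.style):
            groups[-1].append(block)
        else:
            groups.append([block])
    return groups


def node_text(node: Node) -> List[str]:
    """Gather the text of a node and its descendants in document order."""
    parts = [node.text] if node.text else []
    for child in node.children:
        parts.extend(node_text(child))
    return parts


def extract(main_root: Node) -> List[str]:
    """Step (3): walk the visual blocks under the main block and return their text."""
    visual_blocks = merge_similar(atomic_blocks(main_root))
    return [" ".join(t for b in group for t in node_text(b)) for group in visual_blocks]


# Toy page: a navigation block and a main block with a heading and two paragraphs.
page = Node("body", children=[
    Node("div", style="nav", children=[Node("a", text="Home")]),
    Node("div", style="main", children=[
        Node("h1", text="Title", style="big"),
        Node("p", text="First.", style="body"),
        Node("p", text="Second.", style="body"),
    ]),
])
main = page.children[1]  # assumption: the caller already knows the main block
print(extract(main))     # ['Title', 'First. Second.']
```

In this toy page the two paragraphs share a tag and style, so they merge into one visual block, while the heading stays separate. A real system would take the visual cues from a rendering engine rather than from a hand-written `style` string.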
