Information block extraction apparatus and method for Web pages, Application No. US20040943157 - 传众专利搜索 (Chuanzhong Patent Search)

Title of invention	Information block extraction apparatus and method for Web pages
Abstract	A method and apparatus for identifying coherent areas within a Web page. First, the Web page is parsed into an HTML DOM tree and an HTML tag token stream. Next, repeated-patterns are induced from the Web page. After improper repeated-patterns are filtered out and corresponding instances of the remaining repeated-patterns are generated, the repeated-patterns are mapped back to their corresponding regions in the Web page. Based on these mappings, a hierarchical RST tree containing information blocks is generated. Information items within the information blocks are detected and then used to generate a hierarchical structural information block tree. Information blocks from the structural information block tree are then classified into text information blocks and link information blocks. Based on this classification and on block semantic similarity, the blocks are clustered and then grouped into semantic information blocks. The semantic information blocks contain main text information blocks and related link blocks, which can be labeled if necessary.
Publication No.	US2005066269(A1)	Publication date	2005.03.24
Application No.	US20040943157	Filing date	2004.09.17
Applicant	FUJITSU LIMITED;NANJING UNIVERSITY	Inventor	WANG JUN;WANG JICHENG;WU GANGSHAN;TSUDA HIROSHI
Classification	G06F17/30;G06F12/00;G06F17/00;(IPC1-7):G06F17/00	Main classification	G06F17/30
Agency		Agent
Main claim
Address
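
The first two steps named in the abstract (parsing a page into an HTML tag token stream, then inducing repeated patterns from it) can be illustrated with a small sketch. This is not the patented method: the abstract does not say how patterns are induced, so the naive n-gram count below is an assumption chosen for brevity. The names TagTokenizer, tag_token_stream and repeated_patterns, and the parameters min_len, max_len and min_count, are hypothetical. DOM tree construction, filtering of improper patterns, and all later steps are left out.

```python
from html.parser import HTMLParser


class TagTokenizer(HTMLParser):
    """Collects start and end tags as a flat token stream; text is ignored."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tokens.append(f"</{tag}>")


def tag_token_stream(html):
    """Return the list of tag tokens found in an HTML string."""
    parser = TagTokenizer()
    parser.feed(html)
    parser.close()
    return parser.tokens


def repeated_patterns(tokens, min_len=2, max_len=8, min_count=2):
    """Map each repeated tag n-gram to its non-overlapping start positions.

    Assumption: a plain n-gram scan stands in for whatever pattern
    induction the patent actually uses.
    """
    found = {}
    for n in range(min_len, max_len + 1):
        positions = {}
        for i in range(len(tokens) - n + 1):
            positions.setdefault(tuple(tokens[i:i + n]), []).append(i)
        for pattern, starts in positions.items():
            kept = []  # greedily keep occurrences that do not overlap
            for s in starts:
                if not kept or s >= kept[-1] + n:
                    kept.append(s)
            if len(kept) >= min_count:
                found[pattern] = kept
    return found


if __name__ == "__main__":
    page = ("<ul><li><a href='#'>A</a></li>"
            "<li><a href='#'>B</a></li>"
            "<li><a href='#'>C</a></li></ul>")
    stream = tag_token_stream(page)
    for pattern, starts in repeated_patterns(stream).items():
        print(" ".join(pattern), starts)
```

Each start position indexes into the token stream, which is the information a later step would need in order to map a pattern instance back to a region of the page. A real implementation would also need the filtering step from the abstract, since this scan reports every sub-pattern of a larger repeat.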
