Title of invention: INFORMATION BLOCK EXTRACTION APPARATUS AND METHOD FOR WEB PAGE
Abstract: PROBLEM TO BE SOLVED: To provide a method and an apparatus for extracting Web page information that can be applied to almost all kinds of Web pages. SOLUTION: The information block extraction apparatus uses a processing unit to automatically induce, with improved accuracy, rules for extracting information blocks within a Web page 101. Specifically, automatic repeated-pattern discovery at the structural level and clustering at the semantic level form the foundation of the invention and together ensure its effectiveness. COPYRIGHT: (C)2005,JPO&NCIPI
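The abstract does not say how repeated patterns are discovered at the structural level. As a rough illustration only, the sketch below assumes that a page node's children are reduced to a sequence of tag names and that a "repeated pattern" is a run of consecutive, identical tag subsequences (such as one record of a list repeated several times). The function name find_repeated_pattern and the min_repeats parameter are hypothetical, and the semantic-level clustering step is not shown.

```python
from typing import List, Optional, Tuple


def find_repeated_pattern(
    tags: List[str], min_repeats: int = 2
) -> Optional[Tuple[int, Tuple[str, ...], int]]:
    """Hypothetical sketch: find the consecutive (tandem) repeat of a tag
    subsequence that covers the most tags.

    Returns (start_index, pattern, repeat_count), or None if no pattern
    repeats at least min_repeats times in a row.
    """
    n = len(tags)
    best: Optional[Tuple[int, Tuple[str, ...], int]] = None
    best_cover = 0
    # Try shorter patterns first so that, on equal coverage, the shortest wins.
    for length in range(1, n // min_repeats + 1):
        # Only start where min_repeats copies of the pattern can still fit.
        for start in range(0, n - length * min_repeats + 1):
            pattern = tuple(tags[start:start + length])
            repeats = 1
            pos = start + length
            # Count how many times the pattern repeats back to back.
            while pos + length <= n and tuple(tags[pos:pos + length]) == pattern:
                repeats += 1
                pos += length
            cover = repeats * length
            if repeats >= min_repeats and cover > best_cover:
                best = (start, pattern, repeats)
                best_cover = cover
    return best


if __name__ == "__main__":
    # Assumed example: a heading, a wrapper, three (a, span) records, a footer.
    children = ["h1", "div", "a", "span", "a", "span", "a", "span", "p"]
    print(find_repeated_pattern(children))
```

For the example sequence the sketch returns (2, ('a', 'span'), 3), marking the three repeated (a, span) records as a candidate information block. A real system would also have to tolerate small variations between records, which this exact-match version does not.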
Publication number: JP2005092889(A)  Publication date: 2005.04.07
Application number: JP20040272471  Filing date: 2004.09.17
Applicants: FUJITSU LTD; NANJING UNIV  Inventors: O TAKASHI; WANG JICHENG; WU GANGSHAN; TSUDA HIROSHI
Classification: G06F17/30; G06F12/00; G06F17/00; (IPC1-7): G06F17/30  Main classification: G06F17/30
Agency:  Agent:
Main claim:
Address: