Abstract
PROBLEM TO BE SOLVED: To provide a method and an apparatus for extracting Web page information that can be applied to almost all kinds of Web pages. SOLUTION: The information block extraction apparatus uses a processing unit to automatically induce, with improved accuracy, rules for extracting information blocks within a Web page 101. Specifically, automatic discovery of repeated patterns at the structural level and clustering at the semantic level form the foundation of the invention and ensure its broad applicability. COPYRIGHT: (C)2005,JPO&NCIPI
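The two levels named above can be illustrated with a minimal sketch. It is not the patented method, which the abstract does not specify. It assumes that the structural level can be approximated by finding a tandem-repeated unit in the sequence of sibling tag names under one DOM node. It also substitutes a greedy Jaccard token-overlap grouping for the semantic clustering. The function names `find_repeated_pattern` and `cluster_by_tokens`, as well as the `min_repeats`, `max_len` and `threshold` parameters, are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple


def find_repeated_pattern(
    tags: Sequence[str], min_repeats: int = 2, max_len: int = 4
) -> Optional[Tuple[int, Tuple[str, ...], int]]:
    """Structural level (assumed approximation): return (start, unit, repeats)
    for the consecutively repeated tag unit that covers the most siblings."""
    best = None
    best_cover = 0
    n = len(tags)
    for start in range(n):
        for length in range(1, max_len + 1):
            unit = tuple(tags[start:start + length])
            if len(unit) < length:  # not enough tags left for this unit length
                break
            repeats = 1
            pos = start + length
            # Count back-to-back copies of the unit.
            while tuple(tags[pos:pos + length]) == unit:
                repeats += 1
                pos += length
            cover = repeats * length
            if repeats >= min_repeats and cover > best_cover:
                best_cover = cover
                best = (start, unit, repeats)
    return best


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-set overlap in [0, 1]; used here as a stand-in semantic similarity."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_tokens(texts: List[str], threshold: float = 0.3) -> List[List[int]]:
    """Semantic level (substituted technique): greedy single-pass clustering in
    which each text joins the first cluster whose seed is similar enough."""
    clusters: List[List[int]] = []
    seeds: List[Set[str]] = []
    for i, text in enumerate(texts):
        tokens = set(text.lower().split())
        for c, seed in enumerate(seeds):
            if jaccard(tokens, seed) >= threshold:
                clusters[c].append(i)
                break
        else:
            clusters.append([i])
            seeds.append(tokens)
    return clusters


if __name__ == "__main__":
    siblings = ["h2", "p", "div", "a", "div", "a", "div", "a", "footer"]
    print(find_repeated_pattern(siblings))  # (2, ('div', 'a'), 3)
    print(cluster_by_tokens(["Price 10 USD", "Price 25 USD", "Contact us today"]))  # [[0, 1], [2]]
```

Here the repeated `div`/`a` unit marks a candidate list of information blocks. Token clustering then separates record-like text from unrelated text. A real system would likely use richer structural signatures, such as subtree shapes, and stronger semantic features than raw token overlap.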