Abstract |
The present invention relates to an apparatus for extracting information from a web page. The apparatus comprises: a rule information learning unit which, using a plurality of items that define the information included in the web page and an information table that stores a value for each item, identifies the location information of each value in the code that describes the web page in a programming language, and learns rule information relating each value to its location information; and a change sensing unit which extracts values using the learned rule information, compares the extracted values with the values in the information table to determine whether the web page has changed, and requests relearning of the rule information. According to the present invention, when data are extracted from a web page based on an information extraction rule, the rule need not be regenerated manually every time the layout or code changes. Further, the apparatus automatically senses a style change in the web page and automatically generates a new information extraction rule accordingly, making it possible to continuously generate knowledge information from the web page. |
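The interplay between the rule information learning unit and the change sensing unit can be illustrated with a minimal sketch. It is not the claimed implementation: it assumes that "location information" can be approximated by the text immediately before and after a known value in the page code, and the names `learn_rules`, `extract`, `page_changed`, and the `context` parameter are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Assumption: a rule is the text just before and just after a value in the page code.
Rule = Tuple[str, str]


def learn_rules(code: str, table: Dict[str, str], context: int = 20) -> Dict[str, Rule]:
    """Learn one rule per item by locating its known value in the page code."""
    rules: Dict[str, Rule] = {}
    for item, value in table.items():
        pos = code.find(value)
        if pos == -1:
            continue  # value not present in the code; no rule learned for this item
        prefix = code[max(0, pos - context):pos]
        suffix = code[pos + len(value):pos + len(value) + context]
        rules[item] = (prefix, suffix)
    return rules


def extract(code: str, rule: Rule) -> Optional[str]:
    """Apply a learned rule; return None if the surrounding text is no longer found."""
    prefix, suffix = rule
    start = code.find(prefix)
    if start == -1:
        return None
    start += len(prefix)
    # An empty suffix means the value ran to the end of the code when the rule was learned.
    end = code.find(suffix, start) if suffix else len(code)
    if end == -1:
        return None
    return code[start:end]


def page_changed(code: str, rules: Dict[str, Rule], table: Dict[str, str]) -> bool:
    """Report a change if any extracted value differs from the information table."""
    return any(extract(code, rule) != table[item] for item, rule in rules.items())


# Information table: item -> value (illustrative data only).
table = {"title": "Widget", "price": "19.99"}
old_page = '<div class="title">Widget</div><span id="price">19.99</span>'
new_page = '<h1 class="name">Widget</h1><b class="cost">19.99</b>'

rules = learn_rules(old_page, table, context=8)
if page_changed(new_page, rules, table):
    # The layout changed, so the rules are relearned from the new code.
    rules = learn_rules(new_page, table, context=8)
```

In this sketch the relearning request is simply a second call to `learn_rules` on the changed page; a real system would likely use structural locations (for example, element paths) rather than raw surrounding text.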