Title of invention APPARATUS AND METHOD FOR EXTRACTING DATA FROM WEB PAGE
Abstract The present invention relates to an apparatus for extracting information from a web page. The apparatus comprises: a rule information learning unit that uses a plurality of items defining the information included in the web page, together with an information table storing a value for each item, to identify location information indicating where each value appears in the code that describes the web page in a programming language, and that learns rule information relating the values to the location information; and a change sensing unit that uses the learned rule information to extract values, compares the extracted values with the values in the information table to determine whether the web page has changed, and requests relearning of the rule information. According to the present invention, when data are extracted from a web page based on an information extraction rule, the rule does not need to be regenerated every time the layout or code changes. Further, the apparatus automatically senses a style change in the web page and automatically generates the information extraction rule accordingly, making it possible to continuously generate knowledge information from the web page.
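The two units described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the tag-path representation of "location information", the function names (`learn_rules`, `extract`, `page_changed`), and the sample pages and values are all assumptions made for illustration. The sketch also assumes well-formed markup without void elements such as `<br>`.

```python
from html.parser import HTMLParser


class _PathIndexer(HTMLParser):
    """Record the tag path (e.g. 'html[1]/body[1]/span[1]') of each text node."""

    def __init__(self):
        super().__init__()
        self.stack = []          # path components of the currently open tags
        self.counts = [{}]       # per-depth counters of sibling tags
        self.text_by_path = {}   # path -> stripped text found at that path

    def handle_starttag(self, tag, attrs):
        siblings = self.counts[-1]
        siblings[tag] = siblings.get(tag, 0) + 1
        self.stack.append(f"{tag}[{siblings[tag]}]")
        self.counts.append({})

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()
            self.counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.text_by_path["/".join(self.stack)] = text


def index_page(html):
    """Return a mapping from tag path to the text stored at that path."""
    parser = _PathIndexer()
    parser.feed(html)
    parser.close()
    return parser.text_by_path


def learn_rules(html, info_table):
    """Rule learning (assumed form): map each item to the path holding its known value."""
    paths = index_page(html)
    rules = {}
    for item, value in info_table.items():
        for path, text in paths.items():
            if text == value:
                rules[item] = path
                break
    return rules


def extract(html, rules):
    """Apply the learned rules; an item yields None if its path no longer exists."""
    paths = index_page(html)
    return {item: paths.get(path) for item, path in rules.items()}


def page_changed(html, rules, info_table):
    """Change sensing (assumed form): any mismatch with the table signals a change."""
    extracted = extract(html, rules)
    return any(extracted.get(item) != value for item, value in info_table.items())


# Hypothetical sample data: the same values before and after a layout change.
info_table = {"name": "Widget", "price": "19.99"}
old_page = "<html><body><h1>Widget</h1><span>19.99</span></body></html>"
new_page = ("<html><body><div><h1>Widget</h1></div>"
            "<p><span>19.99</span></p></body></html>")

rules_v1 = learn_rules(old_page, info_table)
if page_changed(new_page, rules_v1, info_table):
    # The mismatch triggers the "relearn" request described in the abstract.
    rules_v2 = learn_rules(new_page, info_table)
```

In this sketch the rules learned from `old_page` stop matching once the layout changes, so the comparison against the information table reports a change and the rules are relearned from `new_page`. Exact string matching is used for simplicity; a real system would likely need more tolerant matching.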
Publication number KR20160066235(A) Publication date 2016.06.10
Application number KR20140170332 Filing date 2014.12.02
Applicant SALTLUX INC. Inventors LEE, KYUNG IL; YANG, SUNG KWON; JEONG, KYO SUNG
Classification G06F17/00 Main classification G06F17/00
Agency Agent
Main claim
Address