Title of invention: METHOD AND SYSTEM FOR SAMPLING WEB DOCUMENTS INFORMATION
Abstract: A method and system for extracting structural information from web documents are provided. They automatically extract the attributes of a web site even when learning is performed on only a small amount of manually tagged data from that site. An attribute learning processing device (100) generates learning models by learning from web documents that are collected from the target web site and manually tagged with attributes. An attribute extraction processing device (200) extracts attributes from an original web document. A boundary recognition learning model database (300) stores the boundary recognition learning model, one of the generated learning models, and provides it to the attribute extraction processing device. An attribute recognition learning model database (302) stores the attribute recognition learning model, another of the generated learning models, and provides it to the attribute extraction processing device.
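The two-stage flow in the abstract (learn from a few tagged documents, then extract using a boundary model and an attribute model) can be illustrated with a minimal Python sketch. The abstract does not state the learning algorithm, so everything here is an assumption made for illustration: the names `BoundaryModel`, `AttributeModel`, `learn`, and `extract` are hypothetical, the "models" are simple lookup tables rather than trained classifiers, and a document is treated as a list of `label<separator>value` text lines.

```python
from dataclasses import dataclass, field


@dataclass
class BoundaryModel:
    """Toy stand-in for the boundary recognition learning model (assumption)."""
    separators: set = field(default_factory=set)


@dataclass
class AttributeModel:
    """Toy stand-in for the attribute recognition learning model (assumption)."""
    label_to_attribute: dict = field(default_factory=dict)


def learn(tagged_docs):
    """Learning step: build both toy models from manually tagged documents.

    Each tagged document is a list of (line, attribute) pairs.
    """
    boundary, attributes = BoundaryModel(), AttributeModel()
    for doc in tagged_docs:
        for line, attribute in doc:
            # Candidate separators are an illustrative assumption.
            for sep in (":", "|", "="):
                if sep in line:
                    label = line.split(sep, 1)[0].strip().lower()
                    boundary.separators.add(sep)
                    attributes.label_to_attribute[label] = attribute
                    break
    return boundary, attributes


def extract(lines, boundary, attributes):
    """Extraction step: find boundaries first, then recognize attributes."""
    result = {}
    for line in lines:
        for sep in sorted(boundary.separators):
            if sep in line:
                label, value = line.split(sep, 1)
                attribute = attributes.label_to_attribute.get(label.strip().lower())
                if attribute is not None:
                    result[attribute] = value.strip()
                break
    return result


# A single tagged document from the (hypothetical) target site.
tagged = [[("Price: 10 USD", "price"), ("Maker: Acme", "manufacturer")]]
boundary_model, attribute_model = learn(tagged)
extracted = extract(
    ["Price: 25 USD", "Maker: Foo", "Shipping: free"],
    boundary_model,
    attribute_model,
)
print(extracted)
```

The sketch keeps the two models separate, mirroring the two databases (300) and (302) named in the abstract. Lines whose label was never tagged, such as the shipping line, are skipped rather than guessed.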
Publication number: KR20090061525(A) Publication date: 2009.06.16
Application number: KR20070128556 Filing date: 2007.12.11
Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE Inventors: WANG, JI HYUN; LEE, CHANG KI; CHOI, MI RAN; JANG, MYUNG GIL
Classification: G06F17/21 Main classification: G06F17/21
Agency: Agent:
Main claim:
Address: