Title of invention |
METHOD AND SYSTEM FOR SAMPLING WEB DOCUMENTS INFORMATION |
Abstract |
A method and system for extracting structural information from web documents are provided. They automatically extract the attributes of a target web site even when training uses only a small amount of manually tagged data from that site. An attribute learning processing device (100) generates learning models by training on web documents that were collected from the target web site and manually tagged with attributes. An attribute extraction processing device (200) extracts attributes from an original web document. A boundary recognition learning model database (300) stores the boundary recognition learning model from among the learning models and provides it to the attribute extraction processing device. An attribute recognition learning model database (302) stores the attribute recognition learning model from among the learning models and provides it to the attribute extraction processing device.
|
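The two-stage flow described in the abstract (learn from a few tagged documents, then apply a boundary model and an attribute model to new documents) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the record does not disclose the actual learning algorithm, so the sketch substitutes simple frequency counting over "Key:" line prefixes, and the names `learn`, `extract`, `cue_of`, and the tagged-line format are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical training format: (line_text, attribute_label, starts_record).
# This toy stands in for the learning models; it is NOT the patented method.

def cue_of(line):
    """Return the lowercased 'Key' of a 'Key: value' line, or '' if absent."""
    m = re.match(r"\s*([A-Za-z ]+):", line)
    return m.group(1).strip().lower() if m else ""

def value_of(line):
    """Return the text after the first colon, or the whole line if no colon."""
    return line.split(":", 1)[1].strip() if ":" in line else line.strip()

def learn(tagged_lines):
    """Learning step: build a boundary model and an attribute model."""
    boundary_model = set()            # cues that open a new record
    votes = defaultdict(Counter)      # cue -> counts of attribute labels
    for text, label, starts_record in tagged_lines:
        cue = cue_of(text)
        if starts_record:
            boundary_model.add(cue)
        votes[cue][label] += 1
    # Each cue maps to its most frequently tagged attribute label.
    attribute_model = {c: v.most_common(1)[0][0] for c, v in votes.items()}
    return boundary_model, attribute_model

def extract(lines, boundary_model, attribute_model):
    """Extraction step: split lines into records, then label attributes."""
    records = []
    for line in lines:
        cue = cue_of(line)
        if cue in boundary_model or not records:
            records.append({})        # boundary recognition
        label = attribute_model.get(cue)
        if label is not None:         # attribute recognition
            records[-1][label] = value_of(line)
    return [r for r in records if r]  # drop records with no attributes

tagged = [
    ("Title: Foo", "product_name", True),
    ("Price: 10", "price", False),
]
boundary_model, attribute_model = learn(tagged)
page = ["Title: Widget", "Price: 5 USD", "Ad banner", "Title: Gadget", "Price: 7 USD"]
records = extract(page, boundary_model, attribute_model)
```

In this sketch the two returned models play the roles of the boundary recognition and attribute recognition model databases: they are produced once by the learning step and only read by the extraction step.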
Publication number |
KR20090061525(A) |
Publication date |
2009.06.16 |
Application number |
KR20070128556 |
Filing date |
2007.12.11 |
Applicant |
ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE |
Inventors |
WANG, JI HYUN;LEE, CHANG KI;CHOI, MI RAN;JANG, MYUNG GIL |
Classification |
G06F17/21 |
Main classification |
G06F17/21 |
Agency |
|
Agent |
|
Main claim |
|
Address |
|