Title of invention: SYSTEM AND METHOD FOR EXTRACTING DOMAIN INFORMATION IN UNSTRUCTURED WEB DOCUMENTS
Abstract: A system and method are provided for extracting information from unstructured web documents on a per-domain basis. Information-extraction rules are learned from unstructured web documents whose unstructured data is divided by domain in a ubiquitous environment, and the learned rules are then used to automatically extract the important information from documents of a given domain. A learning part (20) uses an ontology for each property and domain of the unstructured web documents to generate extraction rules that contain no semantic ambiguity. A rule database (40) stores the learned extraction rules. An information extractor (60) extracts the content of input documents from the target domain, separates out the words that carry linguistic meaning, replaces each word that has a dual (ambiguous) meaning with a representative word, and automatically extracts the important information by finding the semantic words that match the extraction rules. An information extraction result output part (70) confirms and outputs the extracted information to fix the final result.
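The abstract names four components (learning part 20, rule database 40, information extractor 60, output part 70) but gives no implementation details. The following is a minimal Python sketch under stated assumptions, not the patented method: the ontology is a hypothetical hand-written table mapping surface words to one representative word, a "rule" is simply the representative word that precedes a labeled value, the rule database is an in-memory dictionary, and values are single tokens. The names `ONTOLOGY`, `STOPWORDS`, `learn_rules` and `extract` are illustrative only.

```python
import re
from collections import defaultdict

# Hypothetical domain ontology (assumption): maps surface words (synonyms or
# ambiguous variants) to a single representative word for the concept.
ONTOLOGY = {
    "price": "price", "cost": "price", "fee": "price",
    "location": "location", "venue": "location", "place": "location",
}

# Function words skipped when looking for a trigger or a value.
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and"}


def tokenize(text):
    """Split text into lowercase word tokens and amount tokens such as '$25'."""
    return re.findall(r"\$?\d+(?:\.\d+)?|[a-z]+", text.lower())


def normalize(tokens):
    """Replace each word with its ontology representative, if it has one."""
    return [ONTOLOGY.get(tok, tok) for tok in tokens]


def learn_rules(examples):
    """Learning part (sketch): for each labeled value, record the nearest
    preceding non-stopword (after normalization) as a trigger for that
    property. The returned dict stands in for the rule database."""
    rules = defaultdict(set)
    for text, labels in examples:
        tokens = normalize(tokenize(text))
        for prop, value in labels.items():
            value = value.lower()
            if value not in tokens:
                continue  # multi-token or unseen values are skipped
            idx = tokens.index(value)
            for j in range(idx - 1, -1, -1):
                if tokens[j] not in STOPWORDS:
                    rules[prop].add(tokens[j])
                    break
    return rules


def extract(text, rules):
    """Information extractor (sketch): normalize the document, then take the
    first non-stopword after a trigger as the value of each property."""
    tokens = normalize(tokenize(text))
    result = {}
    for prop, triggers in rules.items():
        for i, tok in enumerate(tokens):
            if tok not in triggers:
                continue
            value = next((t for t in tokens[i + 1:] if t not in STOPWORDS), None)
            if value is not None:
                result[prop] = value
                break
    return result


if __name__ == "__main__":
    training = [("The price is $25 and the venue is Seoul.",
                 {"price": "$25", "location": "Seoul"})]
    rules = learn_rules(training)
    # Output part (sketch): print the extracted fields for confirmation.
    print(extract("Ticket fee: $40. Place: Busan.", rules))
    # prints {'price': '$40', 'location': 'busan'}
```

Because "fee" and "place" are normalized to the representative words "price" and "location", the rules learned from a single training sentence still match a document that uses different wording, which is the role the abstract assigns to the representative-word replacement step.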
Application publication number: KR20070008994(A); Publication date: 2007.01.18
Application number: KR20050063896; Filing date: 2005.07.14
Applicants: KT CORPORATION; SALTLUX; SEARCH CAST CO., LTD.; INSTITUTE INFORMATION TECHNOLOGY ASSESSMENT; KTFREETEL CO., LTD.; CHUNG-ANG UNIVERSITY INDUSTRY-ACADEMY COOPERATION FOUNDATION
Inventors: AHN, TAE SUNG; IVAN BERLOCHER; JUNG, YONG IL; JEON, HO HYUN
Classification: G06F17/00; G06F17/26; Main classification: G06F17/00