Abstract |
A method and a system are provided for automatically extracting semantic information from web documents for semantic web annotation, so that semantic annotation of large-scale web content can be automated and accelerated. The system comprises a learning data generator (100), an integrated classifier generator (400), and a semantic information extractor (800). The learning data generator collects a large volume of web documents, removes the HTML tags from the collected documents, splits compound words into their components, and generates learning data to which semantic tags are attached through a learning data editor. The integrated classifier generator trains a support vector machine (200) and a Bayesian classifier on the learning data, then integrates the support vector machine with the Bayesian classifier into a single classifier. The semantic information extractor uses the integrated classifier to extract semantic information automatically from new web documents and outputs the extracted information as ontology instances.
|
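The pipeline described in the abstract can be illustrated with a minimal Python sketch using only the standard library. It is not the patented implementation. The abstract does not say how the two classifiers are integrated, so the sketch assumes a simple average of their softmax-normalised scores. The class and function names, the toy training data, the `Person`/`Organization` tags, and the dictionary form of an "ontology instance" are all hypothetical. Compound-word splitting is language-specific and is omitted.

```python
import math
import re
from collections import Counter, defaultdict
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Keeps text nodes and drops the HTML tags around them."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    parser = TagStripper()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def softmax(scores):
    top = max(scores.values())
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing."""
    def fit(self, docs, tags):
        self.tags = sorted(set(tags))
        self.prior = Counter(tags)
        self.counts = {t: Counter() for t in self.tags}
        for doc, tag in zip(docs, tags):
            self.counts[tag].update(doc)
        self.vocab = {w for doc in docs for w in doc}
        return self

    def scores(self, doc):
        out = {}
        for tag in self.tags:
            denom = sum(self.counts[tag].values()) + len(self.vocab)
            s = math.log(self.prior[tag] / sum(self.prior.values()))
            for w in doc:
                if w in self.vocab:  # words unseen in training are ignored
                    s += math.log((self.counts[tag][w] + 1) / denom)
            out[tag] = s
        return out


class LinearSVM:
    """One-vs-rest linear SVM (no bias), hinge-loss subgradient descent."""
    def fit(self, docs, tags, epochs=50, lr=0.1, reg=0.01):
        self.tags = sorted(set(tags))
        self.w = {t: defaultdict(float) for t in self.tags}
        for _ in range(epochs):
            for doc, tag in zip(docs, tags):
                x = Counter(doc)
                for t in self.tags:
                    w, y = self.w[t], (1.0 if t == tag else -1.0)
                    margin = y * sum(w[f] * c for f, c in x.items())
                    for f in list(w):  # L2 shrinkage of every weight
                        w[f] *= 1.0 - lr * reg
                    if margin < 1.0:  # hinge loss is active
                        for f, c in x.items():
                            w[f] += lr * y * c
        return self

    def scores(self, doc):
        x = Counter(doc)
        return {t: sum(self.w[t].get(f, 0.0) * c for f, c in x.items())
                for t in self.tags}


class IntegratedClassifier:
    """Assumed integration rule: sum the two softmax-normalised scores."""
    def fit(self, docs, tags):
        self.nb = NaiveBayes().fit(docs, tags)
        self.svm = LinearSVM().fit(docs, tags)
        return self

    def predict(self, doc):
        p_nb = softmax(self.nb.scores(doc))
        p_svm = softmax(self.svm.scores(doc))
        return max(p_nb, key=lambda t: p_nb[t] + p_svm[t])


def extract_instances(pages, clf):
    """Returns one hypothetical ontology instance (a dict) per page."""
    out = []
    for page in pages:
        text = strip_tags(page)
        out.append({"rdf:type": clf.predict(tokenize(text)), "rdfs:label": text})
    return out


# Toy learning data: (web fragment, semantic tag) pairs, invented for the demo.
TRAIN = [("<p>Professor Kim teaches algorithms</p>", "Person"),
         ("<p>Doctor Lee teaches physics</p>", "Person"),
         ("<div>Seoul University founded in 1946</div>", "Organization"),
         ("<div>Acme Corporation founded in 1990</div>", "Organization")]
clf = IntegratedClassifier().fit([tokenize(strip_tags(h)) for h, _ in TRAIN],
                                 [t for _, t in TRAIN])
instances = extract_instances(["<li>Professor Park teaches chemistry</li>"], clf)
```

Here `instances` holds one dictionary per new fragment, with the predicted semantic tag under `rdf:type` and the tag-free text under `rdfs:label`. A real system would more likely serialise these as RDF/OWL individuals and might integrate the classifiers by weighted voting or stacking rather than by plain averaging.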