Abstract
PROBLEM TO BE SOLVED: To provide a document processing device that can assign appropriate semantic tags to a wide variety of documents.

SOLUTION: A general proper expression (named entity) extraction part 11 and a semantic role word extraction part 12 extract general proper expressions and semantic role words from an input document 100. A general document structure analysis part 13 computes the basic structure of the document. A document type identification part 15 compares the document model obtained for the input document, which is based on its general proper expressions and semantic role words, with the document models defined in the same terms for each document type, and selects a document type for the input document. A detailed document structure detection part 16 detects substructures of the input document using detailed document structure information that is defined for the selected document type in terms of general proper expressions and semantic role words. A semantic tag assignment part 17 assigns the semantic tags predefined for that detailed document structure to the detected substructures, producing an output document 101.

COPYRIGHT: (C)2007,JPO&INPIT
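The abstract does not specify how document models are represented or compared, so the following is only a rough illustration of the flow from parts 11 through 17 under stated assumptions. It assumes a bag-of-features document model (counts of named-entity types and semantic role words), cosine similarity as the comparison used to pick a document type, text lines as the basic document structure, and a simple keyword-to-tag table as the detailed document structure information. The regex patterns, lexicons, type models, and tag names (`ENTITY_PATTERNS`, `ROLE_WORDS`, `TYPE_MODELS`, `SUBSTRUCTURE_TAGS`) are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for parts 11 and 12: entity patterns and role words.
ENTITY_PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "MONEY": re.compile(r"\$\d+(?:\.\d{2})?"),
}
ROLE_WORDS = {"invoice", "total", "due", "abstract", "method", "results"}

# Hypothetical per-document-type models, expressed in the same features.
TYPE_MODELS = {
    "invoice": Counter({"ROLE:invoice": 2, "ROLE:total": 2, "ROLE:due": 1,
                        "NE:MONEY": 3, "NE:DATE": 1}),
    "paper": Counter({"ROLE:abstract": 1, "ROLE:method": 2, "ROLE:results": 2}),
}

# Hypothetical detailed structure info per type: role word -> semantic tag.
SUBSTRUCTURE_TAGS = {
    "invoice": {"total": "amount-total", "due": "due-date"},
    "paper": {"abstract": "abstract", "method": "method", "results": "results"},
}


def extract_features(text: str) -> Counter:
    """Build a bag-of-features model from entity types and role words."""
    features = Counter()
    for label, pattern in ENTITY_PATTERNS.items():
        features["NE:" + label] += len(pattern.findall(text))
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in ROLE_WORDS:
            features["ROLE:" + word] += 1
    return +features  # unary plus drops zero counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify_type(model: Counter) -> str:
    """Select the document type whose model is most similar (assumed rule)."""
    return max(TYPE_MODELS, key=lambda t: cosine(model, TYPE_MODELS[t]))


def assign_tags(text: str) -> tuple[str, str]:
    """Pick a type, then wrap each line (basic structure unit) that contains
    a tagged role word for that type in the corresponding semantic tag."""
    doc_type = identify_type(extract_features(text))
    rules = SUBSTRUCTURE_TAGS[doc_type]
    out = []
    for line in text.splitlines():
        words = set(re.findall(r"[a-z]+", line.lower()))
        tag = next((rules[w] for w in rules if w in words), None)
        out.append(f"<{tag}>{line}</{tag}>" if tag else line)
    return doc_type, "\n".join(out)


if __name__ == "__main__":
    sample = "Invoice 2007-04-01\nTotal: $120.00\nPayment due 2007-05-01"
    doc_type, tagged = assign_tags(sample)
    print(doc_type)
    print(tagged)
```

Keeping the type models and the substructure rules in the same feature vocabulary mirrors the abstract's point that both the document-type models and the detailed structure information are defined in terms of general proper expressions and semantic role words; a real system would likely use a trained named-entity recognizer and richer structure rules than line-level keyword matching.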