摘要 |
A document analysis system which can execute a layout analysis intended by a document provider and an exhaustive title analysis and output the analysis result which can be used by a third person is provided by the present invention. The input unit ( 11 ) obtains a structured or semi-structured document and renders it. The basic layout analysis unit ( 14 ) obtains the rendering result and analyzes the layout by grouping document description elements juxtaposed in a determined direction by referencing an arrangement of the document description elements. The title analysis unit ( 15 ) obtains the rendering result and a title analysis rule from the title analysis rule storing unit ( 23 ) and analyzes the title by comparing the name, attribute, style or the content of the document analysis elements with the title analysis rule. The layout analysis unit ( 16 ) obtains the layout components and the hierarchical relationship thereof and the titles for generating a new layout by grouping the layout components. The output unit ( 13 ) obtains the layout components and the hierarchical relationship thereof, the relationship between the components and the titles, shapes them into a format having an expression which uses the reference to the document description elements and output them.
|