Abstract |
A document processing apparatus comprises a layout analysis module configured to analyze input image data, divide the image into areas by classification, and acquire coordinate information of each text area among the classified areas; a text area information calculation module configured to calculate position information of a partial area for each text area on the basis of the coordinate information acquired by the layout analysis module; a feature extraction module configured to extract features of the text area on the basis of the position information calculated by the text area information calculation module; an analysis executing module configured to analyze semantic information of the partial area using a plurality of kinds of analysis component modules; and a component formation module configured to select and construct one or a plurality of analysis component modules on the basis of the features of the text area extracted by the feature extraction module, and to permit the analysis executing module to execute analysis of the semantic information of the partial area according to the one or plurality of analysis component modules thus constructed.
|
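The data flow among the modules named in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation. It assumes the layout analysis module has already produced text areas with bounding boxes, treats each text line as a "partial area", and uses invented features, thresholds, and analyzers. All names (`TextArea`, `partial_area_positions`, `form_components`, and so on) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical box convention: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]
Analyzer = Callable[[str], Optional[str]]


@dataclass
class TextArea:
    """A text area as it might be returned by layout analysis (assumed shape)."""
    box: Box
    lines: List[str]


def partial_area_positions(area: TextArea) -> List[Box]:
    """Text area information calculation: one equal-height box per text line.

    Treating a line as a 'partial area' is an assumption made for this sketch.
    """
    x, y, w, h = area.box
    n = max(len(area.lines), 1)
    line_h = h // n
    return [(x, y + i * line_h, w, line_h) for i in range(n)]


def extract_features(positions: List[Box]) -> Dict[str, float]:
    """Feature extraction: simple geometric features (illustrative only)."""
    heights = [p[3] for p in positions]
    return {
        "line_count": float(len(positions)),
        "mean_line_height": sum(heights) / len(heights),
    }


# Analysis component modules: each maps text to a semantic label or None.
def title_analyzer(text: str) -> Optional[str]:
    return "title"


def date_analyzer(text: str) -> Optional[str]:
    return "date" if any(ch.isdigit() for ch in text) else None


def body_analyzer(text: str) -> Optional[str]:
    return "body"


def form_components(features: Dict[str, float]) -> List[Analyzer]:
    """Component formation: choose analyzers from the features.

    The rule and the threshold of 30 pixels are invented for illustration.
    """
    if features["line_count"] == 1 and features["mean_line_height"] >= 30:
        return [title_analyzer]
    return [date_analyzer, body_analyzer]


def execute_analysis(area: TextArea, components: List[Analyzer]) -> List[Optional[str]]:
    """Analysis execution: the first component that returns a label wins."""
    labels: List[Optional[str]] = []
    for line in area.lines:
        label = None
        for component in components:
            label = component(line)
            if label is not None:
                break
        labels.append(label)
    return labels


def process(areas: List[TextArea]) -> List[List[Optional[str]]]:
    """Run the stages in the order the abstract lists them."""
    results = []
    for area in areas:
        positions = partial_area_positions(area)
        features = extract_features(positions)
        components = form_components(features)
        results.append(execute_analysis(area, components))
    return results
```

For example, a single tall line would be routed to the title analyzer only, while a two-line area would be handled by the date and body analyzers in turn. The point of the sketch is that the set of analyzers is chosen per text area from its extracted features rather than fixed in advance.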