Abstract |
An input document is matched against predetermined patterns line by line, so that each line can be assigned multiple pairs of attributes and costs. Once the whole document has been processed, graph nodes are generated and linked according to a rule specifying which attribute combinations are permitted between adjacent lines, and costs are assigned to the nodes and links. Multiple paths lead through the graph from the root node to the final node, and each path represents one possible interpretation of the document's logical structure. Summing the costs of the nodes and links traversed associates a total cost with each path; ranking the paths by this total cost allows the logical structure interpretations to be presented in sequence, starting with the most plausible one. A chosen logical structure is then tagged as required.
|
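The ranking step described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the source: the function `rank_interpretations`, the attribute labels, the `ROOT` sentinel, the cost values, and the `ALLOWED` rule table are all hypothetical. The sketch further assumes that costs are non-negative and that a lower total cost means a more plausible interpretation. It uses a best-first search over partial paths, which is one possible way to enumerate paths in order of total cost and is not necessarily the method used in the original.

```python
import heapq
from itertools import count

# Hypothetical rule table: allowed attribute pairs between adjacent lines,
# each with a link cost. Pairs that are absent are treated as not allowed.
ALLOWED = {
    ("ROOT", "title"): 0, ("ROOT", "paragraph"): 1,
    ("title", "heading"): 0, ("title", "paragraph"): 2,
    ("paragraph", "heading"): 1, ("paragraph", "paragraph"): 0,
    ("paragraph", "list_item"): 1,
    ("heading", "paragraph"): 0,
}


def rank_interpretations(candidates, allowed, k=3):
    """Return up to k (total_cost, attributes) pairs, cheapest first.

    candidates: one list per document line of (attribute, node_cost) pairs,
                i.e. the result of pattern matching (assumed given).
    allowed:    dict mapping (previous_attribute, attribute) to a link cost.
    Assumes all costs are non-negative, so that complete paths leave the
    priority queue in order of increasing total cost.
    """
    tie = count()  # tie-breaker so the heap never compares paths directly
    heap = [(0, next(tie), 0, ())]  # (cost so far, tie, next line, path)
    results = []
    while heap and len(results) < k:
        cost, _, line_no, path = heapq.heappop(heap)
        if line_no == len(candidates):  # reached the final node
            results.append((cost, list(path)))
            continue
        prev = path[-1] if path else "ROOT"
        for attr, node_cost in candidates[line_no]:
            link_cost = allowed.get((prev, attr))
            if link_cost is None:  # combination not permitted by the rule
                continue
            heapq.heappush(
                heap,
                (cost + node_cost + link_cost, next(tie), line_no + 1, path + (attr,)),
            )
    return results


# Hypothetical matching result for a three-line document.
candidates = [
    [("title", 1), ("paragraph", 3)],
    [("heading", 2), ("paragraph", 1)],
    [("paragraph", 1), ("list_item", 2)],
]

ranked = rank_interpretations(candidates, ALLOWED, k=3)
for total, attrs in ranked:
    print(total, attrs)
```

With these made-up costs, the cheapest interpretation is title, heading, paragraph with a total cost of 4, followed by alternatives costing 5 and 6. A user who rejects the first interpretation could be shown the next one in the list, which mirrors the sequential presentation the abstract describes.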