Abstract |
Methods are disclosed for recovering or determining the logical structure of a document by assessing different combinations of vertical and horizontal cuts across a block of the document. The block is segmented using a scoring function that discards horizontal cuts in favor of vertical cuts shared among neighboring sub-blocks. The order in which the blocks and sub-blocks are segmented is then used to define the logical structure of the document, such as its reading order.
|
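The segmentation idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the disclosed scoring function: it assumes that blocks are axis-aligned bounding boxes `(x0, y0, x1, y1)` with y growing downward, that a cut is a whitespace gap at least `min_gap` wide, and that a horizontal cut is discarded whenever the sub-blocks on both sides of it share a vertical gap. The names `gaps`, `split`, `reading_order`, and `min_gap` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward
Gap = Tuple[float, float]                # (start, end) of a whitespace gap


def gaps(intervals: List[Tuple[float, float]], min_gap: float) -> List[Gap]:
    """Return whitespace gaps of width >= min_gap between 1-D intervals."""
    found: List[Gap] = []
    reach = None  # furthest coordinate covered so far
    for lo, hi in sorted(intervals):
        if reach is not None and lo - reach >= min_gap:
            found.append((reach, lo))
        reach = hi if reach is None else max(reach, hi)
    return found


def split(boxes: List[Box], axis: int, cuts: List[Gap]) -> List[List[Box]]:
    """Partition boxes by the given cuts along axis (0 = x, 1 = y)."""
    groups: List[List[Box]] = [[] for _ in range(len(cuts) + 1)]
    for b in boxes:
        # A box lies after a cut when its low edge is at or past the gap's end.
        index = sum(1 for _, end in cuts if b[axis] >= end)
        groups[index].append(b)
    return groups


def x_gaps(boxes: List[Box], min_gap: float) -> List[Gap]:
    return gaps([(b[0], b[2]) for b in boxes], min_gap)


def reading_order(boxes: List[Box], min_gap: float = 5.0) -> List[Box]:
    """Order boxes by recursive cuts, preferring shared vertical cuts."""
    if len(boxes) <= 1:
        return list(boxes)
    vertical = x_gaps(boxes, min_gap)
    if vertical:  # a vertical cut spans the whole block: read left to right
        return [b for g in split(boxes, 0, vertical)
                for b in reading_order(g, min_gap)]
    horizontal = gaps([(b[1], b[3]) for b in boxes], min_gap)
    if not horizontal:  # no cut possible: fall back to top-left ordering
        return sorted(boxes, key=lambda b: (b[1], b[0]))
    strips = split(boxes, 1, horizontal)
    # Assumed simplification of the scoring step: drop the horizontal cut
    # between neighboring strips when their union still has a vertical gap.
    merged = [strips[0]]
    for strip in strips[1:]:
        if x_gaps(merged[-1] + strip, min_gap):
            merged[-1] = merged[-1] + strip
        else:
            merged.append(strip)
    return [b for g in merged for b in reading_order(g, min_gap)]


# Hypothetical page: a full-width title above two columns of paragraphs.
title = (0, 0, 100, 10)
left_1, left_2 = (0, 20, 45, 40), (0, 50, 45, 70)
right_1, right_2 = (55, 20, 100, 35), (55, 45, 100, 70)
page = [title, left_1, left_2, right_1, right_2]
order = reading_order(page)
```

On this hypothetical page, a plain top-to-bottom split would interleave the two columns (`left_1`, `right_1`, `left_2`, `right_2`). Discarding the horizontal cut between the two paragraph rows, because both rows share the same column gap, yields the column-wise order `title`, `left_1`, `left_2`, `right_1`, `right_2`.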