Abstract
Some embodiments provide a method for analyzing an unstructured document that includes a number of glyphs, each of which has a position in the unstructured document. Based on the positions of the glyphs in the unstructured document, the method creates associations between different sets of glyphs in order to identify those sets of glyphs as different words. The method creates associations between different sets of words in order to identify those sets of words as different paragraphs. The method defines associations between paragraphs that are not contiguous in order to define a reading order for the paragraphs.
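The first two grouping steps can be illustrated with a minimal sketch. It is not the claimed method: the `Glyph` fields, the function names `group_words` and `group_paragraphs`, and the thresholds `gap`, `line_tol`, and `line_gap` are all hypothetical, and the sketch assumes a single column, a top-left origin with y increasing downward, and glyphs on one line sharing the same baseline value. The reading-order step for non-contiguous paragraphs is omitted.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str
    x: float      # left edge of the glyph (hypothetical field)
    y: float      # baseline position, increasing downward (assumption)
    width: float  # advance width of the glyph


def group_words(glyphs, gap=2.0, line_tol=1.0):
    """Associate glyphs into words using only their positions.

    A glyph joins the current word when it sits on the same baseline
    (within line_tol) and the horizontal gap to the previous glyph is
    at most `gap`. Both thresholds are illustrative assumptions.
    """
    words = []
    # Sorting by (y, x) assumes glyphs on one line share a baseline value.
    for g in sorted(glyphs, key=lambda g: (g.y, g.x)):
        if words:
            last = words[-1][-1]
            same_line = abs(g.y - last.y) <= line_tol
            close = g.x - (last.x + last.width) <= gap
            if same_line and close:
                words[-1].append(g)
                continue
        words.append([g])
    return words


def group_paragraphs(words, line_gap=15.0):
    """Associate words into paragraphs by vertical spacing.

    Expects words in top-to-bottom, left-to-right order, as produced by
    group_words. A new paragraph starts when the baseline of the next
    word is more than `line_gap` below the previous word's baseline.
    """
    paragraphs = []
    for w in words:
        if paragraphs and w[0].y - paragraphs[-1][-1][0].y <= line_gap:
            paragraphs[-1].append(w)
        else:
            paragraphs.append([w])
    return paragraphs


def word_text(word):
    """Join the characters of a word's glyphs into a string."""
    return "".join(g.char for g in word)
```

In this sketch, words and paragraphs are plain nested lists of glyphs, so the associations are implicit in the list structure. A real layout analyzer would likely derive the thresholds from the document itself (for example from font size or typical glyph spacing) rather than fix them as constants.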