Abstract |
This disclosure provides an exemplary method and system for extracting structured data from an unstructured textual document. In an exemplary method, a layout analysis is first performed, yielding one or more alternatives for grouping and ordering the page elements of interest. Next, the content of these page elements is tagged based on application-specific heuristics. Finally, a sequence-based method is applied to the tags to identify repetitive contiguous patterns. |
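
The tagging and pattern-identification steps can be illustrated with a minimal sketch. It assumes the layout analysis has already produced an ordered list of page-element strings. The tag names, the regular expressions in `TAG_RULES`, and the functions `tag_element` and `find_repeats` are hypothetical and chosen only for illustration; the disclosure does not specify particular heuristics or a particular sequence algorithm, and the brute-force search below stands in for whatever sequence-based method an implementation would use.

```python
import re

# Hypothetical application-specific heuristics (assumed, not from the disclosure).
TAG_RULES = [
    ("DATE", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
    ("PRICE", re.compile(r"^\$\d+(\.\d{2})?$")),
]


def tag_element(text):
    """Return the first matching tag for a page element, or TEXT as a fallback."""
    for tag, pattern in TAG_RULES:
        if pattern.match(text.strip()):
            return tag
    return "TEXT"


def find_repeats(tags, min_period=2, min_repeats=2):
    """Scan left to right for contiguous repetitions of a tag pattern.

    Returns a list of (start, period, repeats) tuples. At each position the
    candidate covering the most elements is kept, and the scan then resumes
    after it, so reported runs never overlap.
    """
    results = []
    n = len(tags)
    i = 0
    while i < n:
        best = None
        for period in range(min_period, (n - i) // min_repeats + 1):
            unit = tags[i:i + period]
            repeats = 1
            # Count how many times the unit repeats back to back.
            while tags[i + repeats * period:i + (repeats + 1) * period] == unit:
                repeats += 1
            covered = repeats * period
            if repeats >= min_repeats and (best is None or covered > best[1] * best[2]):
                best = (i, period, repeats)
        if best is not None:
            results.append(best)
            i += best[1] * best[2]
        else:
            i += 1
    return results


# Ordered page elements, as a layout analysis might emit them (made-up data).
elements = ["Invoice", "Widget", "2020-01-01", "$5.00",
            "Gadget", "2020-01-02", "$7.50", "Thanks"]
tags = [tag_element(e) for e in elements]
runs = find_repeats(tags)

# Slice each repeated run back into records of the original elements.
records = [
    elements[start + k * period:start + (k + 1) * period]
    for start, period, repeats in runs
    for k in range(repeats)
]
```

In this made-up input, the tag sequence contains the pattern TEXT, DATE, PRICE twice in a row, so each repetition maps back to one structured record. When the layout analysis yields several alternative orderings, the same search could be run on each alternative and the ordering with the strongest repetition kept; that selection rule is likewise an assumption.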