Abstract
A method is provided for parsing a table. The method includes: receiving an input containing the table; finding candidate separators within the table; and determining which of the candidate separators are real and which are spurious by optimizing an objective function over the set of found candidate separators. Suitably, the objective function numerically measures the accuracy of the parse produced by the set of real separators. The objective function suitably includes one or more terms that account for multiple aspects of the table, including at least two of: the quality of candidate separators; the coherence of cells within the parse; the quality of cells within the parse; the coherence of entire rows within the parse; the quality of entire rows within the parse; the coherence of entire columns within the parse; the quality of entire columns within the parse; layout consistency along an axis of the table; and repeatability along that axis of the table.
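The abstract does not say how candidates are found, how each term is scored, or which optimizer is used, so the following is only a toy sketch under stated assumptions. It assumes a plain-text, fixed-width table. A candidate separator is any character column where some row begins a whitespace gap. Only two of the listed aspects are modeled: separator quality, taken as the fraction of rows blank at that column, and cell quality, which penalizes empty cells and separators that cut through a word. The weights are hypothetical, and exhaustive search over subsets stands in for whatever optimization the method actually performs. All function names are hypothetical.

```python
from itertools import combinations


def pad(lines):
    """Pad every row to the same width so character columns line up."""
    width = max(len(line) for line in lines)
    return [line.ljust(width) for line in lines]


def find_candidates(rows):
    """Hypothetical candidate finder: any column where some row starts a whitespace gap."""
    cands = set()
    for row in rows:
        for i in range(1, len(row)):
            if row[i].isspace() and not row[i - 1].isspace():
                cands.add(i)
    return sorted(cands)


def separator_quality(rows, sep):
    """Fraction of rows blank at this column, shifted so 0.5 is neutral (assumed weight)."""
    blank = sum(row[sep].isspace() for row in rows)
    return blank / len(rows) - 0.5


def cell_quality(rows, seps):
    """Assumed penalty of 1 per empty cell and 1 per separator that splits a word."""
    bounds = [0, *seps, len(rows[0])]
    penalty = 0
    for row in rows:
        for a, b in zip(bounds, bounds[1:]):
            if not row[a:b].strip():
                penalty += 1  # empty cell
        for s in seps:
            if not row[s - 1].isspace() and not row[s].isspace():
                penalty += 1  # separator cuts through a word
    return -penalty


def objective(rows, seps):
    """Toy objective combining two aspects: separator quality and cell quality."""
    return sum(separator_quality(rows, s) for s in seps) + cell_quality(rows, seps)


def parse_table(lines):
    """Keep the subset of candidates that maximizes the objective (exhaustive search)."""
    rows = pad(lines)
    cands = find_candidates(rows)
    subsets = (list(c) for k in range(len(cands) + 1) for c in combinations(cands, k))
    best = max(subsets, key=lambda seps: objective(rows, seps))
    bounds = [0, *best, len(rows[0])]
    cells = [[row[a:b].strip() for a, b in zip(bounds, bounds[1:])] for row in rows]
    return best, cells


if __name__ == "__main__":
    table = [
        "Name      City      Age",
        "Alice     New York  30",
        "Bob       Paris     25",
    ]
    real, cells = parse_table(table)
    print(real)  # [5, 18]
    for row in cells:
        print(row)
```

In this example the space inside "New York" produces a spurious candidate at column 13, and partial gaps produce others; the cell-quality term rejects them because each one splits a word in at least one row. Exhaustive search is exponential in the number of candidates, so it is only practical for tiny inputs like this one.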