摘要 |
In a method for identifying a table of contents in a document, an ordered sequence of text fragments is derived from the document. A table of contents is selected as a contiguous sub-sequence of the ordered sequence of text fragments satisfying the criteria: (i) entries defined by text fragments of the table of contents each have a link to a target text fragment having textual similarity with the entry; (ii) no target text fragment lies within the table of contents; and (iii) the target text fragments have an ascending ordering corresponding to an ascending ordering of the entries defining the target text fragments. |