摘要 |
In a method for identifying a table of contents in a document (10) , text fragments are extracted (12) from the document. There are identified (20, 30, 34, 38) : (i) a substantially contiguous group of text fragments as table of content entries and (ii) a different group of text fragments as linked text fragments linked with corresponding table of content entries. During the identifying, a number of text fragments that are candidates for identification as linked text fragments is reduced based on at least one reduction criterion (130) . The identified table of contents entries and linked text fragments (110) are validated based on at least one validation criterion (162) related to distribution of the linked text fragments. |