Abstract |
<p>A hypertext document (2) is analysed by identifying document elements within it and then categorising those document elements into given element types. Heuristic pattern matching is then performed upon the categorised element types to identify patterns indicative of different document regions. The original document may then be divided into separate documents based upon the identified document regions.</p> |
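
The pipeline summarised in the abstract (identify elements, categorise them into types, pattern-match over the types, split into regions) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the method claimed in the source: the tag-to-type mapping `TYPE_OF_TAG`, the one-letter type codes, the two regular-expression heuristics in `REGION_PATTERNS`, and the names `ElementCollector` and `find_regions` are all hypothetical, and the input is assumed to be well-formed HTML with matching close tags.

```python
import re
from html.parser import HTMLParser

# Hypothetical mapping from tag names to one-letter element types.
TYPE_OF_TAG = {"h1": "H", "h2": "H", "h3": "H", "p": "P", "a": "L"}

# Hypothetical heuristics, expressed as regexes over the type-code string.
REGION_PATTERNS = [
    ("navigation", re.compile(r"L{3,}")),  # three or more consecutive links
    ("section", re.compile(r"HP+")),       # a heading followed by paragraphs
]


class ElementCollector(HTMLParser):
    """Record [type code, text] for each recognised element, in document order."""

    def __init__(self):
        super().__init__()
        self.elements = []  # list of [code, text]
        self._stack = []    # indices of recognised elements still open

    def handle_starttag(self, tag, attrs):
        if tag in TYPE_OF_TAG:
            self.elements.append([TYPE_OF_TAG[tag], ""])
            self._stack.append(len(self.elements) - 1)

    def handle_endtag(self, tag):
        # Assumes well-formed input: each recognised tag is closed in order.
        if tag in TYPE_OF_TAG and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text is credited to the innermost open recognised element.
        if self._stack:
            self.elements[self._stack[-1]][1] += data


def find_regions(html):
    """Return (region name, element texts) pairs in document order."""
    collector = ElementCollector()
    collector.feed(html)
    collector.close()
    codes = "".join(code for code, _ in collector.elements)
    regions = []
    for name, pattern in REGION_PATTERNS:
        for match in pattern.finditer(codes):
            span = collector.elements[match.start():match.end()]
            regions.append((match.start(), name, [text.strip() for _, text in span]))
    regions.sort()  # order by position of the first element in each region
    return [(name, texts) for _, name, texts in regions]


page = (
    "<a href='/'>Home</a><a href='/n'>News</a><a href='/c'>Contact</a>"
    "<h1>Title</h1><p>First.</p><p>Second.</p>"
    "<h2>More</h2><p>Third.</p>"
)
for name, texts in find_regions(page):
    print(name, texts)
```

Each returned region could then be written out as its own document. Encoding the element types as a string lets ordinary regular expressions act as the heuristics; real pages would likely need a richer type set and more tolerant patterns.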