Abstract |
A text structure analysis method and apparatus. The apparatus includes a content boundary pattern storage device and a text analysis device. The storage device stores content boundary patterns, which indicate the boundaries between the various contents of a text, each content being a collection of related material in that text. The text analysis device detects boundary sections in the input text based on the patterns held in the content boundary pattern storage device, and establishes content boundaries at the detected boundary sections. To extract the contents of the input text, the content boundary patterns are detected in that text, content boundaries are established for it, and the text is then handled in units of content based on the established boundaries.
|
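The flow described in the abstract (stored boundary patterns, detection of boundary sections, establishment of content boundaries, handling of the text in units of content) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not specify what the patterns look like, so the regular expressions in `BOUNDARY_PATTERNS`, the choice of sentences as the unit of input, and the function names `find_boundaries` and `split_into_contents` are all hypothetical.

```python
import re
from typing import List, Pattern, Sequence

# Hypothetical stand-in for the content boundary pattern storage device.
# The abstract does not list any actual patterns; these are illustrative only.
BOUNDARY_PATTERNS: List[Pattern[str]] = [
    re.compile(r"^\s*(first|next|finally)\b", re.IGNORECASE),  # cue words
    re.compile(r"^\s*\d+\.\s"),                                # "1. ", "2. "
]


def find_boundaries(
    sentences: Sequence[str],
    patterns: Sequence[Pattern[str]] = BOUNDARY_PATTERNS,
) -> List[int]:
    """Return the indices of sentences that match any stored boundary pattern."""
    return [
        i for i, sentence in enumerate(sentences)
        if any(p.search(sentence) for p in patterns)
    ]


def split_into_contents(
    sentences: Sequence[str],
    patterns: Sequence[Pattern[str]] = BOUNDARY_PATTERNS,
) -> List[List[str]]:
    """Group sentences into contents, starting a new one at each boundary."""
    if not sentences:
        return []
    # The first sentence always opens a content, matched or not.
    starts = sorted({0, *find_boundaries(sentences, patterns)})
    ends = starts[1:] + [len(sentences)]
    return [list(sentences[a:b]) for a, b in zip(starts, ends)]


if __name__ == "__main__":
    text = [
        "Open the lid.",
        "Pour the water in.",
        "Next, close the lid.",
        "Press the button.",
        "Finally, wait for the beep.",
    ]
    for number, content in enumerate(split_into_contents(text), start=1):
        print(number, content)
```

In this sketch a boundary is placed immediately before each matching sentence, which is one possible reading of "establishing content boundaries for the detected boundary sections"; a different placement rule could be substituted without changing the overall structure.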