Abstract |
PURPOSE: To precisely segment the text of a document that is written in a top part and a graph part, using a sentence segmentation device that segments documents into sentences. CONSTITUTION: An input means 10 stores an original text in a storage means 20. A layout analyzing means 30 extracts the top part and the graph part from the original text stored in the storage means 20. A text extracting means 40 extracts the text present in the top part and the graph part in units that can each be regarded as one sentence. An analyzing means 50 performs morphological analysis and syntactic analysis on each unit extracted by the text extracting means 40, and determines the connection cost between the lines of each unit and whether a modification relation exists between them. A decision means 60 decides whether the lines or the units are continuous on the basis of the connection cost between the lines and the presence or absence of a modification relation. A sentence dividing and combining means 70 divides or combines the units that can be regarded as one sentence on the basis of the decision result of the decision means 60. |
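
The decision and dividing/combining steps (means 60 and 70) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: `segment_unit`, `max_cost`, and the two callables standing in for the results of the analyzing means 50 do not come from the abstract. The abstract also does not say how the connection cost and the modification check are combined, so the rule below (lines are continuous if the cost is low enough or a modification relation exists) is only one plausible assumption.

```python
from typing import Callable, List


def segment_unit(
    lines: List[str],
    connection_cost: Callable[[str, str], float],
    has_modification: Callable[[str, str], bool],
    max_cost: float = 1.0,
) -> List[str]:
    """Combine adjacent lines judged continuous; divide them otherwise.

    connection_cost and has_modification are stand-ins for the output of
    morphological and syntactic analysis (hypothetical interfaces).
    """
    if not lines:
        return []
    sentences = [lines[0]]
    for prev, cur in zip(lines, lines[1:]):
        # Assumed rule: a low connection cost OR an existing modification
        # relation means the two lines belong to the same sentence.
        continuous = (
            connection_cost(prev, cur) <= max_cost
            or has_modification(prev, cur)
        )
        if continuous:
            # Joined with a space for this English toy example; text in a
            # language written without spaces would be joined directly.
            sentences[-1] += " " + cur
        else:
            sentences.append(cur)
    return sentences


# Toy stand-ins for the analysis results (assumptions, not real analyzers).
def toy_cost(prev: str, cur: str) -> float:
    # A line ending in a period is treated as expensive to continue.
    return 5.0 if prev.endswith(".") else 0.0


def toy_modification(prev: str, cur: str) -> bool:
    return False


cell_lines = ["Sales rose", "in the third quarter.", "Costs fell."]
result = segment_unit(cell_lines, toy_cost, toy_modification)
print(result)
```

In a real device the two callables would be backed by an actual morphological analyzer and parser; the toy versions only make the control flow visible.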