Abstract |
<P>PROBLEM TO BE SOLVED: To provide a document dividing apparatus, a document processing system, and a program capable of dividing a document in a form that corresponds to the contents intended by the creator of the document. <P>SOLUTION: The document dividing apparatus comprises: section extraction means for acquiring a document file in which text data are described, detecting partition information in the text data, and extracting a plurality of sections from the text data; phrase importance degree calculation means for calculating an importance degree for a phrase in the text data; weight determination means for extracting layout information of the phrase from the document file and reading the weight information corresponding to that layout information from weight information storage means; feature vector creation means for creating, for each section, a feature vector whose coefficients are values generated from the importance degree and the weight information; and group extraction means for extracting a plurality of the sections as one group in accordance with the similarity of the feature vectors of the sections. <P>COPYRIGHT: (C)2012,JPO&INPIT |
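
The processing chain described in the abstract (sections, phrase importance, layout weights, per-section feature vectors, grouping by similarity) can be illustrated with a minimal Python sketch. It is not the claimed implementation. Every concrete choice here is an assumption made for illustration only: the abstract does not say how importance is computed (a smoothed TF-IDF score is used here), what the weight table contains (`LAYOUT_WEIGHTS` is hypothetical), how importance and weight are combined (a product is used), which similarity measure applies (cosine is used), or how groups are formed (adjacent sections are merged when similarity reaches a hypothetical `threshold`). Sections are assumed to be already extracted and are represented as lists of `(phrase, layout)` pairs.

```python
import math
from collections import Counter

# Hypothetical weight table standing in for the "weight information storage means".
LAYOUT_WEIGHTS = {"heading": 3.0, "bold": 2.0, "normal": 1.0}


def phrase_importance(sections):
    """Return one {phrase: score} dict per section, using a smoothed TF-IDF
    score as an assumed importance measure."""
    n = len(sections)
    doc_freq = Counter()
    for sec in sections:
        doc_freq.update({phrase for phrase, _ in sec})
    scores = []
    for sec in sections:
        tf = Counter(phrase for phrase, _ in sec)
        scores.append({
            p: count * (math.log((1 + n) / (1 + doc_freq[p])) + 1.0)
            for p, count in tf.items()
        })
    return scores


def feature_vectors(sections):
    """Build one sparse feature vector per section: importance x layout weight."""
    vectors = []
    for sec, imp in zip(sections, phrase_importance(sections)):
        weight = {}
        for phrase, layout in sec:
            w = LAYOUT_WEIGHTS.get(layout, 1.0)
            # Assumption: a phrase takes the strongest layout it appears with.
            weight[phrase] = max(weight.get(phrase, 0.0), w)
        vectors.append({p: imp[p] * weight[p] for p in imp})
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_sections(sections, threshold=0.3):
    """Merge each section into the previous group when their vectors are
    similar enough; otherwise start a new group. Returns lists of indices."""
    vectors = feature_vectors(sections)
    groups = [[0]] if sections else []
    for i in range(1, len(sections)):
        if cosine(vectors[i - 1], vectors[i]) >= threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


if __name__ == "__main__":
    example = [
        [("vector", "heading"), ("similarity", "normal")],
        [("vector", "normal"), ("similarity", "bold")],
        [("printer", "heading"), ("toner", "normal")],
    ]
    print(group_sections(example))  # [[0, 1], [2]]
```

In this example the first two sections share both phrases, so they fall into one group even though the layout emphasis differs, while the third section shares no phrases and starts a new group. Raising `threshold` makes grouping stricter; the value 0.3 has no basis in the source.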