Abstract |
<p>The invention provides a document segmentation method that detects segmentation points, that is, points at which the topic of an input document is discontinuous before and after the point, and divides the document into a plurality of blocks at those points. According to the invention, terms that occur in the input document are detected (Fig. 1, 11). The input document is divided into document segments, each segment being an appropriately sized chunk. Document segment vectors are generated (Fig. 1, 14). Eigenvalues and eigenvectors of a square sum matrix of the document segment vectors are calculated (Fig. 1, 16). Basis vectors spanning a subspace are selected from the eigenvectors, and the document segment vectors are projected onto that subspace to evaluate the topic continuity of the document segments. Segmentation points of the document are determined based on the continuity of the projected vectors.</p> |
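The steps in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the claimed implementation. Several choices are assumptions that the abstract does not specify: terms are whitespace-separated tokens, segment vectors hold raw term counts, the eigenvectors are found by power iteration with deflation, and continuity is measured as the cosine similarity between the projected vectors of adjacent segments, compared against a hypothetical `threshold`. All function names are invented for the example.

```python
import math
import random
from collections import Counter


def segment_vectors(segments):
    """Term-count vector for each segment over a shared vocabulary (assumed weighting)."""
    vocab = sorted({t for s in segments for t in s.lower().split()})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for s in segments:
        v = [0.0] * len(vocab)
        for term, count in Counter(s.lower().split()).items():
            v[index[term]] = float(count)
        vectors.append(v)
    return vectors


def square_sum_matrix(vectors):
    """Sum over all segments of the outer product d * d^T."""
    n = len(vectors[0])
    m = [[0.0] * n for _ in range(n)]
    for d in vectors:
        for i in range(n):
            for j in range(n):
                m[i][j] += d[i] * d[j]
    return m


def top_eigenvectors(matrix, k, iters=500, seed=0):
    """Leading k eigenvectors via power iteration with deflation (assumed solver)."""
    rng = random.Random(seed)
    a = [row[:] for row in matrix]
    n = len(a)
    basis = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left in the deflated matrix
                break
            v = [x / norm for x in w]
        av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(x * y for x, y in zip(v, av))  # Rayleigh quotient
        basis.append(v)
        # Deflate: remove the component just found before the next pass.
        for i in range(n):
            for j in range(n):
                a[i][j] -= lam * v[i] * v[j]
    return basis


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def segmentation_points(segments, k=2, threshold=0.5):
    """Indices i such that a topic boundary is placed just before segment i."""
    vectors = segment_vectors(segments)
    basis = top_eigenvectors(square_sum_matrix(vectors), k)
    # Coordinates of each segment vector in the selected subspace.
    projected = [[sum(x * e for x, e in zip(d, b)) for b in basis] for d in vectors]
    return [i + 1 for i in range(len(projected) - 1)
            if cosine(projected[i], projected[i + 1]) < threshold]


segments = [
    "cat dog pet", "dog cat animal pet", "pet cat dog",
    "stock market price", "market price trade stock", "trade stock market",
]
print(segmentation_points(segments, k=2, threshold=0.5))
```

In this toy input the two topics share no terms, so each leading eigenvector lies within one topic's vocabulary and the projected vectors of the third and fourth segments are nearly orthogonal. For real documents, a library eigen-solver for symmetric matrices (for example `numpy.linalg.eigh`) would likely be a better choice than the hand-written power iteration used here to keep the sketch dependency-free.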