Abstract |
PROBLEM TO BE SOLVED: To find the points in an input document where the topic is discontinuous and to segment the document into multiple blocks at those points. SOLUTION: Terms appearing in the input document are detected, and the document is divided into document segments of a suitable unit. For each document segment, a vector is generated whose components are the frequencies of the terms appearing in that segment. The eigenvectors and eigenvalues of the sum-of-squares matrix of the document segment vectors are calculated, and basis vectors spanning a subspace used to decide the segmentation are selected from the eigenvectors. Each document segment vector is projected onto the basis vectors, and the document is segmented on the basis of these projection values. Singular value decomposition is applied to the set of document segment vectors, so that the set is expanded in terms of mutually orthogonal eigenvectors and their eigenvalues. Because each eigenvector is expressed as a combination of terms, it represents a concept in its own right. The corresponding eigenvalue is regarded as the strength, or energy, of the concept represented by that eigenvector.
|
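The steps in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the claimed method. The sum-of-squares matrix is taken to be the sum of outer products of the segment vectors, basis vectors are chosen as the eigenvectors with the largest eigenvalues, and a boundary is placed wherever the cosine similarity between adjacent projected segments falls below a threshold. The abstract does not specify the selection or boundary rules, and the names `segment_vectors`, `topic_boundaries`, `n_basis`, and `threshold` are hypothetical.

```python
import re
from collections import Counter

import numpy as np


def segment_vectors(segments):
    """Build one term-frequency vector per document segment (rows of D)."""
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in segments]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    D = np.zeros((len(segments), len(vocab)))
    for row, toks in enumerate(tokenized):
        for term, count in Counter(toks).items():
            D[row, index[term]] = count
    return D, vocab


def topic_boundaries(segments, n_basis=2, threshold=0.5):
    """Return indices i such that a topic boundary lies before segment i."""
    D, _ = segment_vectors(segments)

    # Assumed form of the sum-of-squares matrix: sum over segments of d d^T.
    A = D.T @ D

    # A is symmetric, so eigh applies; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(A)

    # Assumption: keep the eigenvectors with the largest eigenvalues as the basis.
    order = np.argsort(eigvals)[::-1][:n_basis]
    basis = eigvecs[:, order]

    # Project every segment vector onto the selected basis vectors.
    P = D @ basis

    # Assumption: a boundary is declared where adjacent projected segments
    # have cosine similarity below the threshold.
    boundaries = []
    for i in range(len(P) - 1):
        a, b = P[i], P[i + 1]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        cos = float(a @ b / denom) if denom > 0 else 0.0
        if cos < threshold:
            boundaries.append(i + 1)
    return boundaries


if __name__ == "__main__":
    demo = [
        "cat dog cat pet",
        "dog pet cat",
        "stock market price",
        "market price stock trade",
    ]
    print(topic_boundaries(demo))
```

The eigenvectors of `D.T @ D` are the right singular vectors of `D`, which is how the eigen-decomposition relates to the singular value decomposition mentioned in the abstract. A real system would also need proper term detection and a principled choice of `n_basis`, neither of which is covered here.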