Abstract |
PROBLEM TO BE SOLVED: To provide a method of representing a document, and an analysis method based on that representation, for extracting important sentences from a given document and for judging the similarity between two documents. SOLUTION: An input document is divided into document segments of an appropriate unit. For each document segment, a document segment vector is generated whose components are values corresponding to the frequencies of the terms appearing in that segment. The set of document segment vectors is then represented by the eigenvalues and eigenvectors (characteristic values and characteristic vectors) of the square sum matrix of the document segment vectors. The eigenvalues and eigenvectors of the square sum matrix, whose rank is denoted R, are found, and a plurality L of these eigenvectors is selected for use in judging importance. For each document segment vector, the weighted sum of the squares of its projection values onto the selected eigenvectors is computed, and document segments with high importance are selected according to these sums.
|
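
The procedure in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the implementation claimed in the patent: the function name `segment_importance`, the parameter `num_vectors` (standing in for L), the use of raw term counts as vector components, and the choice of each eigenvalue as the weight on its squared projection are all assumptions, since the abstract does not specify the weights or the exact frequency-derived values.

```python
import numpy as np

def segment_importance(segments, num_vectors):
    """Score document segments by weighted squared projections.

    segments    : list of segments, each a list of term strings (assumed input form)
    num_vectors : number L of eigenvectors to use (clamped to the matrix rank R)
    """
    # Build the vocabulary and one term-frequency vector per segment.
    vocab = sorted({term for seg in segments for term in seg})
    col = {term: i for i, term in enumerate(vocab)}
    D = np.zeros((len(segments), len(vocab)))
    for n, seg in enumerate(segments):
        for term in seg:
            D[n, col[term]] += 1.0

    # Square sum matrix: the sum over all segments of d_n d_n^T.
    S = D.T @ D

    # S is symmetric, so eigh applies; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]

    # Use at most R eigenvectors, where R is the rank of S.
    rank = np.linalg.matrix_rank(S)
    L = min(num_vectors, rank)
    lam = eigvals[order[:L]]
    phi = eigvecs[:, order[:L]]

    # Projection of every segment vector onto every selected eigenvector.
    proj = D @ phi

    # Assumed weighting: each squared projection is weighted by its eigenvalue.
    return (proj ** 2) @ lam

# Hypothetical usage: three tiny segments, scored with L = 1.
segments = [["term", "vector"], ["term", "vector"], ["matrix"]]
scores = segment_importance(segments, num_vectors=1)
ranking = np.argsort(scores)[::-1]  # segment indices, most important first
```

With a small L, segments aligned with the dominant directions of the term co-occurrence structure score highest, while a segment whose terms appear nowhere else scores near zero. Increasing L toward R lets less dominant topics contribute to the score.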