Abstract |
<p>A method is provided for document representation and document analysis, including extracting important sentences from a given document or determining the similarity between two documents. The method detects terms (11) that occur in the document, segments the document into document segments, each segment being an appropriately sized chunk, and generates document segment vectors (14), where each vector has as its elements values according to the occurrence frequencies of the terms occurring in the corresponding document segment. The method calculates the eigenvalues and eigenvectors (16) of a square sum matrix (15) of the document segment vectors, the rank of which is denoted by R, and selects from the eigenvectors a plurality (L) of eigenvectors to be used to determine the importance (19). A weighted sum of the squared projections onto the respective selected eigenvectors is then calculated.</p> |
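The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions that the abstract does not state: each sentence is treated as one document segment, terms are lowercase alphabetic tokens, the square sum matrix is taken to be the sum of outer products of the segment vectors, the weights in the weighted sum are the eigenvalues themselves, and the eigenpairs are found by plain power iteration with deflation rather than a library eigensolver. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def segment_vectors(segments):
    """Build term-frequency vectors (assumed tokenization: [a-z]+ tokens)."""
    counts = [Counter(re.findall(r"[a-z]+", s.lower())) for s in segments]
    vocab = sorted(set().union(*counts))
    return vocab, [[float(c[t]) for t in vocab] for c in counts]


def square_sum_matrix(vectors):
    """Assumed form of the square sum matrix: S = sum_n s_n s_n^T."""
    dim = len(vectors[0])
    S = [[0.0] * dim for _ in range(dim)]
    for v in vectors:
        for i in range(dim):
            for j in range(dim):
                S[i][j] += v[i] * v[j]
    return S


def top_eigenpairs(S, L, iters=500):
    """Return up to L (eigenvalue, unit eigenvector) pairs, largest first.

    Uses power iteration with deflation, which is adequate for a small
    symmetric positive semidefinite matrix with well-separated eigenvalues.
    """
    dim = len(S)
    A = [row[:] for row in S]
    pairs = []
    for _ in range(L):
        x = [1.0 / math.sqrt(dim)] * dim
        for _step in range(iters):
            y = [dot(row, x) for row in A]
            norm = math.sqrt(dot(y, y))
            if norm < 1e-12:
                break
            x = [c / norm for c in y]
        lam = dot(x, [dot(row, x) for row in A])  # Rayleigh quotient
        if lam < 1e-9:
            break  # fewer than L nonzero eigenvalues (rank R < L)
        pairs.append((lam, x))
        # Deflate: remove the component just found.
        for i in range(dim):
            for j in range(dim):
                A[i][j] -= lam * x[i] * x[j]
    return pairs


def importance(vectors, pairs):
    """Weighted sum of squared projections onto the selected eigenvectors.

    Assumption: the weight of each eigenvector is its eigenvalue.
    """
    return [sum(lam * dot(v, u) ** 2 for lam, u in pairs) for v in vectors]


segments = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "stocks fell on monday",
]
vocab, vectors = segment_vectors(segments)
S = square_sum_matrix(vectors)
pairs = top_eigenpairs(S, L=2)
scores = importance(vectors, pairs)
for text, score in zip(segments, scores):
    print(f"{score:10.3f}  {text}")
```

Segments with higher scores lie closer to the dominant directions of the document, so ranking by score gives one plausible reading of "important sentences". A different weighting or a different choice of L would change the ranking.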