Title of invention METHOD AND VECTOR ANALYSIS FOR A DOCUMENT
Abstract The invention provides a document representation method and a document analysis method, including extraction of important sentences from a given document and/or determination of the similarity between two documents. The method detects terms that occur in the input document, segments the input document into appropriately sized document segments, and generates document segment vectors, each of whose elements takes a value according to the occurrence frequency of the corresponding term in that segment. The method then calculates the eigenvalues and eigenvectors of a square sum matrix of the document segment vectors, whose rank is denoted R, and selects from the eigenvectors a plurality (L) of eigenvectors to be used for determining importance. Finally, the method calculates, for each document segment vector, a weighted sum of its squared projections onto the selected eigenvectors, and selects the document segments of significant importance based on that weighted sum.
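The pipeline described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative reading of the abstract, not the claimed implementation: the function name `segment_importance`, the regex tokenizer, the use of raw term counts as vector elements, and the choice of the eigenvalues as projection weights are all assumptions, since the abstract does not specify them.

```python
import re
import numpy as np


def segment_importance(segments, num_eigvecs=2):
    """Score each segment by a weighted sum of squared projections.

    `segments` is a list of strings (for example, sentences);
    `num_eigvecs` plays the role of L in the abstract.
    """
    # Term detection: lowercase alphabetic tokens (assumed tokenizer).
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in segments]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}

    # Document segment vectors: one row per segment, raw term counts
    # (assumed; the abstract only says "according to occurrence frequencies").
    D = np.zeros((len(segments), len(vocab)))
    for n, toks in enumerate(tokenized):
        for t in toks:
            D[n, index[t]] += 1.0

    # Square sum matrix: sum over segments of the outer product d_n d_n^T.
    S = D.T @ D

    # Symmetric eigendecomposition; eigh returns eigenvalues in ascending
    # order, so reorder and keep the L largest.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1][:num_eigvecs]
    lam, phi = eigvals[order], eigvecs[:, order]

    # Projections of every segment vector onto the selected eigenvectors,
    # then a weighted sum of their squares. Weighting by the eigenvalue
    # is an assumption made for this sketch.
    proj = D @ phi
    return (proj ** 2) @ lam
```

Important segments could then be picked with, for instance, `np.argsort(scores)[::-1][:k]` for some hypothetical cutoff `k`.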
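Publication number US2009216759(A1) Publication date 2009.08.27
Application number US20090424801 Filing date 2009.04.16
Applicant HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P. Inventor KAWATANI TAKAHIKO
Classification G06F17/30;G06F7/38 Main classification G06F17/30
Agency Agent
Principal claim
Address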