Title of Invention A DOCUMENT SEGMENTATION METHOD
Abstract <p>The invention provides a document segmentation method that detects segmentation points, where the topic of an input document is discontinuous before and after the point, and divides the document into plural blocks at those points. According to the invention, the terms that occur in an input document are detected (Fig. 1, 11), and the document is divided into document segments, each an appropriately sized chunk. Document segment vectors are generated (Fig. 1, 14). Eigenvalues and eigenvectors of the square sum matrix of the document segment vectors are calculated (Fig. 1, 16). Basis vectors constituting a subspace are selected from the eigenvectors in order to calculate the topic continuity of the document segments, and the projections of the document segment vectors onto this subspace are calculated. Segmentation points of the document are determined based on the continuity of the projected vectors.</p>
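The steps in the abstract can be sketched in Python as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the patented implementation: it assumes a "term" is any lowercased word, a document segment is one sentence, a document segment vector is a term-frequency vector, the square sum matrix is S = Σ s_n s_nᵀ over the segment vectors, the subspace is spanned by the k eigenvectors with the largest eigenvalues (found here by power iteration), and topic continuity is the cosine similarity of adjacent projected vectors compared against a threshold. All function names and the parameters `k` and `threshold` are hypothetical.

```python
import math
import re


def detect_terms(text):
    # Assumption: a "term" is any run of ASCII letters, lowercased.
    return re.findall(r"[a-z]+", text.lower())


def split_segments(text):
    # Assumption: one sentence is one document segment.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def segment_vectors(segments, vocab):
    # Term-frequency vector for each segment over a shared vocabulary.
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for seg in segments:
        v = [0.0] * len(vocab)
        for t in detect_terms(seg):
            v[index[t]] += 1.0
        vectors.append(v)
    return vectors


def square_sum_matrix(vectors):
    # Assumed definition: S = sum over segments of s_n s_n^T (symmetric, PSD).
    d = len(vectors[0])
    S = [[0.0] * d for _ in range(d)]
    for v in vectors:
        for i in range(d):
            for j in range(d):
                S[i][j] += v[i] * v[j]
    return S


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _orthonormalise(x, basis):
    # One Gram-Schmidt step against the basis found so far, then unit length.
    for b in basis:
        c = dot(x, b)
        x = [xi - c * bi for xi, bi in zip(x, b)]
    norm = math.sqrt(dot(x, x))
    return None if norm < 1e-12 else [xi / norm for xi in x]


def top_eigenvectors(S, k, iters=300):
    # Power iteration kept orthogonal to earlier vectors, so the m-th run
    # converges to the eigenvector with the m-th largest eigenvalue of S.
    basis = []
    for m in range(k):
        x = _orthonormalise([1.0 / (i + 1 + m) for i in range(len(S))], basis)
        for _ in range(iters):
            if x is None:
                break
            x = _orthonormalise([dot(row, x) for row in S], basis)
        if x is None:
            break  # remaining eigenvalues are (numerically) zero
        basis.append(x)
    return basis


def segmentation_points(text, k=2, threshold=0.5):
    segments = split_segments(text)
    vocab = list(dict.fromkeys(detect_terms(text)))
    vectors = segment_vectors(segments, vocab)
    basis = top_eigenvectors(square_sum_matrix(vectors), k)
    # Coordinates of each segment vector in the selected subspace.
    projected = [[dot(b, v) for b in basis] for v in vectors]
    continuity = []
    for p, q in zip(projected, projected[1:]):
        denom = math.sqrt(dot(p, p) * dot(q, q))
        continuity.append(dot(p, q) / denom if denom else 0.0)
    # A boundary is placed before segment n + 1 when continuity drops.
    points = [n + 1 for n, c in enumerate(continuity) if c < threshold]
    return segments, continuity, points
```

For example, `segmentation_points("cats purr softly. cats chase mice. dogs bark loudly. dogs fetch balls.")` places a single boundary before the third sentence, because the two topics share no terms and their projections onto the two-dimensional subspace are orthogonal. The sentence-level segments, the cosine continuity measure, and the fixed threshold are simplifications chosen for brevity; the invention may define segment size and continuity differently.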
Publication Number WO2002048951(A1) Publication Date 2002.06.20
Application Number US2001043534 Filing Date 2001.11.16