Title of Invention: DOCUMENT SEGMENTATION METHOD
Abstract: PROBLEM TO BE SOLVED: To find the points in an input document at which the topic becomes discontinuous, and to segment the document into a plurality of blocks at those points. SOLUTION: Terms appearing in the input document are detected, and the document is divided into suitable units called document segments. For each document segment, a document segment vector is generated whose components are the occurrence frequencies of the terms appearing in that segment. The eigenvectors and eigenvalues of the sum-of-squares matrix of the document segment vectors are calculated, and basis vectors spanning the subspace used to determine the segmentation are selected from among those eigenvectors. The value obtained by projecting each document segment vector onto each basis vector is then calculated, and the document is segmented on the basis of these projection values. A singular value decomposition is thus performed on the set of document segment vectors, expanding that set in terms of mutually orthogonal eigenvectors and their eigenvalues. Because each eigenvector is expressed as a combination of terms, it represents a concept in its own right, and the corresponding eigenvalue can be regarded as the strength, or energy, of the concept that the eigenvector expresses.
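The SOLUTION steps can be illustrated with a minimal Python sketch. It is a hypothetical illustration under stated assumptions, not the patented implementation: the abstract does not say how segments are formed, how terms are detected, how many basis vectors are chosen, or how projection values become cut points. The sketch therefore assumes one segment per sentence, a small illustrative stop-word list, the top k eigenvectors as basis vectors, and a cut wherever the basis vector receiving the largest squared projection changes between adjacent segments. Eigenvectors of the sum-of-squares matrix S = Σ s_n s_nᵀ are found by power iteration with deflation, a swapped-in technique standing in for a full eigen or singular value decomposition. All function names (`split_segments`, `detect_terms`, `segment_vectors`, `square_sum_matrix`, `top_eigenpairs`, `segment_document`) are hypothetical.

```python
import math
import re

# Hypothetical stop-word list; the abstract does not say how terms are detected.
STOP_WORDS = {"a", "an", "and", "are", "of", "the", "to"}


def split_segments(text):
    """Assumption: one document segment per sentence ending in '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]


def detect_terms(segment):
    """Lower-cased alphabetic tokens minus stop words (no stemming)."""
    return [w for w in re.findall(r"[a-z]+", segment.lower()) if w not in STOP_WORDS]


def segment_vectors(segments):
    """Term-frequency vector s_n for every segment over a shared vocabulary."""
    term_lists = [detect_terms(s) for s in segments]
    vocab = sorted({t for terms in term_lists for t in terms})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for terms in term_lists:
        v = [0.0] * len(vocab)
        for t in terms:
            v[index[t]] += 1.0
        vectors.append(v)
    return vocab, vectors


def square_sum_matrix(vectors):
    """S = sum over n of s_n s_n^T, an M x M matrix (M = vocabulary size)."""
    m = len(vectors[0])
    S = [[0.0] * m for _ in range(m)]
    for v in vectors:
        for i in range(m):
            if v[i]:  # zero entries contribute nothing to row i
                for j in range(m):
                    S[i][j] += v[i] * v[j]
    return S


def top_eigenpairs(S, k, iters=500):
    """Top-k (eigenvalue, eigenvector) pairs of symmetric PSD S by power iteration with deflation."""
    m = len(S)
    A = [row[:] for row in S]
    pairs = []
    for _ in range(k):
        x = [1.0 / math.sqrt(m)] * m
        for _ in range(iters):
            y = [sum(A[i][j] * x[j] for j in range(m)) for i in range(m)]
            norm = math.sqrt(sum(t * t for t in y))
            if norm == 0.0:
                break
            x = [t / norm for t in y]
        # Rayleigh quotient x^T A x gives the eigenvalue for the unit vector x.
        lam = sum(x[i] * sum(A[i][j] * x[j] for j in range(m)) for i in range(m))
        pairs.append((lam, x))
        # Deflate so the next pass converges to the next-largest eigenvalue.
        for i in range(m):
            for j in range(m):
                A[i][j] -= lam * x[i] * x[j]
    return pairs


def segment_document(text, k=2):
    segments = split_segments(text)
    if not segments:
        return []
    _, vectors = segment_vectors(segments)
    if not vectors[0]:  # no terms at all: nothing to segment on
        return [segments]
    bases = [e for _, e in top_eigenpairs(square_sum_matrix(vectors), k)]
    # Projection value of each segment vector onto each basis vector.
    proj = [[sum(a * b for a, b in zip(s, e)) for e in bases] for s in vectors]
    # Hypothetical cut rule: label a segment by the basis vector with the largest
    # squared projection, and start a new block wherever the label changes.
    labels = [max(range(len(bases)), key=lambda j: p[j] ** 2) for p in proj]
    blocks, current = [], [segments[0]]
    for n in range(1, len(segments)):
        if labels[n] != labels[n - 1]:
            blocks.append(current)
            current = []
        current.append(segments[n])
    blocks.append(current)
    return blocks


text = ("The cat chased the dog. The dog and the cat are pets. A pet cat sleeps. "
        "The stock market fell. Stock prices and market fears rose. The market price dropped.")
for block in segment_document(text):
    print(block)
```

On this sample the two topics share no terms, so S is block-diagonal and each of the two leading eigenvectors lies within one topic's vocabulary; the sketch prints the three pet sentences as one block and the three stock-market sentences as another. Documents whose topics share vocabulary would need a more careful boundary rule than this label-change heuristic.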
Application Publication No.: JP2002197083(A)  Publication Date: 2002.07.12
Application No.: JP20000378015  Filing Date: 2000.12.12
Applicant: HEWLETT PACKARD CO <HP>  Inventor: KAWATANI TAKAHIKO
Classification: G06F17/27; (IPC1-7): G06F17/27  Main Classification: G06F17/27