Title of Invention: METHOD FOR EVALUATING COMMONALITY OF DOCUMENT
Abstract: PROBLEM TO BE SOLVED: In natural language processing, no measure has yet been established of the degree to which three or more documents share a common subject. Moreover, conventional clustering techniques cannot adequately extract, from a set of documents whose subjects are not necessarily the same, those documents that describe a common subject, nor can they assign each document and each sentence a score reflecting how close it is to that common subject. SOLUTION: Each sentence is represented as a binary vector whose components indicate the presence or absence of the corresponding terms, and the concept of a common vector across documents is introduced. For a group of sentence vectors formed by taking one sentence from each document, the common vector has a component equal to "1" only where that component is "1" in all of the vectors, and "0" elsewhere. The commonality of the document set is obtained from the sum, or the sum of squares, of the number of nonzero components in each common vector, taken over all common vectors. Each sentence is then projected onto all of the common vectors, and the sum of the projection values (or a similar quantity) indicates how close that sentence is to the common subject. COPYRIGHT: (C)2004,JPO
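The scoring described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes each sentence has already been reduced to a set of terms, that the commonality is the unnormalized sum (or sum of squares) over every way of picking one sentence per document, and that the "projection value" is the plain inner product between a sentence vector and a common vector. All function names are hypothetical, and the patent may normalize these quantities differently.

```python
from itertools import product


def sentence_vector(sentence_terms, vocabulary):
    """Binary vector: component i is 1 if vocabulary[i] occurs in the sentence."""
    return [1 if term in sentence_terms else 0 for term in vocabulary]


def common_vector(vectors):
    """Component is 1 only where it is 1 in every input vector (element-wise AND)."""
    return [int(all(v[i] for v in vectors)) for i in range(len(vectors[0]))]


def all_common_vectors(documents):
    """One common vector for each way of picking one sentence from each document."""
    return [common_vector(combo) for combo in product(*documents)]


def commonality(documents, squared=False):
    """Sum (or sum of squares) of nonzero-component counts over all common vectors."""
    counts = [sum(c) for c in all_common_vectors(documents)]
    return sum(n * n for n in counts) if squared else sum(counts)


def sentence_scores(documents):
    """Score each sentence by summing its projections onto all common vectors.

    Assumption: the projection value is the plain inner product s . c.
    """
    commons = all_common_vectors(documents)
    return [
        [sum(sum(a * b for a, b in zip(s, c)) for c in commons) for s in doc]
        for doc in documents
    ]


# Usage: three toy documents, each a list of term sets (one set per sentence).
vocab = ["apple", "banana", "cherry", "date"]
docs = [
    [sentence_vector(t, vocab) for t in [{"apple", "banana"}, {"cherry"}]],
    [sentence_vector(t, vocab) for t in [{"apple", "banana", "date"}]],
    [sentence_vector(t, vocab) for t in [{"apple", "banana"}, {"banana"}]],
]
print(commonality(docs))                # 3
print(commonality(docs, squared=True))  # 5
print(sentence_scores(docs))            # [[3, 0], [3], [3, 2]]
```

Enumerating every one-sentence-per-document combination grows as the product of the documents' sentence counts, so this brute-force form only suits small inputs; the squared variant weights combinations that share many terms more heavily than ones that share few.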
Publication No.: JP2004164036(A) Publication Date: 2004.06.10
Application No.: JP20020326157 Filing Date: 2002.11.08
Applicant: HEWLETT PACKARD CO <HP> Inventor: KAWATANI TAKAHIKO
Classification: G06F17/00;G06F17/30;(IPC1-7):G06F17/30 Main Classification: G06F17/00