Abstract |
PROBLEM TO BE SOLVED: To provide a system, a method, and a program for tracking information from heterogeneous information sources. SOLUTION: An information clustering system 100 comprises: a data accumulation part 102 for accumulating documents in a document storage part, the documents being time-sliced and including loosely correlated clusters of documents; a vector space generation part 104 for generating document-keyword vectors, the document-keyword vectors comprising sparse numerical values that depend on the presence of keywords in the documents; a dimension reduction part 106 for reducing the keyword dimensions to create a dimension reduction matrix of the document-keyword matrix; a centroid vector determination part 108 for generating a centroid vector of a cluster, the cluster being retrieved from the document-keyword vectors using a principal component in the same row of the dimension reduction matrix, and the centroid vector being defined from the keywords and weights of the documents within the cluster; and an item storage part 112 for storing the keywords and the weights of the centroid vector. COPYRIGHT: (C)2008,JPO&INPIT
|
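
The pipeline summarized in the abstract (document-keyword vectors, dimension reduction, cluster retrieval, centroid keywords and weights) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the function names, the binary presence weighting, the use of power iteration with deflation as the dimension reduction step, the rule that assigns each document to the principal component on which it has the largest absolute projection, and the toy documents.

```python
import math


def keyword_vectors(docs, keywords):
    """Binary document-keyword vectors: 1.0 if the keyword occurs, else 0.0."""
    return [[1.0 if k in doc.split() else 0.0 for k in keywords] for doc in docs]


def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def top_components(vectors, n_components, iterations=500):
    """Leading eigenvectors of A^T A, found by power iteration with deflation.

    Each returned vector plays the role of one row of a (hypothetical)
    dimension reduction matrix.
    """
    dim = len(vectors[0])
    gram = [[sum(v[i] * v[j] for v in vectors) for j in range(dim)]
            for i in range(dim)]
    components = []
    for _ in range(n_components):
        # Fixed, non-symmetric start vector keeps the sketch deterministic.
        start = [float(i + 1) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in start))
        vec = [x / norm for x in start]
        for _step in range(iterations):
            nxt = mat_vec(gram, vec)
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:
                break  # no variance left to extract
            vec = [x / norm for x in nxt]
        eigenvalue = sum(a * b for a, b in zip(vec, mat_vec(gram, vec)))
        components.append(vec)
        # Deflate: remove the direction just found before the next pass.
        gram = [[gram[i][j] - eigenvalue * vec[i] * vec[j] for j in range(dim)]
                for i in range(dim)]
    return components


def cluster_centroids(docs, keywords, n_clusters):
    """Assign each document to one principal component, then average."""
    vectors = keyword_vectors(docs, keywords)
    components = top_components(vectors, n_clusters)
    members = {c: [] for c in range(n_clusters)}
    for idx, vec in enumerate(vectors):
        # Assumed rule: largest absolute projection decides the cluster.
        scores = [abs(sum(a * b for a, b in zip(vec, comp)))
                  for comp in components]
        members[scores.index(max(scores))].append(idx)
    centroids = {}
    for c, idxs in members.items():
        if not idxs:
            continue
        mean = [sum(vectors[i][j] for i in idxs) / len(idxs)
                for j in range(len(keywords))]
        # Keep only keywords with nonzero weight, as a keyword -> weight map.
        centroids[c] = {k: w for k, w in zip(keywords, mean) if w > 0.0}
    return members, centroids


# Toy data (invented for this sketch).
keywords = ["stock", "market", "rain", "storm"]
docs = ["stock market", "market stock", "stock", "rain storm", "storm rain"]
members, centroids = cluster_centroids(docs, keywords, n_clusters=2)
print(members)
print(centroids)
```

On this toy input the three finance documents fall into one cluster and the two weather documents into the other. The keyword-to-weight dictionaries stand in for what an item storage part might hold. A production system would more likely use a sparse matrix library and a truncated SVD rather than hand-written power iteration.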