摘要 |
In the provided document clustering system (100), a concept tree structure accumulation unit (11) stores a concept tree structure that represents a hierarchical relationship among concepts represented by each of a plurality of words. For any two words, a concept similarity computation unit (12) obtains a concept similarity, which is an index indicating how close the concepts represented by the two words are. Using concept similarities for words that appear in two documents in a document set, an inter-document similarity computation unit (13) obtains an inter-document similarity, which indicates how similar the two documents are semantically. A clustering unit (14) uses inter-document similarities to cluster the documents in the document set. |