Abstract |
PROBLEM TO BE SOLVED: To provide a device that automatically classifies sentences into clusters based on the frequency with which words appear in each sentence. SOLUTION: A statistical processing part 1 receives a plurality of sentences and prepares a matrix of the frequencies with which words appear in those sentences. A sentence classification part 2 receives the matrix from the statistical processing part 1, classifies the sentences into clusters based on the data in the matrix, and outputs the result. The automatic classification of sentences is treated as the problem of estimating a probability model defined on the direct product of a partition of the word set and a partition of the sentence set. The probability model assumes that each sentence is generated from the cluster to which it belongs with a certain probability and that each word is generated from the cluster to which it belongs with a certain probability, and the probability model is selected using an information criterion. Clustering is performed alternately on the sentence set and the word set in a bottom-up manner.
|
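The procedure in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation, and several details are assumptions the abstract does not state: whitespace tokenization, the factorization P(w, s) = P(C_w, C_s) · P(w | C_w) · P(s | C_s), an MDL-style criterion of the form (negative log-likelihood) + (k/2) · log N, and a greedy merge rule. The names `frequency_matrix`, `description_length`, `best_merge`, and `co_cluster` are hypothetical.

```python
import math
from collections import Counter


def frequency_matrix(sentences):
    """Return (vocab, matrix); matrix[i][j] counts word j in sentence i."""
    tokenized = [s.lower().split() for s in sentences]  # assumed tokenizer
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: j for j, w in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in sentences]
    for i, toks in enumerate(tokenized):
        for w, c in Counter(toks).items():
            matrix[i][index[w]] = c
    return vocab, matrix


def description_length(matrix, sent_cluster, word_cluster):
    """Assumed criterion: -log-likelihood + (k/2) * log N (matrix non-empty)."""
    n = sum(map(sum, matrix))
    block, s_tot, w_tot = Counter(), Counter(), Counter()
    sc_tot, wc_tot = Counter(), Counter()
    for i, row in enumerate(matrix):
        for j, c in enumerate(row):
            if c:
                block[(sent_cluster[i], word_cluster[j])] += c
                s_tot[i] += c
                w_tot[j] += c
                sc_tot[sent_cluster[i]] += c
                wc_tot[word_cluster[j]] += c
    loglik = 0.0
    for i, row in enumerate(matrix):
        for j, c in enumerate(row):
            if c:
                a, b = sent_cluster[i], word_cluster[j]
                # Maximum-likelihood estimates of the three assumed factors.
                p = (block[(a, b)] / n) * (s_tot[i] / sc_tot[a]) * (w_tot[j] / wc_tot[b])
                loglik += c * math.log(p)
    n_sc, n_wc = len(set(sent_cluster)), len(set(word_cluster))
    # Free parameters: cluster-pair probabilities plus within-cluster ones.
    k = (n_sc * n_wc - 1) + (len(matrix) - n_sc) + (len(matrix[0]) - n_wc)
    return -loglik + 0.5 * k * math.log(n)


def best_merge(matrix, sent_cluster, word_cluster, axis):
    """Return the labels after the best single merge on one axis, or None."""
    labels = sent_cluster if axis == 0 else word_cluster
    ids = sorted(set(labels))
    best_dl = description_length(matrix, sent_cluster, word_cluster)
    best_labels = None
    for pos, a in enumerate(ids):
        for b in ids[pos + 1:]:
            merged = [a if x == b else x for x in labels]
            if axis == 0:
                dl = description_length(matrix, merged, word_cluster)
            else:
                dl = description_length(matrix, sent_cluster, merged)
            if dl < best_dl:  # accept only merges that lower the criterion
                best_dl, best_labels = dl, merged
    return best_labels


def co_cluster(matrix):
    """Bottom-up clustering, alternating between sentences and words."""
    sent_cluster = list(range(len(matrix)))     # start from singletons
    word_cluster = list(range(len(matrix[0])))
    changed = True
    while changed:
        changed = False
        for axis in (0, 1):
            merged = best_merge(matrix, sent_cluster, word_cluster, axis)
            if merged is not None:
                if axis == 0:
                    sent_cluster = merged
                else:
                    word_cluster = merged
                changed = True
    return sent_cluster, word_cluster


if __name__ == "__main__":
    sentences = ["stocks rise as markets rally",
                 "markets rally on strong stocks",
                 "team wins the final match",
                 "the team wins another match"]
    vocab, matrix = frequency_matrix(sentences)
    print(co_cluster(matrix))
```

Here `frequency_matrix` plays the role of the statistical processing part 1 and `co_cluster` that of the sentence classification part 2. Each pass merges at most one pair of sentence clusters and then one pair of word clusters, and stops when no merge lowers the criterion. Whether the device uses this exact penalty term or merge rule is not specified in the abstract.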