Abstract |
PROBLEM TO BE SOLVED: To extract the topics contained in a large volume of natural language data. SOLUTION: From natural language text data collected from a plurality of respondents, character strings that appear in the text data of two or more respondents and are longer than a predetermined length are extracted as entries. A thesaurus database is created that stores each extracted entry in association with an appropriate category (superordinate concept). Once the thesaurus database has been created, phrases registered as entries in it are detected in a large body of text data collected from a plurality of respondents, and each occurrence of a phrase is counted toward the appearance frequency of the category containing that entry. Correlation coefficients between the appearance frequencies of the categories are calculated across the respondents' answers. From the resulting correlation coefficient matrix, a factor loading matrix is calculated by factor analysis, and a fishbone diagram is output. COPYRIGHT: (C)2006,JPO&NCIPI
|
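The processing chain described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation. The function names, the `threshold` parameter, the hand-built `thesaurus` mapping, and the sample answers are all hypothetical. The abstract does not say how the factor analysis is carried out, so the sketch assumes a principal-component style extraction (eigendecomposition of the correlation matrix). It also assumes the category assignment is supplied manually, and it omits rendering of the fishbone diagram.

```python
from collections import defaultdict

import numpy as np


def extract_entries(answers, threshold):
    """Return substrings longer than `threshold` characters that occur
    in the answers of two or more respondents (brute force, for illustration)."""
    seen = defaultdict(set)  # substring -> indices of answers containing it
    for idx, text in enumerate(answers):
        for start in range(len(text)):
            for end in range(start + threshold + 1, len(text) + 1):
                seen[text[start:end]].add(idx)
    return {s for s, who in seen.items() if len(who) >= 2}


def category_counts(answers, thesaurus, categories):
    """Count, per respondent, how often entries of each category appear."""
    counts = np.zeros((len(answers), len(categories)))
    for i, text in enumerate(answers):
        for entry, category in thesaurus.items():
            counts[i, categories.index(category)] += text.count(entry)
    return counts


def factor_loadings(counts, n_factors):
    """Correlation matrix across respondents, then loadings taken as
    eigenvectors scaled by sqrt(eigenvalue) (assumed extraction method)."""
    corr = np.corrcoef(counts, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_factors]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return corr, eigvecs[:, order] * scale


# Hypothetical survey answers, one string per respondent.
answers = [
    "slow delivery and high price",
    "high price but friendly staff",
    "slow delivery again",
    "friendly staff, fast checkout",
]
entries = extract_entries(answers, threshold=8)

# Hypothetical thesaurus: selected entries mapped to categories by hand.
thesaurus = {
    "slow delivery": "logistics",
    "high price": "cost",
    "friendly staff": "service",
}
categories = ["logistics", "cost", "service"]

counts = category_counts(answers, thesaurus, categories)
corr, loadings = factor_loadings(counts, n_factors=2)
```

Each row of `loadings` corresponds to a category and each column to a factor. Categories that load strongly on the same factor would be candidates for grouping on one branch of the fishbone diagram. A category whose count is identical for every respondent has zero variance and gives an undefined correlation, so real data would need such columns filtered out first.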