Abstract
A thematic classification method for a digital text file, using an encyclopedic database that comprises a category graph. A thematic classification model is developed during a learning phase. For each category node, all articles directly linked to that node are grouped to obtain a "bag of words" for the node, from which a term-frequency vector characteristic of the category node is determined. At each category node, the term-frequency vector directly associated with that node is combined with the term-frequency vectors of more specific category nodes. During the production phase, the term-frequency vector of the digital text file is calculated, and the N category nodes in the thematic classification model whose term-frequency vectors are closest to that of the digital text file are selected.
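The two phases can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not specify. The category graph is a hypothetical dict mapping each category to its more specific child categories. Tokenization is a simple lowercase word split. "Combining" a node's vector with those of more specific nodes is taken to mean summing raw term counts over the node and all of its descendants. "Closest" is measured with cosine similarity. All function and variable names are illustrative only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphabetic words only.
    return re.findall(r"[a-z]+", text.lower())


def node_vector(articles):
    # "Bag of words" for one category node: term counts over all articles
    # directly linked to that node.
    bag = Counter()
    for article in articles:
        bag.update(tokenize(article))
    return bag


def descendants(graph, node):
    # The node itself plus every more specific category reachable through
    # parent -> child edges; the `seen` set guards against cycles.
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(graph.get(current, []))
    return seen


def build_model(graph, articles_by_node):
    # Learning phase: per-node term-frequency vectors, then (assumption)
    # combine each node's vector with its descendants' by summing counts.
    own = {node: node_vector(arts) for node, arts in articles_by_node.items()}
    nodes = set(graph) | set(articles_by_node)
    nodes |= {child for children in graph.values() for child in children}
    model = {}
    for node in nodes:
        total = Counter()
        for d in descendants(graph, node):
            total.update(own.get(d, Counter()))
        model[node] = total
    return model


def cosine(u, v):
    # Cosine similarity between two sparse term-count vectors (Counters).
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(model, text, n):
    # Production phase: vectorize the text, then return the n category nodes
    # with the most similar vectors (ties broken by node name).
    doc = Counter(tokenize(text))
    ranked = sorted(model, key=lambda node: (-cosine(doc, model[node]), node))
    return ranked[:n]


graph = {"Science": ["Physics", "Biology"]}
articles = {
    "Science": ["scientific method experiment"],
    "Physics": ["quantum particles energy", "energy of particles"],
    "Biology": ["cells genes evolution", "genes in cells"],
}
model = build_model(graph, articles)
print(classify(model, "particles and energy", n=2))  # ['Physics', 'Science']
```

Because "Science" accumulates the vocabulary of its more specific children, it still ranks above the unrelated "Biology" node for a physics-flavoured text, although below "Physics" itself. A real system would likely substitute TF-IDF weighting or a different combination rule for the plain summation assumed here.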