Title: METHOD FOR AUTOMATIC THEMATIC CLASSIFICATION OF A DIGITAL TEXT FILE
Abstract: A method for the thematic classification of a digital text file, based on an encyclopedic database that comprises a category graph. A thematic classification model is developed during a learning phase. For each category node, all articles directly linked to that node are grouped to obtain a “bag of words” for the node, from which a term-frequency vector characteristic of the node is determined. At each category node, the term-frequency vector directly associated with that node is combined with the term-frequency vectors of the more specific nodes. During the production phase, the term-frequency vector of the digital text file is calculated, and the N category nodes in the thematic classification model whose term-frequency vectors are closest to that of the digital text file are selected.
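The two phases described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not specify. The category graph is taken to be acyclic, with "more specific nodes" read as subcategories. Combination is modelled as adding each subcategory's combined vector, scaled by a hypothetical `decay` weight. "Closest" is measured with cosine similarity. The names `build_model`, `classify`, `term_frequencies`, `cosine` and `decay` are all illustrative and do not come from the patent.

```python
from collections import Counter
import math
import re


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphabetic words only.
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(texts):
    # One "bag of words" over all given texts, normalized to relative frequencies.
    counts = Counter()
    for t in texts:
        counts.update(tokenize(t))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def build_model(articles_by_category, children, decay=0.5):
    """Learning phase (sketch).
    articles_by_category: category -> texts of articles directly linked to it.
    children: category -> more specific subcategories (graph assumed acyclic).
    decay: assumed weight applied to a subcategory vector when combined upward.
    """
    local = {c: term_frequencies(a) for c, a in articles_by_category.items()}
    combined = {}

    def combine(node):
        # Memoized: a node's vector = its own TF vector + weighted subcategory vectors.
        if node in combined:
            return combined[node]
        vec = dict(local.get(node, {}))
        for child in children.get(node, []):
            for w, f in combine(child).items():
                vec[w] = vec.get(w, 0.0) + decay * f
        combined[node] = vec
        return vec

    nodes = set(local) | set(children)
    for subs in children.values():
        nodes.update(subs)
    for node in sorted(nodes):
        combine(node)
    return combined


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(f * v.get(w, 0.0) for w, f in u.items())
    nu = math.sqrt(sum(f * f for f in u.values()))
    nv = math.sqrt(sum(f * f for f in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def classify(text, model, n=3):
    # Production phase: TF vector of the document, then the n closest category nodes
    # (ties broken alphabetically so the result is deterministic).
    doc = term_frequencies([text])
    return sorted(model, key=lambda c: (-cosine(doc, model[c]), c))[:n]


articles = {
    "Sport": ["sport news and results"],
    "Football": ["football match goal striker", "goal keeper football league"],
    "Tennis": ["tennis racket serve set match"],
    "Science": ["science research experiment"],
}
children = {"Sport": ["Football", "Tennis"]}
model = build_model(articles, children)
print(classify("the striker scored a goal in the football match", model, n=2))
# → ['Football', 'Sport']
```

In this toy example, the generic node "Sport" ranks second even though none of its own articles mention football. The match comes entirely from terms inherited from its "Football" subcategory, which is the effect that combining vectors up the category graph is meant to produce.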
Publication number: US2016140220(A1)  Publication date: 2016.05.19
Application number: US201414898141  Application date: 2014.06.04
Applicant: PROXEM  Inventor: CHAUMARTIN FRANÇOIS-RÉGIS
Classification: G06F17/30  Main classification: G06F17/30
Address: Paris FR