Title of Invention |
UNSUPERVISED DETECTION AND CATEGORIZATION OF WORD CLUSTERS IN TEXT DATA |
Abstract |
A device for categorizing data sets obtained from a number of sources comprises: a symbol frequency determining unit (24) that determines the frequency of appearance of symbols in a first collection of data sets and the frequency of appearance of symbols in a second collection of data sets; a significance determining unit (26) that determines the most significant symbols for the second collection based on the frequency of appearance in the first collection and the frequency of appearance in the second collection; a grouping unit (28) that groups the most significant symbols into groups according to their appearance in the same data set; and a ranking unit (30) that ranks the data sets in relation to the symbol groups according to a ranking scheme. |
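The four units named in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not specify how significance, grouping, or ranking are computed. The sketch assumes that symbols are whitespace-separated lowercase words, that significance is the smoothed ratio of a word's relative frequency in the second collection to its relative frequency in the first, that groups are connected components of co-occurring significant words, and that data sets are ranked by how many words of a group they contain. All function names and the `top_n` and `smoothing` parameters are hypothetical.

```python
from collections import Counter


def symbol_frequencies(collection):
    """Frequency unit (24): relative frequency of each word in a collection."""
    counts = Counter(word for doc in collection for word in doc.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def most_significant(first, second, top_n=4, smoothing=1e-6):
    """Significance unit (26): words over-represented in `second` versus `first`.

    Assumption: significance is a smoothed frequency ratio; ties are broken
    alphabetically so the result is deterministic.
    """
    f1 = symbol_frequencies(first)
    f2 = symbol_frequencies(second)
    score = {w: f2[w] / (f1.get(w, 0.0) + smoothing) for w in f2}
    return sorted(score, key=lambda w: (-score[w], w))[:top_n]


def group_symbols(collection, symbols):
    """Grouping unit (28): merge symbols that appear in the same data set.

    Assumption: groups are connected components, built with a small union-find.
    """
    parent = {s: s for s in symbols}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path compression
            s = parent[s]
        return s

    for doc in collection:
        words = set(doc.lower().split())
        present = [s for s in symbols if s in words]
        for a, b in zip(present, present[1:]):
            parent[find(a)] = find(b)

    groups = {}
    for s in symbols:
        groups.setdefault(find(s), set()).add(s)
    return list(groups.values())


def rank_data_sets(collection, group):
    """Ranking unit (30): order data sets by the number of group words they contain."""
    scored = [(len(group & set(doc.lower().split())), doc) for doc in collection]
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps input order on ties
    return [doc for hits, doc in scored if hits > 0]
```

With a general-language background as the first collection and a topical corpus as the second, `most_significant` followed by `group_symbols` yields word clusters, and `rank_data_sets` lists the data sets most representative of each cluster. A different significance measure or ranking scheme could be substituted without changing the overall structure.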
Application Publication Number |
WO2013072258(A1) |
Application Publication Date |
2013.05.23 |
Application Number |
WO2012EP72290 |
Filing Date |
2012.11.09 |
Applicant |
KAIROS FUTURE GROUP AB |
Inventors |
LARSSON, TOMAS;LINDGREN, MATS |
Classification Number |
G06F17/30 |
Main Classification Number |
G06F17/30 |
Patent Agency |
|
Patent Agent |
|
Principal Claim |
|
Address |
|