Abstract |
A device for categorizing data sets obtained from a number of sources comprises: a symbol frequency determining unit (24) that determines the frequency of appearance of symbols in a first collection of data sets and the frequency of appearance of symbols in a second collection of data sets; a significance determining unit (26) that determines the most significant symbols for the second collection based on the frequency of appearance in the first collection and the frequency of appearance in the second collection; a grouping unit (28) that groups the most significant symbols into groups according to their appearance in the same data set; and a ranking unit (30) that ranks the data sets in relation to the symbol groups according to a ranking scheme. |
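
The four units described above can be illustrated with a minimal Python sketch. The abstract does not specify the significance measure, the grouping rule, or the ranking scheme, so everything concrete here is an assumption made for illustration: significance is taken as the ratio of a symbol's relative frequency in the second collection to its add-one-smoothed relative frequency in the first, groups are formed by transitively linking significant symbols that co-occur in a data set, and data sets are ranked by how many of their symbols fall in a group. All function names, the `top_n` parameter, and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def symbol_frequencies(collection):
    """Unit (24): count how often each symbol appears in a collection."""
    counts = Counter()
    for data_set in collection:
        counts.update(data_set)
    return counts


def most_significant(first, second, top_n=4):
    """Unit (26): pick the top_n symbols of `second` by an ASSUMED score,
    the ratio of relative frequency in `second` to smoothed relative
    frequency in `first`. Ties are broken alphabetically."""
    f1, f2 = symbol_frequencies(first), symbol_frequencies(second)
    n1, n2 = sum(f1.values()), sum(f2.values())
    vocab = len(set(f1) | set(f2))

    def score(sym):
        p_second = f2[sym] / n2
        p_first = (f1[sym] + 1) / (n1 + vocab)  # add-one smoothing (assumption)
        return p_second / p_first

    return sorted(f2, key=lambda s: (-score(s), s))[:top_n]


def group_symbols(significant, collection):
    """Unit (28): link significant symbols that appear in the same data set
    (ASSUMED rule: transitive co-occurrence, implemented with union-find)."""
    parent = {s: s for s in significant}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path compression
            s = parent[s]
        return s

    for data_set in collection:
        present = sorted(set(data_set) & set(significant))
        for a, b in combinations(present, 2):
            parent[find(a)] = find(b)

    groups = {}
    for s in significant:
        groups.setdefault(find(s), set()).add(s)
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


def rank_data_sets(collection, group):
    """Unit (30): rank data set indices by the number of their symbols that
    belong to `group` (ASSUMED scheme); data sets with no hits are dropped."""
    scored = [(sum(1 for sym in ds if sym in group), i)
              for i, ds in enumerate(collection)]
    return [i for hits, i in sorted(scored, key=lambda t: (-t[0], t[1])) if hits > 0]


# Hypothetical sample data: each data set is a list of word symbols.
first = [
    ["the", "market", "opened", "today"],
    ["the", "weather", "is", "mild", "today"],
]
second = [
    ["the", "volcano", "ash", "closed", "airports"],
    ["volcano", "ash", "ash", "grounded", "flights"],
    ["the", "election", "vote", "results"],
    ["election", "vote", "count", "today"],
]

significant = most_significant(first, second, top_n=4)
groups = group_symbols(significant, second)
ranking = rank_data_sets(second, groups[0])
```

With this sample data the four selected symbols split into a volcano-related group and an election-related group, and each group ranks only the data sets that mention it. A real device might use a statistical test for significance or a weighted ranking scheme instead; the abstract leaves those choices open.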