Abstract |
Embodiments herein present a method for a back-tracking decision tree classifier for a large reference data set. The method analyzes first data files, which have a higher usage than second data files, and identifies file attribute sets that are common in the first data files. Next, the method associates qualifiers with each of the file attribute sets, wherein each associated qualifier represents a corresponding first data file. The associated qualifiers are then counted to determine how many are associated with each of the file attribute sets. Subsequently, the file attribute sets are sorted in descending order based on the number of associated qualifiers. The counting and sorting are initially performed on file attribute sets that have only a single file attribute. |
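
The counting and sorting pass over single-attribute sets can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `rank_single_attribute_sets`, the file identifiers used as qualifiers, and the attribute strings are all hypothetical, and the back-tracking and tree-building steps are not modeled.

```python
from collections import defaultdict


def rank_single_attribute_sets(files):
    """Count qualifiers per single-attribute set and sort in descending order.

    `files` maps a qualifier (here, a hypothetical file identifier) to the
    set of attributes of the high-usage file it represents.
    """
    qualifiers_by_set = defaultdict(set)
    for qualifier, attributes in files.items():
        for attribute in attributes:
            # Initial pass: each attribute set holds exactly one attribute.
            qualifiers_by_set[frozenset([attribute])].add(qualifier)

    # Descending by qualifier count; ties are broken by attribute name
    # (an assumption made here only to keep the order deterministic).
    return sorted(
        ((attr_set, len(quals)) for attr_set, quals in qualifiers_by_set.items()),
        key=lambda item: (-item[1], sorted(item[0])),
    )


# Hypothetical high-usage ("first") data files and their attributes.
high_usage_files = {
    "file_a": {"ext=.log", "owner=ops"},
    "file_b": {"ext=.log", "owner=dev"},
    "file_c": {"ext=.log", "owner=ops"},
}

ranking = rank_single_attribute_sets(high_usage_files)
for attr_set, count in ranking:
    print(sorted(attr_set), count)
```

Keeping the qualifiers in a set per attribute set, rather than only a running count, would let a later pass intersect them to count qualifiers for larger attribute sets; that extension is an assumption about how the method might proceed, not something the abstract states.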