Abstract
A system (10) and method (100) for identifying critical features (211) in an ordered scale space within a multi-dimensional feature space are described. Features (173) are extracted from a plurality of data collections (76). Each data collection (76) is characterized by a collection of features (173) semantically related by a grammar. Each feature (173) is normalized, and the occurrence frequencies (183) and co-occurrence frequencies (78) for each feature (173) are determined. The occurrence frequencies (183) and the co-occurrence frequencies (78) for each of the features (173) are mapped into a set of patterns of occurrence frequencies (183) and a set of patterns of co-occurrence frequencies (79). The pattern for each data collection (76) is selected, and distance measures between the occurrence frequencies (183) in the selected pattern are calculated. The occurrence frequencies (183) are projected onto a one-dimensional document signal (81) in order of decreasing relative similarity, as determined by the distance measures. Wavelet and scaling coefficients (81) are derived from the one-dimensional document signal using multiresolution analysis.
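The pipeline summarized above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the patented implementation. Features are taken to be lowercase word tokens, and the grammar is ignored. Co-occurrence is counted within a hypothetical sliding `window` of following tokens. The distance between two features is assumed to be the Euclidean distance between their co-occurrence profiles. The ordering is a greedy nearest-neighbour chain that starts from the most frequent feature. The multiresolution analysis uses the Haar wavelet. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def extract_features(text):
    """Hypothetical extraction step: lowercase word tokens (normalization)."""
    return re.findall(r"[a-z]+", text.lower())


def frequencies(tokens, window=2):
    """Relative occurrence frequencies per feature, plus co-occurrence
    frequencies for unordered feature pairs seen within `window` following tokens."""
    occ = {f: c / len(tokens) for f, c in Counter(tokens).items()}
    pairs = Counter()
    for i, a in enumerate(tokens):
        for b in tokens[i + 1:i + 1 + window]:
            if a != b:
                pairs[frozenset((a, b))] += 1
    total = sum(pairs.values()) or 1
    return occ, {p: c / total for p, c in pairs.items()}


def profile_distance(f, g, features, co):
    """Assumed distance: Euclidean distance between two co-occurrence profiles."""
    return math.sqrt(sum(
        (co.get(frozenset((f, h)), 0.0) - co.get(frozenset((g, h)), 0.0)) ** 2
        for h in features))


def order_by_similarity(occ, co):
    """Greedy chain: start at the most frequent feature, then repeatedly append
    the unplaced feature closest to the last one placed (ties broken by name)."""
    features = sorted(occ)
    order = [max(features, key=lambda f: occ[f])]
    remaining = set(features) - {order[0]}
    while remaining:
        last = order[-1]
        nxt = min(sorted(remaining),
                  key=lambda g: profile_distance(last, g, features, co))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def haar_mra(signal):
    """Orthonormal Haar multiresolution analysis on a zero-padded signal.
    Returns (scaling coefficients, wavelet details), finest level first."""
    n = 1
    while n < len(signal):
        n *= 2
    approx = list(signal) + [0.0] * (n - len(signal))
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx, details


doc = "The wavelet transform maps the signal. The signal has scale."
occ, co = frequencies(extract_features(doc))
order = order_by_similarity(occ, co)
signal = [occ[f] for f in order]  # one-dimensional document signal
scaling, wavelet = haar_mra(signal)
print(order)
print(scaling, wavelet)
```

The Haar transform was chosen here only because it is the simplest orthonormal wavelet. Because it is orthonormal, the energy of the document signal equals the combined energy of its scaling and wavelet coefficients. Any other wavelet family could be substituted.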