Abstract |
<p>A method is proposed for data mining on compressed data vectors using a metric that can be expressed as a function of the Euclidean distance. In a first step, for each compressed data vector, the positions and values of the coefficients having the largest energy are stored. In a second step, for each compressed data vector, the remaining coefficients, i.e. those not having the largest energy, are discarded. In a third step, for each compressed data vector, a compression error is determined from the discarded coefficients. In a fourth step, at least one of an upper bound and a lower bound for the metric is obtained from the stored positions and values of the largest-energy coefficients and from the determined compression errors. The metric may be a Euclidean distance, a correlation, or a cosine similarity. The compressed data vectors may be generated by a lossy compression transformation having an orthonormal basis, such as a discrete Fourier transform, principal component analysis, Chebyshev polynomials, or wavelets. The upper and lower bounds of the metric may be obtained by a double water-filling algorithm.</p> |
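
The four steps can be illustrated with a minimal sketch, assuming an orthonormal discrete Fourier transform as the compression transformation. The abstract does not spell out the double water-filling algorithm, so the bounds below use a plain triangle inequality instead, which is valid but generally looser. The function names `compress` and `distance_bounds` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def compress(x, k):
    """Keep the k largest-energy coefficients of x (assumed: orthonormal DFT)."""
    coeffs = np.fft.fft(x, norm="ortho")  # orthonormal basis preserves Euclidean norms
    energy = np.abs(coeffs) ** 2
    # Step 1: store positions and values of the largest-energy coefficients.
    positions = np.sort(np.argsort(energy)[-k:])
    values = coeffs[positions]
    # Step 2: every other coefficient is discarded.
    discarded = np.ones(len(x), dtype=bool)
    discarded[positions] = False
    # Step 3: compression error = norm of the discarded coefficients.
    error = np.sqrt(energy[discarded].sum())
    return positions, values, error


def distance_bounds(a, b, n):
    """Lower and upper bounds on the Euclidean distance of two compressed vectors.

    Assumption: a triangle-inequality bound, NOT the double water-filling
    algorithm named in the abstract.
    """
    pos_a, val_a, err_a = a
    pos_b, val_b, err_b = b
    # Rebuild sparse coefficient vectors from the stored positions and values.
    sparse_a = np.zeros(n, dtype=complex)
    sparse_b = np.zeros(n, dtype=complex)
    sparse_a[pos_a] = val_a
    sparse_b[pos_b] = val_b
    d = np.linalg.norm(sparse_a - sparse_b)
    # Step 4: the discarded parts can shift the distance by at most err_a + err_b.
    lower = max(0.0, d - err_a - err_b)
    upper = d + err_a + err_b
    return lower, upper
```

A caller would compress each vector once with `compress(x, k)` and then pass two compressed representations, together with the original vector length `n`, to `distance_bounds`. Because a correlation or cosine similarity of normalized vectors is a function of their Euclidean distance, bounds on the distance translate into bounds on those metrics as well.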