Title of invention: Method and device for data mining on compressed data vectors
Abstract: <p>A method is suggested for data mining on compressed data vectors using a metric that is expressible as a function of the Euclidean distance. In a first step, for each compressed data vector, the positions and values of the coefficients having the largest energy in that vector are stored. In a second step, for each compressed data vector, the coefficients not having the largest energy are discarded. In a third step, for each compressed data vector, a compression error is determined from the discarded coefficients. In a fourth step, at least one of an upper bound and a lower bound for the metric is retrieved from the stored positions and values of the largest-energy coefficients and from the determined compression errors. The metric may be a Euclidean distance, a correlation, or a cosine similarity. The compressed vectors may be generated by a lossy compression transformation having an orthonormal basis, such as the discrete Fourier transform, principal component analysis, Chebyshev polynomials, or wavelets. The upper and lower bounds of the metric may be retrieved by a double water-filling algorithm.</p>
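The four steps in the abstract can be illustrated with a minimal Python sketch. It assumes the coefficients are already expressed in an orthonormal basis, so Euclidean distances between coefficient vectors equal distances between the original vectors. The function names `compress` and `distance_bounds` are hypothetical, and the bounds shown come from the plain triangle inequality rather than the double water-filling algorithm named in the abstract, so they are valid but generally looser.

```python
import math

def compress(coeffs, k):
    """Steps 1-3: keep the k highest-energy coefficients and record the error."""
    # Rank positions by energy |c|^2, largest first.
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]) ** 2, reverse=True)
    # Step 1: store positions and values of the largest-energy coefficients.
    kept = {i: coeffs[i] for i in order[:k]}
    # Steps 2-3: discard the rest, keeping only the Euclidean norm of what was dropped.
    error = math.sqrt(sum(abs(coeffs[i]) ** 2 for i in order[k:]))
    return kept, error

def distance_bounds(a, b):
    """Step 4 (simplified): triangle-inequality bounds on the Euclidean distance.

    This is NOT the double water-filling algorithm; it is a looser stand-in.
    """
    kept_a, err_a = a
    kept_b, err_b = b
    positions = set(kept_a) | set(kept_b)
    # Distance between the two truncated vectors (missing positions count as 0).
    d_hat = math.sqrt(sum(abs(kept_a.get(i, 0.0) - kept_b.get(i, 0.0)) ** 2
                          for i in positions))
    slack = err_a + err_b
    return max(0.0, d_hat - slack), d_hat + slack

# Illustrative data only: two coefficient vectors, each compressed to 2 coefficients.
x = [4.0, -3.0, 0.5, 0.2]
y = [1.0, 0.3, -2.0, 0.1]
lower, upper = distance_bounds(compress(x, 2), compress(y, 2))
true_distance = math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))
```

Note that the two vectors may retain coefficients at different positions, which is why the bounds have to account for both compression errors; the true distance always lies between `lower` and `upper`.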
Publication number: GB201207453(D0)  Publication date: 2012.06.13
Application number: GB20120007453  Filing date: 2012.04.26
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION  Inventor:
Classification number:  Main classification number:
Agency:  Agent:
Main claim:
Address: