Abstract |
A dimensionality reduction method of generating a reduced-dimension matrix data set Dnew of dimension m×k from an original matrix data set D of dimension m×n, wherein n>k. The method selects a subset of k columns from the set of n columns in the original data set D, where the m rows correspond to observations Ri, i=1, . . . , m, the n columns correspond to attributes Aj, j=1, . . . , n, and dij is the data value associated with observation Ri and attribute Aj. The data values in the reduced data set Dnew for each of the selected k attributes are identical to the data values of the corresponding attributes in the original data set. The steps of the method include: for each of the attributes Aj in the original data set D, calculating the variance of the data values associated with attribute Aj, where the variance value, Var(Aj), is calculated as follows: $$\mathrm{Var}(A_j) = \frac{1}{m} \sum_{i=1}^{m} \left( d_{ij} - \mathrm{Mean}(A_j) \right)^{2},$$ where Mean(Aj) is the mean value of the data values corresponding to attribute Aj; selecting the k attributes having the greatest variance values; and generating the reduced data set Dnew by selecting the data values in the original data set D corresponding to the selected k attributes.
|
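The three steps recited in the abstract (per-attribute variance, selection of the k highest-variance attributes, copying the selected columns unchanged) can be illustrated with a minimal Python sketch. The function name `reduce_by_variance`, the list-of-rows representation of D, the tie-breaking rule (lower column index wins), and the choice to keep the selected columns in their original order are all assumptions made for illustration; the abstract does not specify them.

```python
from typing import List, Sequence, Tuple


def reduce_by_variance(
    D: Sequence[Sequence[float]], k: int
) -> Tuple[List[List[float]], List[int]]:
    """Return (Dnew, selected) where Dnew keeps the k highest-variance columns of D.

    D is an m x n data set given as m rows of n values each.
    Hypothetical helper: name and signature are not from the source.
    """
    m = len(D)
    n = len(D[0])
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")

    # Step 1: Var(Aj) = (1/m) * sum_i (dij - Mean(Aj))^2 for every attribute Aj.
    variances = []
    for j in range(n):
        column = [row[j] for row in D]
        mean = sum(column) / m
        variances.append(sum((x - mean) ** 2 for x in column) / m)

    # Step 2: pick the k attributes with the greatest variance.
    # Assumption: ties are broken in favor of the lower column index.
    top = sorted(range(n), key=lambda j: (-variances[j], j))[:k]
    selected = sorted(top)  # assumption: preserve original column order

    # Step 3: copy the selected columns unchanged into the m x k result.
    D_new = [[row[j] for j in selected] for row in D]
    return D_new, selected


# Usage: the constant third column has zero variance and is dropped.
D = [[1, 10, 5], [2, 20, 5], [3, 30, 5], [4, 40, 5]]
D_new, selected = reduce_by_variance(D, k=2)
```

Because the selected values are copied rather than transformed, each column of Dnew remains directly interpretable as one of the original attributes, which is the property the abstract emphasizes.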