Title of invention: Method of reducing dimensionality of a set of attributes used to characterize a sparse data set
Abstract: A dimensionality reduction method of generating a reduced-dimension matrix data set Dnew of dimension m×k from an original matrix data set D of dimension m×n, wherein n>k. The method selects a subset of k columns from the set of n columns in the original data set D, where the m rows correspond to observations Ri, i=1, ..., m, the n columns correspond to attributes Aj, j=1, ..., n, and dij is the data value associated with observation Ri and attribute Aj. The data values in the reduced data set Dnew for each of the selected k attributes are identical to the data values of the corresponding attributes in the original data set. The steps of the method include: for each of the attributes Aj in the original data set D, calculating the variance of the data values associated with attribute Aj, where the variance value, Var(Aj), of the attribute Aj is calculated as follows:
$$\mathrm{Var}(A_j) = \frac{1}{m} \sum_{i=1}^{m} \left( d_{ij} - \mathrm{Mean}(A_j) \right)^2,$$
where Mean(Aj) is the mean value of the data values corresponding to attribute Aj; selecting the k attributes having the greatest variance values; and generating the reduced data set Dnew by selecting the data values in the original data set D corresponding to the selected k attributes.
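The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `column_variances` and `reduce_by_variance`, the dense list-of-rows representation, the tie-breaking rule (lower column index wins), and the choice to keep the selected columns in their original order are all assumptions made for illustration. The variance uses the 1/m (population) form given in the abstract.

```python
from typing import List, Sequence, Tuple


def column_variances(D: Sequence[Sequence[float]]) -> List[float]:
    """Var(Aj) = (1/m) * sum_i (dij - Mean(Aj))^2 for each column j."""
    m = len(D)
    n = len(D[0])
    variances = []
    for j in range(n):
        col = [row[j] for row in D]
        mean = sum(col) / m
        variances.append(sum((x - mean) ** 2 for x in col) / m)
    return variances


def reduce_by_variance(
    D: Sequence[Sequence[float]], k: int
) -> Tuple[List[int], List[List[float]]]:
    """Keep the k columns of D with the greatest variance.

    Assumptions (not stated in the abstract): ties go to the lower
    column index, and kept columns stay in their original order.
    """
    variances = column_variances(D)
    # Rank column indices by descending variance, then ascending index.
    ranked = sorted(range(len(variances)), key=lambda j: (-variances[j], j))
    selected = sorted(ranked[:k])
    # Copy the selected columns unchanged from D into the reduced set.
    D_new = [[row[j] for j in selected] for row in D]
    return selected, D_new


# Hypothetical 4x4 data set (m = 4 observations, n = 4 attributes).
D = [
    [1.0, 0.0, 5.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [3.0, 0.0, 9.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
selected, D_new = reduce_by_variance(D, 2)
print(selected)  # indices of the two highest-variance columns
print(D_new)     # the m x k reduced data set
```

For this hypothetical input the column variances are 1.5, 0, 11 and 0.1875, so columns 0 and 2 are kept and their values are copied unchanged into the reduced set. A sparse representation would compute the same quantities from the nonzero entries only; that variant is not shown here.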
Publication number: US2003028541(A1) Publication date: 2003.02.06
Application number: US20010876321 Filing date: 2001.06.07
Applicant: MICROSOFT CORPORATION Inventors: BRADLEY PAUL S.;ACHLIOPTAS DEMETRIOS;FALOUTSOS CHRISTOS;FAYYAD USAMA
Classification: G06F1/10;G06F7/00;G06F17/30;G06K9/62;(IPC1-7):G06F7/00 Main classification: G06F1/10
Agency: Agent:
Principal claim:
Address: