Principal claim |
1. A method for blind separation of nonnegative correlated pure components from a smaller number of nonlinear mixture mass spectra, using robust principal component analysis, trimmed thresholding, hard thresholding and soft thresholding for preprocessing of the experimental data matrix of mixture mass spectra; empirical kernel map-based nonlinear mapping of the preprocessed matrices onto a reproducing kernel Hilbert space; sparseness and nonnegativity constrained factorization of the mapped matrices; correlation of the separated components with reference components from a library; and assignment of the separated components to the pure components from the library using a maximal correlation criterion, the method comprising the following steps:
recording and storing the mixture data $X$, where $X$ is a nonnegative data matrix comprising $N \geq 2$ rows that correspond to mixture mass spectra and $R$ columns that correspond to observations at different mass-to-charge (m/z) ratios;
scaling the mixture data matrix by the maximal element of $X$, $x_{\max}$:
$X = X / x_{\max}$ [I]
which yields a new data matrix $X$ such that $0 \leq x_{nr} \leq 1$, $n = 1, \ldots, N$ and $r = 1, \ldots, R$;
representing the scaled mixture data matrix in [I] by a nonlinear mixture model:
$X = \mathbf{f}(S)$ [II]
where $S$ stands for an unknown nonnegative matrix comprising $M > N$ rows $\{s_m\}_{m=1}^{M}$ that correspond to the pure component mass spectra and $R$ columns that correspond to observations at different m/z ratios; $\mathbf{f}(S)$ implies that the nonlinear mapping is performed column-wise: $x_r = \mathbf{f}(s_r)$, $r = 1, \ldots, R$, where $\mathbf{f}(s_r) = [f_1(s_r) \ldots f_N(s_r)]^{T}$ and $\{f_n : \mathbb{R}_{0+}^{M} \to \mathbb{R}_{0+}\}_{n=1}^{N}$; the scaling in [I] implies that $0 \leq s_{mr} \leq 1$, $m = 1, \ldots, M$ and $r = 1, \ldots, R$;
using a mixed-state probabilistic model for the amplitudes of the pure component mass spectra $s_{mr}$:
$p(s_{mr}) = \rho_m \delta(s_{mr}) + (1 - \rho_m)\, \delta^{*}(s_{mr})\, f(s_{mr})$ [III]
where $\delta(s_{mr})$ is an indicator function and $\delta^{*}(s_{mr}) = 1 - \delta(s_{mr})$ is its complementary function; $\rho_m$ stands for the probability that $s_{mr} = 0$, so that $1 - \rho_m$ stands for the probability that $s_{mr} > 0$; and $f(s_{mr})$ is a continuous probability density function that models the sparse probability distribution of the amplitude $s_{mr}$;
representing [II] by a truncated Taylor expansion:
$X = G_s S + G_{s^2} \left[ \{ s_{m_1} s_{m_2} \}_{m_1, m_2 = 1}^{M} \right] + \mathrm{HOT}$ [IV]
where $\{ s_{m_1} s_{m_2} \}_{m_1, m_2 = 1}^{M}$ stand for second-order monomials that are cross-products between the pure components $\{s_m\}_{m=1}^{M}$, $G_s$ and $G_{s^2}$ are matrices of appropriate dimensions, and HOT stands for higher-order terms that include monomials of order greater than 2;
applying robust principal component analysis to $X$ in [IV] to obtain:
$X = A + E$ [V]
where $A \approx G_s S + G_{s^2} \left[ \{ s_{m_1} s_{m_2} \}_{m_1, m_2 = 1}^{M} \right]$ stands for a low-rank matrix composed of a linear combination of the original pure components and a linear combination of second-order monomials that represent new components correlated with the original ones, and $E \approx \mathrm{HOT}$ stands for a sparse matrix that represents the error terms associated with higher-order monomials;
applying a hard thresholding operator to $X$ in [IV] to obtain:
$B \approx G_s S + G_{s^2} \left[ \{ s_{m_1} s_{m_2} \}_{m_1, m_2 = 1}^{M} \right]$ [VI]
where $B$ stands for the hard-thresholded version of $X$ in [IV];
applying a soft thresholding operator to $X$ in [IV] to obtain:
$C \approx G_s S + G_{s^2} \left[ \{ s_{m_1} s_{m_2} \}_{m_1, m_2 = 1}^{M} \right]$ [VII]
where $C$ stands for the soft-thresholded version of $X$ in [IV];
applying a trimmed thresholding operator to $X$ in [IV] to obtain:
$D \approx G_s S + G_{s^2} \left[ \{ s_{m_1} s_{m_2} \}_{m_1, m_2 = 1}^{M} \right]$ [VIII]
where $D$ stands for the trimmed-thresholded version of $X$ in [IV];
using an empirical kernel map for nonlinear mapping of $A$ in [V] onto a reproducing kernel Hilbert space:
$\Psi(A) = \begin{bmatrix} \kappa(a_1, v_1) & \cdots & \kappa(a_R, v_1) \\ \vdots & \ddots & \vdots \\ \kappa(a_1, v_D) & \cdots & \kappa(a_R, v_D) \end{bmatrix}$ [IX]
where $\kappa(a_r, v_d)$, $r = 1, \ldots, R$ and $d = 1, \ldots, D$, stands for a positive symmetric kernel function and $v_d$, $d = 1, \ldots, D$, stand for basis vectors that approximately span the same space as the vectors $a_r$, $r = 1, \ldots, R$;
using an empirical kernel map for nonlinear mapping of $B$ in [VI] onto a reproducing kernel Hilbert space:
$\Psi(B) = \begin{bmatrix} \kappa(b_1, v_1) & \cdots & \kappa(b_R, v_1) \\ \vdots & \ddots & \vdots \\ \kappa(b_1, v_D) & \cdots & \kappa(b_R, v_D) \end{bmatrix}$ [X]
where the interpretation of $\Psi(B)$ is equivalent to that of $\Psi(A)$ in [IX];
using an empirical kernel map for nonlinear mapping of $C$ in [VII] onto a reproducing kernel Hilbert space:
$\Psi(C) = \begin{bmatrix} \kappa(c_1, v_1) & \cdots & \kappa(c_R, v_1) \\ \vdots & \ddots & \vdots \\ \kappa(c_1, v_D) & \cdots & \kappa(c_R, v_D) \end{bmatrix}$ [XI]
where the interpretation of $\Psi(C)$ is equivalent to that of $\Psi(A)$ in [IX];
using an empirical kernel map for nonlinear mapping of $D$ in [VIII] onto a reproducing kernel Hilbert space:
$\Psi(D) = \begin{bmatrix} \kappa(d_1, v_1) & \cdots & \kappa(d_R, v_1) \\ \vdots & \ddots & \vdots \\ \kappa(d_1, v_D) & \cdots & \kappa(d_R, v_D) \end{bmatrix}$ [XII]
where the interpretation of $\Psi(D)$ is equivalent to that of $\Psi(A)$ in [IX];
applying sparseness and nonnegativity constrained matrix factorization (sNMF) algorithms to [IX], [X], [XI] and [XII] to obtain estimates of the pure components $\{s_m\}_{m=1}^{M}$ and some of their cross-products $\{ s_{m_1} s_{m_2} \}_{m_1, m_2 = 1}^{M}$:
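$\{\bar{s}_m^{A}\}_{m=1}^{\hat{M}} = \mathrm{sNMF}(\Psi(A))$ [XIII]
$\{\bar{s}_m^{B}\}_{m=1}^{\hat{M}} = \mathrm{sNMF}(\Psi(B))$ [XIV]
$\{\bar{s}_m^{C}\}_{m=1}^{\hat{M}} = \mathrm{sNMF}(\Psi(C))$ [XV]
$\{\bar{s}_m^{D}\}_{m=1}^{\hat{M}} = \mathrm{sNMF}(\Psi(D))$ [XVI]
where $\hat{M}$ denotes the overall number of components separated in [XIII], [XIV], [XV] and [XVI];
estimating further the pure components by correlating $\{\bar{s}_m^{A}\}_{m=1}^{\hat{M}}$ from [XIII], $\{\bar{s}_m^{B}\}_{m=1}^{\hat{M}}$ from [XIV], $\{\bar{s}_m^{C}\}_{m=1}^{\hat{M}}$ from [XV] and $\{\bar{s}_m^{D}\}_{m=1}^{\hat{M}}$ from [XVI] with the components stored in the library composed of $J$ reference compounds $\{s_j^{\mathrm{ref}}\}_{j=1}^{J}$:
$c_{mj}^{A} = \arg\max_{j = 1, \ldots, J} \dfrac{\langle \bar{s}_m^{A}, s_j^{\mathrm{ref}} \rangle}{\| \bar{s}_m^{A} \| \, \| s_j^{\mathrm{ref}} \|}$, $m = 1, \ldots, \hat{M}$ [XVII]
$c_{mj}^{B} = \arg\max_{j = 1, \ldots, J} \dfrac{\langle \bar{s}_m^{B}, s_j^{\mathrm{ref}} \rangle}{\| \bar{s}_m^{B} \| \, \| s_j^{\mathrm{ref}} \|}$, $m = 1, \ldots, \hat{M}$ [XVIII]
$c_{mj}^{C} = \arg\max_{j = 1, \ldots, J} \dfrac{\langle \bar{s}_m^{C}, s_j^{\mathrm{ref}} \rangle}{\| \bar{s}_m^{C} \| \, \| s_j^{\mathrm{ref}} \|}$, $m = 1, \ldots, \hat{M}$ [XIX]
$c_{mj}^{D} = \arg\max_{j = 1, \ldots, J} \dfrac{\langle \bar{s}_m^{D}, s_j^{\mathrm{ref}} \rangle}{\| \bar{s}_m^{D} \| \, \| s_j^{\mathrm{ref}} \|}$, $m = 1, \ldots, \hat{M}$ [XX]
where $\langle \bar{s}_m^{A}, s_j^{\mathrm{ref}} \rangle$, $\langle \bar{s}_m^{B}, s_j^{\mathrm{ref}} \rangle$, $\langle \bar{s}_m^{C}, s_j^{\mathrm{ref}} \rangle$ and $\langle \bar{s}_m^{D}, s_j^{\mathrm{ref}} \rangle$ denote the inner products between $s_j^{\mathrm{ref}}$ and, respectively, $\bar{s}_m^{A}$, $\bar{s}_m^{B}$, $\bar{s}_m^{C}$ and $\bar{s}_m^{D}$, and $\| \bar{s}_m^{A} \|$, $\| \bar{s}_m^{B} \|$, $\| \bar{s}_m^{C} \|$, $\| \bar{s}_m^{D} \|$ and $\| s_j^{\mathrm{ref}} \|$ denote, respectively, the $\ell_2$-norms of $\bar{s}_m^{A}$, $\bar{s}_m^{B}$, $\bar{s}_m^{C}$, $\bar{s}_m^{D}$ and $s_j^{\mathrm{ref}}$;
assigning to each component in the library $\{s_j^{\mathrm{ref}}\}_{j=1}^{J}$ the components separated in [XIII], [XIV], [XV] and [XVI], indexed according to:
$[c^{A}, m_A^{*}] = \arg\max_{m} \{ c_{mj}^{A} \}_{m=1}^{A_j}$ [XXI]
$[c^{B}, m_B^{*}] = \arg\max_{m} \{ c_{mj}^{B} \}_{m=1}^{B_j}$ [XXII]
$[c^{C}, m_C^{*}] = \arg\max_{m} \{ c_{mj}^{C} \}_{m=1}^{C_j}$ [XXIII]
$[c^{D}, m_D^{*}] = \arg\max_{m} \{ c_{mj}^{D} \}_{m=1}^{D_j}$ [XXIV]
where $A_j$, $B_j$, $C_j$ and $D_j$ respectively stand for the number of separated components $\{\bar{s}_m^{A}\}_{m=1}^{\hat{M}}$, $\{\bar{s}_m^{B}\}_{m=1}^{\hat{M}}$, $\{\bar{s}_m^{C}\}_{m=1}^{\hat{M}}$ and $\{\bar{s}_m^{D}\}_{m=1}^{\hat{M}}$ associated, respectively in [XVII], [XVIII], [XIX] and [XX], with the reference component $s_j^{\mathrm{ref}}$;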
obtaining final estimates of the candidates for pure components $\{\hat{s}_j\}_{j=1}^{J}$ according to:
$I = \arg\max_{A, B, C, D} \left( c^{A}, c^{B}, c^{C}, c^{D} \right)$, $\hat{s}_j = \bar{s}_{m_I^{*}}$, $j = 1, \ldots, J$ and $I \in \{A, B, C, D\}$ [XXV]
wherein the separated components $\{\bar{s}_m^{A}\}_{m=1}^{\hat{M}}$, $\{\bar{s}_m^{B}\}_{m=1}^{\hat{M}}$, $\{\bar{s}_m^{C}\}_{m=1}^{\hat{M}}$ and $\{\bar{s}_m^{D}\}_{m=1}^{\hat{M}}$ that are not assigned to the pure components from the library $\{\hat{s}_j\}_{j=1}^{J}$ are considered as candidates for new pure components; and
presenting the estimated candidates for pure components $\{\hat{s}_j\}_{j=1}^{J}$ and the candidates for new pure components from [XXV]. |
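
For illustration only, some of the simpler steps of the claim can be sketched in Python with NumPy: the scaling in [I], the hard and soft thresholding that yield B and C, an empirical kernel map of the form [IX], and the normalized correlation against a reference library as in [XVII]. This is a minimal sketch under stated assumptions, not the claimed implementation. The function names, the threshold parameter `tau`, the kernel width `sigma` and the choice of a Gaussian kernel are all hypothetical; the claim only requires a positive symmetric kernel and does not fix the thresholding operators' parameters. Robust principal component analysis, trimmed thresholding and the sNMF factorization are not shown.

```python
import numpy as np


def scale_by_max(X):
    """Scaling as in [I]: divide by the largest element so entries lie in [0, 1]."""
    X = np.asarray(X, dtype=float)
    return X / X.max()


def hard_threshold(X, tau):
    """Standard hard thresholding: keep entries with magnitude above tau, zero the rest.

    tau is a hypothetical parameter; the claim does not specify its value.
    """
    return np.where(np.abs(X) > tau, X, 0.0)


def soft_threshold(X, tau):
    """Standard soft thresholding: shrink magnitudes by tau and clip at zero."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def empirical_kernel_map(A, V, sigma=1.0):
    """Matrix of the form [IX] with entry (d, r) equal to kappa(a_r, v_d).

    A is N x R (columns a_r), V is N x D (columns v_d); the result is D x R.
    The Gaussian kernel used here is an assumption made for this sketch.
    """
    diff = V[:, :, None] - A[:, None, :]   # shape N x D x R
    sq_dist = (diff ** 2).sum(axis=0)      # shape D x R
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def best_library_match(S_hat, S_ref):
    """Normalized correlation as in [XVII].

    S_hat is M_hat x R (separated components), S_ref is J x R (library).
    Returns, per separated component, the index of the best-matching
    reference and the correlation value. Assumes no all-zero rows.
    """
    inner = S_hat @ S_ref.T
    norms = (np.linalg.norm(S_hat, axis=1)[:, None]
             * np.linalg.norm(S_ref, axis=1)[None, :])
    corr = inner / norms
    return corr.argmax(axis=1), corr.max(axis=1)
```

In this sketch, taking the basis vectors `V` equal to the data columns themselves gives a square kernel matrix with ones on the diagonal; a smaller set of basis vectors would give a D x R map with D < R.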