Title: Binary prediction tree modeling with many predictors and its uses in clinical and genomic applications
Abstract: The statistical analysis described and claimed is a predictive statistical tree model that overcomes several problems observed in prior statistical models and regression analyses while providing greater accuracy and predictive capability. Although the claimed use of the predictive statistical tree model described herein is directed to predicting disease in individuals, the model can be used in a variety of applications. These include predicting disease states, susceptibility to disease states, or any other biological state of interest, as well as applicable non-biological states of interest. The model first screens genes to reduce noise, then applies correlation-based k-means clustering targeting a large number of clusters, and finally uses singular value decomposition (SVD) to extract the single dominant factor (principal component) from each cluster. This yields a substantial number of cluster-derived singular factors, referred to as metagenes, which characterize multiple patterns of gene expression across samples. The strategy aims to extract multiple such patterns while reducing dimension and smoothing out gene-specific noise through aggregation within clusters. Formal predictive analysis then uses these metagenes in a Bayesian classification tree analysis. This generates multiple recursive partitions of the sample into subgroups (the "leaves" of the classification tree) and associates Bayesian predictive probabilities of outcomes with each subgroup. Overall predictions for an individual sample are then generated by averaging predictions, with appropriate weights, across many such tree models. The model is assessed with iterative out-of-sample (leave-one-out) cross-validation: each sample is left out of the data set in turn, the model is refit to the remaining samples, and the refit model is used to predict the held-out case. This rigorously tests the predictive value of the model and mirrors the real-world prognostic setting, where the main goal is to predict new cases as they arise.
Application publication number: US2005170528(A1)  Publication date: 2005.08.04
Application number: US20030692002  Filing date: 2003.10.24
Applicants: WEST MIKE; NEVINS JOSEPH R.  Inventors: WEST MIKE; NEVINS JOSEPH R.
Classification: G01N;G01N33/48;G01N33/50;G01N33/543;G06F19/00;G06G7/48;G06N3/00;G06N5/00;G06N7/00;(IPC1-7):G06F19/00  Main classification: G01N