Title of invention: DISCOVERY SYSTEMS FOR IDENTIFYING ENTITIES THAT HAVE A TARGET PROPERTY
Abstract: Systems and methods for assaying a test entity for a property, without measuring the property, are provided. Exemplary test entities include proteins, protein mixtures, and protein fragments. Measurements of first features in a respective subset of an N-dimensional space and of second features in a respective subset of an M-dimensional space are obtained as training data for each reference entity in a plurality of reference entities. One or more of the second features is a metric for the target property. A subset of first features, or combinations thereof, is identified using feature selection. A model is trained on the subset of first features using the training data. Measurement values for the subset of first features for the test entity are applied to the trained model, thereby obtaining a model value that is compared to model values obtained using measured values of the subset of first features from reference entities exhibiting the property.
Publication number: US2017091637(A1) Publication date: 2017.03.30
Application number: US201615282052 Filing date: 2016.09.30
Applicant: Hampton Creek, Inc. Inventors: Chae Lee; Tetrick Josh Stephen; Xu Meng; Schultz Matthew D.; Wang Chuan; Tilmans Nicolas; Brzustowicz Michael
Classification: G06N5/04; G06N99/00 Main classification: G06N5/04
Agency: Agent:
Main claim: 1. A discovery system for inferentially screening a test entity to determine whether it exhibits a target property without directly measuring the test entity for the target property, the discovery system comprising: at least one processor and memory addressable by the at least one processor, the memory storing at least one program for execution by the at least one processor, the at least one program comprising instructions for: A) obtaining a training set that comprises a plurality of reference entities and, for each respective reference entity, (i) a respective measurement of each first feature in a respective subset of first features in an N-dimensional feature space and (ii) a respective measurement of each second feature in a respective subset of an M-dimensional feature space, wherein N is a positive integer of two or greater, M is a positive integer, the training set collectively provides at least one measurement for each first feature in the N-dimensional feature space, the training set collectively provides at least one measurement for each second feature in the M-dimensional feature space, at least one second feature in the M-dimensional feature space is a metric for the target property, the N-dimensional feature space does not include any of the second features in the M-dimensional space, the M-dimensional feature space does not include any of the first features in the N-dimensional space, and the test entity comprises a protein, a fragment thereof, or a mixture of the protein with one or more other proteins; B) identifying two or more first features, or one or more combinations thereof, in the N-dimensional feature space using a feature selection method and the training set, thereby selecting a set of first features {p1, . . . , pN−K} from the N-dimensional feature space, wherein N−K is a positive integer less than N; C) training a model using measurements for the set of first features {p1, . . . , pN−K} across the training set, thereby obtaining a trained model; D) obtaining measurement values for the set of first features {p1, . . . , pN−K} of the test entity; E) inputting the set of first features {p1, . . . , pN−K} of the test entity into the trained model thereby obtaining a trained model output value for the test entity; and F) comparing the trained model output value of the test entity to one or more trained model output values computed using measurement values for the set of first features {p1, . . . , pN−K} of one or more reference entities that exhibits the target property thereby determining whether the test entity exhibits the target property.
Address: San Francisco CA US
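Steps A) through F) of the main claim can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration, not something the claim specifies: the toy measurements, the choice of N = 3 and M = 1, the use of absolute Pearson correlation as the feature selection method, ordinary least squares as the model, a cutoff of 10.0 on the second feature to decide which reference entities exhibit the property, and the rule that a test entity qualifies when its model output is at least the lowest output among those references. The sketch also assumes every reference entity has a measurement for every feature, which the claim does not require. All function and variable names are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(X, y, keep):
    """Step B (assumed method): keep the `keep` first features with the largest |r| against y."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def train(X, y, selected):
    """Step C (assumed model): least squares with intercept via the normal equations."""
    rows = [[1.0] + [row[j] for j in selected] for row in X]
    k = len(rows[0])
    XtX = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    Xty = [sum(r[a] * t for r, t in zip(rows, y)) for a in range(k)]
    return solve(XtX, Xty)

def predict(weights, features, selected):
    """Step E: model output computed from the selected first features only."""
    return weights[0] + sum(w * features[j] for w, j in zip(weights[1:], selected))

def exhibits_property(weights, selected, test_features, positive_refs):
    """Step F (assumed rule): compare the test output to outputs of references with the property."""
    test_out = predict(weights, test_features, selected)
    ref_outs = [predict(weights, ref, selected) for ref in positive_refs]
    return test_out >= min(ref_outs)

# Step A: hypothetical training set, five reference entities, N = 3 first features.
X = [[1.0, 5.0, 2.0], [2.0, 3.0, 1.0], [3.0, 6.0, 4.0], [4.0, 2.0, 3.0], [5.0, 4.0, 5.0]]
# One second feature (M = 1) acting as the metric for the target property.
y = [4.0, 5.0, 10.0, 11.0, 15.0]

selected = select_features(X, y, keep=2)            # step B: here N - K = 2
weights = train(X, y, selected)                     # step C
positive_refs = [row for row, t in zip(X, y) if t >= 10.0]  # assumed cutoff

# Steps D-F: the test entities are measured only for first features, never for the metric.
likely_hit = exhibits_property(weights, selected, [4.5, 1.0, 3.5], positive_refs)
likely_miss = exhibits_property(weights, selected, [1.0, 9.0, 1.0], positive_refs)
```

The point of the sketch is the data flow: the second feature is used only while selecting features and training, so the test entity is screened from first-feature measurements alone. A real system would likely use a cross-validated selection method and a more robust comparison than a single minimum.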