Title of invention LOCAL GENETIC ETHNICITY DETERMINATION SYSTEM
Abstract An input sample SNP genotype is divided into a plurality of windows, each including a sequence of SNPs. For each window, a diploid hidden Markov model (HMM) is built from a haplotype Markov model (MM). The diploid HMM for a window is used to determine the probability that the window corresponds to a pair of labels (e.g., ethnicity labels). An inter-window HMM, with a set of states for each window, is built based on the diploid HMMs for each window. Labels are assigned to the input sample genotype based on the inter-window HMM.
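The windowing step described in the abstract can be illustrated with a minimal sketch. It assumes the genotype is a flat sequence of per-SNP calls encoded as alternate-allele counts (0, 1, or 2) and that windows have a fixed size; the function name `split_into_windows`, the encoding, and the `window_size` parameter are all hypothetical and are not taken from the application.

```python
from typing import List, Sequence


def split_into_windows(genotype: Sequence[int], window_size: int) -> List[List[int]]:
    """Split a genotype (one call per SNP) into consecutive windows of SNPs.

    window_size is an assumed tuning parameter; the final window may be
    shorter when the genotype length is not a multiple of window_size.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [
        list(genotype[start:start + window_size])
        for start in range(0, len(genotype), window_size)
    ]


# Hypothetical genotype: alternate-allele counts at ten sequential SNPs.
sample = [0, 1, 2, 1, 0, 0, 2, 1, 1, 0]
windows = split_into_windows(sample, window_size=4)
print(windows)
```

A fixed window size is only one possible choice; windows could equally be defined by genetic distance or by a fixed set of SNP positions.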
Publication number US2017017752(A1) Publication date 2017.01.19
Application number US201615209458 Filing date 2016.07.13
Applicant Ancestry.com DNA, LLC Inventors Noto Keith D.; Wang Yong
Classification G06F19/22; G06N7/00 Main classification G06F19/22
Agency Agent
Principal claim 1. A computer-implemented method for assigning, to an input sample genotype, one or more labels from a set of labels, the method comprising: accessing an input sample genotype; dividing the input sample genotype into a plurality of windows of sequential single nucleotide polymorphisms (SNPs); building, for each window, a diploid hidden Markov model (HMM) based on the input sample genotype, wherein each diploid state in the diploid HMM corresponds to a pair of haploid states from a haploid Markov model (MM) for the window; calculating, for each diploid state in each diploid HMM, a diploid state probability indicating the likelihood that the input sample genotype corresponds to the diploid state; accessing, for each window, a set of annotations, each annotation corresponding to a haploid state from the haploid MM for the window and to a label of the set of labels, wherein the annotation indicates the probability that a haplotype having the label corresponds to the haploid state; calculating, for each window, a label pair probability distribution based on the annotations for the window and the diploid state probabilities for the diploid HMM of the window; building an inter-window HMM, the inter-window HMM including a plurality of states that each correspond to a pair of labels and a window, wherein the inter-window HMM is based on the label pair probability distribution for each window; and assigning the one or more labels to the input sample genotype based on the inter-window HMM.
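The last three steps of the claim (label pair distribution, inter-window HMM, label assignment) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the weighting formula in `label_pair_distribution`, the single `stay` transition probability, the use of Viterbi decoding, and all names and numbers are hypothetical. Diploid state probabilities and annotations are taken as given inputs.

```python
import math
from itertools import product
from typing import Dict, List, Tuple

LabelPair = Tuple[str, str]


def label_pair_distribution(
    diploid_probs: Dict[Tuple[int, int], float],
    annotations: Dict[str, Dict[int, float]],
) -> Dict[LabelPair, float]:
    """Assumed scoring: weight each diploid state (h1, h2) by the annotation
    probabilities of h1 under label a and h2 under label b, then normalize."""
    scores = {
        (a, b): sum(
            p * annotations[a].get(h1, 0.0) * annotations[b].get(h2, 0.0)
            for (h1, h2), p in diploid_probs.items()
        )
        for a, b in product(annotations, repeat=2)
    }
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("no label pair has non-zero support in this window")
    return {pair: s / total for pair, s in scores.items()}


def _log(x: float) -> float:
    return math.log(x) if x > 0 else float("-inf")


def viterbi(window_dists: List[Dict[LabelPair, float]], stay: float = 0.9) -> List[LabelPair]:
    """Most likely label pair per window. Each state is (window, label pair);
    the per-window distribution acts as the emission score, and `stay` is an
    assumed probability of keeping the same pair between adjacent windows."""
    pairs = list(window_dists[0])
    switch = (1.0 - stay) / max(len(pairs) - 1, 1)

    def trans(prev: LabelPair, cur: LabelPair) -> float:
        return _log(stay if prev == cur else switch)

    best = {p: _log(window_dists[0][p]) for p in pairs}
    back: List[Dict[LabelPair, LabelPair]] = []
    for dist in window_dists[1:]:
        new_best, pointers = {}, {}
        for cur in pairs:
            prev = max(pairs, key=lambda q: best[q] + trans(q, cur))
            new_best[cur] = best[prev] + trans(prev, cur) + _log(dist[cur])
            pointers[cur] = prev
        best = new_best
        back.append(pointers)
    path = [max(pairs, key=lambda p: best[p])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical inputs: two labels, two haploid states (0 and 1), three windows.
annotations = {"A": {0: 0.9, 1: 0.1}, "B": {0: 0.2, 1: 0.8}}
diploid_probs_per_window = [
    {(0, 0): 0.8, (0, 1): 0.1, (1, 0): 0.05, (1, 1): 0.05},
    {(0, 0): 0.7, (1, 1): 0.3},
    {(1, 1): 0.9, (0, 0): 0.1},
]
dists = [label_pair_distribution(d, annotations) for d in diploid_probs_per_window]
path = viterbi(dists, stay=0.9)
print(path)
```

In this toy input the third window on its own favors the pair ("B", "B"), but a high `stay` value makes a single-window switch costly, so the decoded path keeps ("A", "A") throughout. That smoothing across windows is the reason for having an inter-window model at all; a lower `stay` would let the path follow each window's own distribution more closely.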
Address Provo UT US