Abstract |
An input sample SNP genotype is divided into a plurality of windows, each including a sequence of SNPs. For each window, a diploid hidden Markov model (HMM) is built from a haplotype Markov model (MM). The diploid HMM for a window is used to determine the probability that the window corresponds to a pair of labels (e.g., ethnicity labels). An inter-window HMM, with a set of states for each window, is built based on the diploid HMMs for each window. Labels are assigned to the input sample genotype based on the inter-window HMM. |
Independent claim |
1. A computer-implemented method for assigning, to an input sample genotype, one or more labels from a set of labels, the method comprising:
accessing an input sample genotype;
dividing the input sample genotype into a plurality of windows of sequential single nucleotide polymorphisms (SNPs);
building, for each window, a diploid hidden Markov model (HMM) based on the input sample genotype, wherein each diploid state in the diploid HMM corresponds to a pair of haploid states from a haploid Markov model (MM) for the window;
calculating, for each diploid state in each diploid HMM, a diploid state probability indicating the likelihood that the input sample genotype corresponds to the diploid state;
accessing, for each window, a set of annotations, each annotation corresponding to a haploid state from the haploid MM for the window and to a label of the set of labels, wherein the annotation indicates the probability that a haplotype having the label corresponds to the haploid state;
calculating, for each window, a label pair probability distribution based on the annotations for the window and the diploid state probabilities for the diploid HMM of the window;
building an inter-window HMM, the inter-window HMM including a plurality of states that each correspond to a pair of labels and a window, wherein the inter-window HMM is based on the label pair probability distribution for each window;
assigning the one or more labels to the input sample genotype based on the inter-window HMM. |
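
The claimed steps can be illustrated with a minimal Python sketch. It is not the patented implementation, and it rests on several assumptions not stated in the claim: windows have a fixed size, the label pair score for a window is the diploid-state-probability-weighted product of the two haploid annotations (summed over both label orderings), the inter-window HMM uses a single hypothetical `stay_prob` transition parameter, and labels are read off a Viterbi path. All function names, parameter names, and data layouts here are hypothetical.

```python
import math
from itertools import combinations_with_replacement


def split_into_windows(genotype, window_size):
    """Divide a genotype (one value per SNP) into consecutive windows.
    Assumption: fixed-size windows; the last window may be shorter."""
    return [genotype[i:i + window_size]
            for i in range(0, len(genotype), window_size)]


def label_pair_distribution(diploid_probs, annotations, labels):
    """Combine diploid state probabilities with per-haploid-state annotations.

    diploid_probs: {(haploid_state_a, haploid_state_b): probability}
    annotations:   {haploid_state: {label: P(haploid_state | label)}}
    Returns {(label_p, label_q): normalized score} over unordered label pairs.
    Assumption: this weighting scheme is illustrative only.
    """
    scores = {}
    for p, q in combinations_with_replacement(labels, 2):
        total = 0.0
        for (a, b), prob in diploid_probs.items():
            weight = annotations[a][p] * annotations[b][q]
            if p != q:  # either haplotype may carry either label
                weight += annotations[a][q] * annotations[b][p]
            total += prob * weight
        scores[(p, q)] = total
    norm = sum(scores.values())
    if norm == 0:
        raise ValueError("no label pair has non-zero support in this window")
    return {pair: s / norm for pair, s in scores.items()}


def _log(x):
    return math.log(x) if x > 0 else float("-inf")


def viterbi_label_pairs(window_dists, stay_prob=0.9):
    """Most likely label pair per window under a simple inter-window HMM.

    Each state is a (label pair, window). The per-window label pair
    probability is used as the emission score. Assumption: the same label
    pair persists between adjacent windows with probability stay_prob, and
    the remaining mass is split evenly among the other pairs.
    """
    states = list(window_dists[0])
    switch = (1.0 - stay_prob) / (len(states) - 1) if len(states) > 1 else 0.0

    def trans(s, t):
        return stay_prob if s == t else switch

    score = {s: _log(window_dists[0][s]) for s in states}
    backpointers = []
    for dist in window_dists[1:]:
        new_score, pointers = {}, {}
        for t in states:
            best = max(states, key=lambda s: score[s] + _log(trans(s, t)))
            new_score[t] = score[best] + _log(trans(best, t)) + _log(dist[t])
            pointers[t] = best
        backpointers.append(pointers)
        score = new_score

    # Trace back from the best final state to recover one pair per window.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy usage with two hypothetical labels and two hypothetical haploid states.
labels = ["A", "B"]
annotations = {"h1": {"A": 0.9, "B": 0.1}, "h2": {"A": 0.2, "B": 0.8}}
diploid_probs = {("h1", "h1"): 0.7, ("h1", "h2"): 0.2, ("h2", "h2"): 0.1}
window_dist = label_pair_distribution(diploid_probs, annotations, labels)
assigned = viterbi_label_pairs([window_dist, window_dist])
```

In this sketch the labels assigned to the whole genotype would be derived from `assigned`, for example by counting how many windows carry each label. The claim itself does not specify how the inter-window HMM is decoded, so Viterbi decoding is one possible choice among several.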