Main claim |
1. A computer-implemented method for automatically identifying and prioritizing genomic variants, the method comprising:
receiving, via one or more processors executing a processor-implemented instruction module, one or more genome sequence datasets comprising genomic variant information, the one or more genome sequence datasets including an experimental dataset and up to one or more control datasets;
determining, via the processor-implemented instruction module, a frequency-score for each genomic variant in the experimental dataset based on the frequency at which each genomic variant in the experimental dataset appears in the experimental dataset and the up to one or more control datasets;
performing, via the processor-implemented instruction module, pairwise comparisons between each genomic variant in the experimental dataset;
determining, via the processor-implemented instruction module, a relatedness-score for each of the pairwise comparisons between each genomic variant in the experimental dataset;
determining, via the processor-implemented instruction module, a frequency-corrected relatedness-score for each of the pairwise comparisons between each genomic variant in the experimental dataset based on the frequency-score for each genomic variant in the experimental dataset;
determining, via the processor-implemented instruction module, a control-frequency-score for each genomic variant in the up to one or more control datasets based on the frequency at which each genomic variant in the up to one or more control datasets appears in the up to one or more control datasets and the experimental dataset;
performing, via the processor-implemented instruction module, pairwise comparisons between each genomic variant in the experimental dataset and each genomic variant in the up to one or more control datasets;
determining, via the processor-implemented instruction module, a control-relatedness-score for each of the pairwise comparisons between each genomic variant in the experimental dataset and each genomic variant in the up to one or more control datasets;
determining, via the processor-implemented instruction module, a control-frequency-corrected relatedness-score for each of the pairwise comparisons between each genomic variant in the experimental dataset and each genomic variant in the up to one or more control datasets based on the frequency-score for each genomic variant in the experimental dataset and the control-frequency-score for each genomic variant in the up to one or more control datasets;
determining, via the processor-implemented instruction module, a control-frequency-adjusted relatedness-score for each genomic variant in the experimental dataset based on the control-frequency-corrected relatedness-score for each of the pairwise comparisons between each genomic variant in the experimental dataset and each genomic variant in the up to one or more control datasets;
determining, via the processor-implemented instruction module, a normalized frequency-corrected relatedness-score for each of the pairwise comparisons between each variant in the experimental dataset based on the frequency-corrected relatedness-score for each of the pairwise comparisons between each genomic variant in the experimental dataset and the control-frequency-adjusted relatedness-score for each genomic variant in the experimental dataset; and
determining, via the processor-implemented instruction module, a priority-score for each genomic variant in the experimental dataset based on the normalized frequency-corrected relatedness-score for each of the pairwise comparisons between each variant in the experimental dataset. |
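The sequence of scores recited in the claim can be illustrated with a minimal Python sketch. The claim names the scores and their dependencies but gives no formulas, so every formula here is an assumption chosen only to make the data flow concrete: frequency is taken as the fraction of samples carrying a variant, relatedness is a placeholder Jaccard overlap of annotation terms, frequency correction multiplies by (1 - frequency), the control-frequency-adjusted score is a mean over control variants, normalization subtracts the average adjusted score of the pair, and the priority-score is a sum over pairs. The names `frequency_scores`, `annotation_overlap`, and `prioritize` are hypothetical.

```python
from itertools import combinations


def frequency_scores(variants, all_samples):
    """Fraction of all samples (experimental plus control) carrying each variant."""
    total = len(all_samples)
    return {v: sum(v in s for s in all_samples) / total for v in variants}


def annotation_overlap(a, b, annotations):
    """Placeholder relatedness (assumption): Jaccard overlap of annotation terms."""
    terms_a, terms_b = annotations.get(a, set()), annotations.get(b, set())
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def prioritize(experimental, controls, annotations):
    """Rank experimental variants; each dataset is a list of per-sample variant sets."""
    exp_variants = sorted(set().union(*experimental))
    ctl_variants = sorted(set().union(*controls))
    all_samples = list(experimental) + list(controls)

    # Frequency-score and control-frequency-score (assumed: sample fraction).
    freq = frequency_scores(exp_variants, all_samples)
    ctl_freq = frequency_scores(ctl_variants, all_samples)

    # Frequency-corrected relatedness-score for each experimental pair
    # (assumed correction: down-weight common variants).
    corrected = {
        (a, b): annotation_overlap(a, b, annotations) * (1 - freq[a]) * (1 - freq[b])
        for a, b in combinations(exp_variants, 2)
    }

    # Control-frequency-adjusted relatedness-score per experimental variant
    # (assumed: mean of the control-frequency-corrected scores).
    adjusted = {}
    for e in exp_variants:
        scores = [
            annotation_overlap(e, c, annotations) * (1 - freq[e]) * (1 - ctl_freq[c])
            for c in ctl_variants
        ]
        adjusted[e] = sum(scores) / len(scores) if scores else 0.0

    # Normalized frequency-corrected relatedness-score
    # (assumed: subtract the pair's average control-adjusted score).
    normalized = {
        (a, b): score - (adjusted[a] + adjusted[b]) / 2
        for (a, b), score in corrected.items()
    }

    # Priority-score (assumed: sum of normalized scores over all pairs).
    priority = {v: 0.0 for v in exp_variants}
    for (a, b), score in normalized.items():
        priority[a] += score
        priority[b] += score
    return sorted(priority.items(), key=lambda item: item[1], reverse=True)
```

Under these assumptions, variants that are rare, related to other experimental variants, and unrelated to control variants rank highest. Passing an empty `controls` list corresponds to the "up to one or more control datasets" case with no control dataset, where the adjustment term is zero.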