Title: SYSTEMS AND METHODS FOR GENOMIC VARIANT ANALYSIS
Abstract: A genomic variant analysis method and computer system that use information on variant frequency and biological consequence to determine the relative statistical significance of each variant in a given set of genome sequence datasets. The method and system perform both variant frequency normalization and universal pairwise variant comparisons across the given genome sequence datasets to automatically estimate the likelihood that any given variant contributes to the disease process or biological phenomenon under study, and organize the results into a priority ranking. The priority ranking is then used to sort the results into biologically related data subsets, which are displayed to indicate their potential importance.
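The abstract ends with two steps: ranking variants by priority and sorting the ranked results into biologically related subsets for display. The sketch below illustrates only those two steps, assuming each variant already carries a priority score and a single category label (for example a gene or pathway name). `ScoredVariant`, `rank_and_group`, and the category labels are hypothetical names chosen for illustration, not part of the source.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredVariant:
    variant_id: str  # hypothetical identifier, e.g. "chr1:12345A>G"
    category: str    # hypothetical biological grouping (gene, pathway, ...)
    priority: float  # priority score produced by upstream scoring steps


def rank_and_group(variants: list[ScoredVariant]) -> dict[str, list[ScoredVariant]]:
    """Rank variants by descending priority, then bucket them by category.

    Each bucket keeps the global ranking order. Because a bucket is created
    when its best-ranked member is reached, buckets are ordered by their top
    member, so the most promising subset comes first.
    """
    ranked = sorted(variants, key=lambda v: v.priority, reverse=True)
    groups: dict[str, list[ScoredVariant]] = defaultdict(list)
    for v in ranked:
        groups[v.category].append(v)
    return dict(groups)  # dicts preserve insertion order


variants = [
    ScoredVariant("v1", "pathway_A", 0.2),
    ScoredVariant("v2", "pathway_B", 0.9),
    ScoredVariant("v3", "pathway_A", 0.7),
]
groups = rank_and_group(variants)
print({k: [v.variant_id for v in vs] for k, vs in groups.items()})
# prints {'pathway_B': ['v2'], 'pathway_A': ['v3', 'v1']}
```

Grouping after a single global sort keeps the within-subset order consistent with the overall priority ranking.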
Publication number: US2015193578 (A1)   Publication date: 2015.07.09
Application number: US201514590427   Filing date: 2015.01.06
Applicant: THE REGENTS OF THE UNIVERSITY OF MICHIGAN   Inventors: Kiel Mark J.; Elenitoba-Johnson Kojo; Lim Megan
Classification: G06F19/22   Main classification: G06F19/22
Main claim:
1. A computer-implemented method for automatically identifying and prioritizing genomic variants, the method comprising:
receiving, via one or more processors executing a processor-implemented instruction module, one or more genome sequence datasets comprising genomic variant information, the one or more genome sequence datasets including an experimental dataset and up to one or more control datasets;
determining, via the processor-implemented instruction module, a frequency-score for each genomic variant in the experimental dataset based on the frequency at which each genomic variant in the experimental dataset appears in the experimental dataset and the up to one or more control datasets;
performing, via the processor-implemented instruction module, pairwise comparisons between each genomic variant in the experimental dataset;
determining, via the processor-implemented instruction module, a relatedness-score for each of the pairwise comparisons between each genomic variant in the experimental dataset;
determining, via the processor-implemented instruction module, a frequency-corrected relatedness-score for each of the pairwise comparisons between each genomic variant in the experimental dataset based on the frequency-score for each genomic variant in the experimental dataset;
determining, via the processor-implemented instruction module, a control-frequency-score for each genomic variant in the up to one or more control datasets based on the frequency at which each genomic variant in the up to one or more control datasets appears in the up to one or more control datasets and the experimental dataset;
performing, via the processor-implemented instruction module, pairwise comparisons between each genomic variant in the experimental dataset and each genomic variant in the up to one or more control datasets;
determining, via the processor-implemented instruction module, a control-relatedness-score for each of the pairwise comparisons between each genomic variant in the experimental dataset and each genomic variant in the up to one or more control datasets;
determining, via the processor-implemented instruction module, a control-frequency-corrected relatedness-score for each of the pairwise comparisons between each genomic variant in the experimental dataset and each genomic variant in the up to one or more control datasets based on the frequency-score for each genomic variant in the experimental dataset and the control-frequency-score for each genomic variant in the up to one or more control datasets;
determining, via the processor-implemented instruction module, a control-frequency-adjusted relatedness-score for each genomic variant in the experimental dataset based on the control-frequency-corrected relatedness-score for each of the pairwise comparisons between each genomic variant in the experimental dataset and each genomic variant in the up to one or more control datasets;
determining, via the processor-implemented instruction module, a normalized frequency-corrected relatedness-score for each of the pairwise comparisons between each variant in the experimental dataset based on the frequency-corrected relatedness-score for each of the pairwise comparisons between each genomic variant in the experimental dataset and the control-frequency-adjusted relatedness-score for each genomic variant in the experimental dataset; and
determining, via the processor-implemented instruction module, a priority-score for each genomic variant in the experimental dataset based on the normalized frequency-corrected relatedness-score for each of the pairwise comparisons between each variant in the experimental dataset.
Address: Ann Arbor, MI, US