Title: METHOD AND SYSTEM OF MAPPING SEQUENCING READS
Abstract: A method and a parallel-computing system for mapping sequencing reads are provided. The method preprocesses a reference genome to construct a compression structure of the reference genome, an index array and a block address array. The index array stores the index values of all sorted subsequences of the reference genome, and the block address array stores the positions of a subset of the elements in the index array. The parameters of the mapping method are selected based on three inputs: the statistical characteristics of the reference genome, the statistical quality information of the sequencing reads, and the polymorphism rate of the target species from which the reads were generated. Using the structures built in the preprocessing stage, each read is mapped to the reference genome in three steps. First, it is anchored to the genome with a single perfect-match prefix seed. Next, the alignment is extended with the auto-match function method. Finally, the result is assessed statistically.
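The preprocessing structures and the prefix-seed anchoring step can be illustrated with a small Python sketch. It is not the patented implementation and rests on several assumptions:

- the genome contains only A, C, G and T (no N bases);
- the compression structure is simple 2-bit packing;
- the index array holds the start positions of all n-mers, sorted lexicographically;
- the block address array is keyed by the first k bases of each n-mer, so that block[c] is the first index-array slot whose n-mer starts with the k-mer coded c.

The parameters n and k and all function names (`pack_2bit`, `kmer_code`, `build_index`, `build_block_address`, `anchor_prefix_seed`) are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical 2-bit code; A < C < G < T keeps integer order == lexicographic order.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack_2bit(seq):
    """Assumed compression structure: 2 bits per base, 4 bases per byte."""
    packed = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        packed[i // 4] |= CODE[base] << (2 * (i % 4))
    return packed


def kmer_code(seq, start, k):
    """Integer code of seq[start:start + k], first base most significant."""
    code = 0
    for base in seq[start:start + k]:
        code = (code << 2) | CODE[base]
    return code


def build_index(genome, n):
    """Index array: start positions of all n-mers, sorted by n-mer.

    Also returns the matching sorted n-mer codes used for binary search."""
    pairs = sorted((kmer_code(genome, p, n), p)
                   for p in range(len(genome) - n + 1))
    return [p for _, p in pairs], [c for c, _ in pairs]


def build_block_address(codes, n, k):
    """Block address array: block[c] is the first index-array slot whose
    n-mer begins with the k-mer coded c; block[4**k] == len(codes)."""
    shift = 2 * (n - k)
    prefixes = [c >> shift for c in codes]  # nondecreasing because codes are sorted
    return [bisect_left(prefixes, c) for c in range(4 ** k + 1)]


def anchor_prefix_seed(read, index, codes, block, n, k):
    """Genome positions where the read's first n bases match exactly."""
    if len(read) < n:
        return []
    target = kmer_code(read, 0, n)
    b = target >> (2 * (n - k))      # k-mer prefix of the seed
    lo, hi = block[b], block[b + 1]  # only this block can hold the seed
    left = bisect_left(codes, target, lo, hi)
    right = bisect_right(codes, target, lo, hi)
    return sorted(index[left:right])


genome = "ACGTACGTTACG"
index, codes = build_index(genome, n=4)
block = build_block_address(codes, n=4, k=2)
print(anchor_prefix_seed("ACGTTAG", index, codes, block, n=4, k=2))  # prints [0, 4]
```

Under this layout, anchoring a read costs one table lookup, which narrows the search to a single block, plus a binary search inside that block. This is one plausible reading of how a block address array could "accelerate anchoring"; the patent's actual layout may differ.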
Application publication number: US2016259886(A1)
Publication date: 2016.09.08
Application number: US201414901645
Filing date: 2014.06.25
Applicant: ACADEMY OF MATHEMATICS AND SYSTEM SCIENCE, CHINESE ACADEMY OF SCIENCES
Inventors: Li Lei; Wang Anqi; Chen Shijian
Classification: G06F19/22; G06F17/30
Main classification: G06F19/22
Main claim: 1. A method of mapping sequencing reads, which performs operations on an obtained reference genome and at least one sequencing read, wherein the reference genome is a genome sequence whose sequencing has been completed, and the operations comprise the following steps:
Step 1: preprocessing said reference genome to generate a reference genome compression structure, an index array and a block address array. Said reference genome compression structure stores the whole reference genome in compressed form, and said index array stores the index values of all sorted n-mers of said reference genome. Said block address array stores the positions of a portion of the elements in the index array so as to accelerate anchoring a sequencing read on the reference genome;
Step 2: designing parameters of a mapping algorithm according to a probability calculation so as to meet, or strike a compromise among, the requirements on sensitivity, specificity and mapping speed. The design is based on the characteristics of the reference genome, on information about all the sequencing reads as a whole, and on the genetic differences between the reference and the species from which the sequencing reads are generated;
Step 3: based on said reference genome compression structure, index array and block address array obtained from the preprocessing, mapping each sequencing read to said reference genome through seed anchoring, extension based on an auto-match function method, and statistical analysis;
Step 4: outputting the mapping information of each sequencing read.
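Step 2 does not spell out its probability calculation. The Python sketch below uses a simple model that is assumed here rather than taken from the patent. The genome is treated as an i.i.d. uniform sequence over four bases, searched on both strands. Sequencing errors and polymorphisms are treated as independent across positions.

Under this model, a seed of length L has about 2(G - L + 1)/4^L chance exact hits in a genome of length G. It survives intact with probability equal to the product of (1 - e_i)(1 - p) over its L positions. Here e_i is the error rate at read position i, which could be estimated from Phred scores as 10^(-Q/10), and p is the polymorphism rate.

Shorter seeds are more sensitive but less specific, and they produce more candidate hits, which slows mapping. The sketch therefore picks the shortest seed that meets a specificity bound and then reports whether a sensitivity target is also met. The thresholds and function names are hypothetical.

```python
def phred_to_error(q):
    """Phred score Q -> base-call error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def expected_random_hits(genome_len, seed_len):
    """Expected chance exact hits of a seed in an i.i.d. uniform genome,
    counting both strands (an assumed background model)."""
    return 2 * (genome_len - seed_len + 1) / 4 ** seed_len


def seed_sensitivity(error_rates, poly_rate, seed_len):
    """P(no sequencing error and no polymorphism in the first seed_len
    read positions), assuming all positions are independent."""
    prob = 1.0
    for e in error_rates[:seed_len]:
        prob *= (1 - e) * (1 - poly_rate)
    return prob


def choose_seed_length(genome_len, error_rates, poly_rate,
                       max_random_hits=0.01, min_sensitivity=0.9):
    """Shortest seed length whose expected random hits <= max_random_hits.

    Sensitivity can only fall as the seed grows, so this length is also the
    most sensitive choice that meets the specificity bound. Returns
    (length, sensitivity, meets_sensitivity), or None if no length up to
    len(error_rates) is specific enough."""
    for seed_len in range(1, len(error_rates) + 1):
        if expected_random_hits(genome_len, seed_len) <= max_random_hits:
            sens = seed_sensitivity(error_rates, poly_rate, seed_len)
            return seed_len, sens, sens >= min_sensitivity
    return None


# Hypothetical inputs: a 3 Gb genome, Q30 at each of 100 read positions,
# and a polymorphism rate of 0.001.
rates = [phred_to_error(30)] * 100
print(choose_seed_length(3_000_000_000, rates, poly_rate=0.001))
```

With these inputs the sketch selects a 20-base seed, with an estimated sensitivity of about 0.96. Lower quality scores or a higher polymorphism rate push the sensitivity below the target. At that point the specificity bound can no longer be met together with the sensitivity target, so one of them must be relaxed; this is the kind of compromise among sensitivity, specificity and speed that Step 2 refers to.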
Address: Beijing, CN