Title: METHOD AND SYSTEM FOR DETERMINING COPY NUMBER VARIATION
Abstract: Disclosed are a method and a system for determining genome copy number variation (CNV), relating to the technical field of bioinformatics. The method comprises: obtaining reads; determining sequence tags (uniquely mapped reads) from the reads; counting the number of sequence tags falling into each window; subjecting the sequence tag count of each window to a GC correction and to a correction based on an expected sequence tag count adjusted by a control set, to obtain a corrected sequence tag count; selecting demarcation points with small significance values as candidate CNV breakpoints; and, at each turn, removing the least significant candidate CNV breakpoint, updating the difference significance values of the two candidate CNV breakpoints to its left and right, and iterating until the difference significance values of all candidate CNV breakpoints are smaller than a termination threshold, thereby determining the CNV breakpoints. The method and system of the present invention are clinically feasible and can precisely detect micro-deletion/micro-duplication regions of 0.5 M using about 50 M of data.
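The counting and correction steps summarized above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation: it assumes reads are given as 0-based start positions of uniquely mapped reads on a single chromosome and that per-window GC fractions are precomputed. Because the abstract does not give the form of either correction, it substitutes a simple GC-stratum median rescaling for the GC correction and an observed/expected ratio against control samples for the control-set correction. All names (`count_per_window`, `gc_correct`, `control_correct`, `n_bins`) are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def count_per_window(read_starts, chrom_length, window_size):
    """Count uniquely mapped reads by the window containing their start position."""
    n_windows = (chrom_length + window_size - 1) // window_size
    counts = [0] * n_windows
    for pos in read_starts:
        if 0 <= pos < chrom_length:  # ignore positions outside the chromosome
            counts[pos // window_size] += 1
    return counts


def gc_correct(counts, gc_fractions, n_bins=20):
    """Assumed GC correction: rescale each GC stratum to the genome-wide median count."""
    overall = median(counts)
    # Assign each window to one of n_bins equal-width GC strata.
    keys = [min(int(gc * n_bins), n_bins - 1) for gc in gc_fractions]
    strata = defaultdict(list)
    for c, k in zip(counts, keys):
        strata[k].append(c)
    stratum_median = {k: median(v) for k, v in strata.items()}
    return [c * overall / stratum_median[k] if stratum_median[k] > 0 else 0.0
            for c, k in zip(counts, keys)]


def control_correct(sample, controls):
    """Assumed control correction: observed / expected per window.

    Expected = mean fraction of reads in that window across the control
    samples, scaled to the sample's total count.
    """
    total = sum(sample)
    control_totals = [sum(ctrl) for ctrl in controls]
    ratios = []
    for i, observed in enumerate(sample):
        expected = mean(ctrl[i] / t for ctrl, t in zip(controls, control_totals)) * total
        ratios.append(observed / expected if expected > 0 else 0.0)
    return ratios
```

For instance, `control_correct([10, 10, 20, 10], [[10, 10, 10, 10]])` gives ratios of about `[0.8, 0.8, 1.6, 0.8]`: the third window carries twice the depth of the others, and normalizing by the sample total spreads that excess over all windows. Whether the patent reports such ratios or rescaled counts is not stated here.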
Publication number: US2015056619 (A1); Publication date: 2015.02.26
Application number: US201214389898; Filing date: 2012.04.05
Applicants: Li Xuchao; Chen Shengpei; Chen Fang; Xie Weiwei; Wang Jian; Wang Jun; Yang Huanming; Zhang Xiuqing
Inventors: Li Xuchao; Chen Shengpei; Chen Fang; Xie Weiwei; Wang Jian; Wang Jun; Yang Huanming; Zhang Xiuqing
Classification: C12Q1/68; Main classification: C12Q1/68
Main claim: 1. A method of detecting a copy number variation, comprising the following steps: obtaining reads from at least one part of a nucleic acid molecule of a sample; determining, based on the obtained reads, uniquely-mapped reads aligned to a genomic reference sequence; dividing the genomic reference sequence into a plurality of windows and calculating the number of uniquely-mapped reads falling into each of the plurality of windows; subjecting the number of uniquely-mapped reads falling into each window to a GC correction and to a correction based on an expected number of uniquely-mapped reads adjusted by a control set, to obtain a corrected number of uniquely-mapped reads; calculating, for each demarcation point, a significance value of the difference between two numerical populations, each consisting of the corrected numbers of uniquely-mapped reads falling into the windows on one of the two sides of the demarcation point, the demarcation point being a starting point or an ending point of each of the plurality of windows, and thereby selecting demarcation points having smaller significance values as candidate CNV breakpoints; calculating, for a given candidate CNV breakpoint, a significance value of the difference between two numerical populations, each consisting of the corrected numbers of uniquely-mapped reads falling into the windows contained within one of two sequences, one sequence ranging from the given candidate CNV breakpoint to the adjacent upstream candidate CNV breakpoint and the other ranging from the given candidate CNV breakpoint to the adjacent downstream candidate CNV breakpoint; and removing, at each turn, the candidate CNV breakpoint having the least significance, recalculating the significance values of the two candidate CNV breakpoints adjacent to the removed one, and iterating cyclically until the significance values of all candidate CNV breakpoints are less than a termination threshold value, thereby determining the CNV breakpoints.
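The breakpoint search in claim 1 can be sketched as below. Claim 1 does not name the statistical test, the number of windows compared at each demarcation point, or how "smaller significance values" are picked, so this minimal sketch assumes: the significance value is a two-sided p-value from a normal approximation to Welch's t-test (so "least significant" means the largest p-value); candidates are demarcation points whose p-value, computed over a hypothetical `flank` of windows on each side, is below a hypothetical `candidate_p` and is a local minimum; and a hypothetical `stop_p` plays the role of the termination threshold. `values` stands for the corrected per-window values, and all function names are hypothetical.

```python
import math
from statistics import mean, variance


def diff_pvalue(left, right):
    """Two-sided p-value for a difference in means between two populations.

    Assumption: normal approximation to Welch's t-test; a population with
    fewer than two values is treated as uninformative (p = 1.0).
    """
    if len(left) < 2 or len(right) < 2:
        return 1.0
    se = math.sqrt(variance(left) / len(left) + variance(right) / len(right))
    diff = abs(mean(left) - mean(right))
    if se == 0:
        return 0.0 if diff > 0 else 1.0
    return math.erfc(diff / se / math.sqrt(2))


def candidate_breakpoints(values, flank, candidate_p):
    """Demarcation points b (between window b-1 and window b) kept as candidates.

    Assumption: compare up to `flank` windows on each side and keep points whose
    p-value is below `candidate_p` and is a local minimum (leftmost on ties).
    """
    n = len(values)
    p = {b: diff_pvalue(values[max(0, b - flank):b], values[b:b + flank])
         for b in range(1, n)}
    cands = []
    for b in range(1, n):
        left_ok = b == 1 or p[b] < p[b - 1]
        right_ok = b == n - 1 or p[b] <= p[b + 1]
        if p[b] < candidate_p and left_ok and right_ok:
            cands.append(b)
    return cands


def merge_breakpoints(values, cands, stop_p):
    """Drop the least significant candidate (largest p) until all p < stop_p."""
    bps = [0] + sorted(cands) + [len(values)]  # sentinels at both ends

    def pval(i):
        # Upstream segment [bps[i-1], bps[i]) vs. downstream segment [bps[i], bps[i+1]).
        return diff_pvalue(values[bps[i - 1]:bps[i]], values[bps[i]:bps[i + 1]])

    pvals = [None] + [pval(i) for i in range(1, len(bps) - 1)] + [None]
    while len(bps) > 2:
        i = max(range(1, len(bps) - 1), key=lambda j: pvals[j])
        if pvals[i] < stop_p:
            break  # every remaining candidate passes the termination threshold
        del bps[i]
        del pvals[i]
        # Only the two neighbours of the removed point see their segments change.
        for j in (i - 1, i):
            if 0 < j < len(bps) - 1:
                pvals[j] = pval(j)
    return bps[1:-1]
```

Recomputing only the two neighbours after each removal mirrors the update step in the claim, since no other candidate's flanking segments change. With `values = [1.0] * 10 + [1.5] * 10 + [1.0] * 10`, a spurious candidate at window 5 is discarded and `merge_breakpoints(values, [5, 10, 20], 0.01)` returns `[10, 20]`.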
Address: Guangdong, CN