Title of invention HIERARCHICAL GENOME ASSEMBLY METHOD USING SINGLE LONG INSERT LIBRARY
Abstract The present invention is generally directed to a hierarchical genome assembly process for producing high-quality de novo genome assemblies. The method uses a single, long-insert, shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT®) DNA sequencing, and obviates the additional sample preparation and sequencing data sets required by previously described hybrid assembly strategies. Efficient de novo assembly from genomic DNA to a finished genome sequence is demonstrated for several microorganisms using as few as three SMRT® Cells, and for bacterial artificial chromosomes (BACs) using sequencing data from just one SMRT® Cell. Part of this assembly workflow is a new consensus algorithm that takes advantage of SMRT® sequencing primary quality values to produce a highly accurate de novo genome sequence, exceeding 99.999% (QV 50) accuracy. The methods are typically performed on a computer and comprise an algorithm that constructs sequence alignment graphs from pairwise alignments of sequence reads to a common reference.
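The abstract equates 99.999% accuracy with QV 50. As a minimal sketch, assuming QV here denotes the standard Phred-scaled quality value (QV = -10 * log10 of the error probability), the conversion can be checked as follows. The function names are illustrative and do not come from the patent.

```python
import math


def qv_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled quality value to per-base accuracy."""
    error_probability = 10 ** (-qv / 10)
    return 1.0 - error_probability


def accuracy_to_qv(accuracy: float) -> float:
    """Convert per-base accuracy (0 <= accuracy < 1) to a Phred-scaled quality value."""
    return -10.0 * math.log10(1.0 - accuracy)


if __name__ == "__main__":
    # QV 50 corresponds to one expected error per 100,000 bases.
    print(qv_to_accuracy(50))
    print(accuracy_to_qv(0.99999))
```

Under this convention, each additional 10 QV points corresponds to a tenfold reduction in the expected error rate.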
Publication number US2015302144(A1) Publication date 2015.10.22
Application number US201514716617 Filing date 2015.05.19
Applicant Pacific Biosciences of California, Inc. Inventors Chin Chen-Shan;Turner Stephen
Classification G06F19/22;G06F19/18 Main classification G06F19/22
Agency Agent
Principal claim 1. A computer-implemented method to determine a consensus sequence from a set of polynucleotide sequence reads without using a previously known reference sequence, the method comprising: a) providing a set of polynucleotide sequence reads that comprise errors introduced by a sequencing reaction, wherein said polynucleotide sequence reads in the set comprise overlapping polynucleotide sequences that are alignable to each other; b) choosing a seed read from the set of polynucleotide sequence reads; c) performing pairwise alignment of all other polynucleotide sequence reads in the set to the seed read to generate a set of sequence alignments; d) constructing a multiple sequence alignment from the set of sequence alignments, wherein the errors in the set of polynucleotide sequence reads are present in the resulting multiple sequence alignment and further wherein the multiple sequence alignment is constructed without the use of a previously known reference sequence; e) performing an error correction step on the multiple sequence alignment by applying a consensus algorithm to the multiple sequence alignment, wherein the consensus algorithm reduces the number of errors in the seed sequence using information within the multiple sequence alignment and generates a consensus sequence for the set of polynucleotide sequence reads, thereby determining a consensus sequence from the set of polynucleotide sequence reads.
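Steps b) through e) of the claim can be illustrated with a toy sketch. This is not the patented algorithm: the abstract describes sequence alignment graphs and the use of primary quality values, whereas the code below substitutes a plain Needleman-Wunsch alignment and a per-column majority vote. It also assumes, for simplicity, that every read spans the same region as the seed. The function names `align_to_seed` and `consensus` and the scoring parameters are hypothetical.

```python
from collections import Counter


def align_to_seed(seed, read, match=1, mismatch=-1, gap=-1):
    """Globally align `read` to `seed` (Needleman-Wunsch).

    Returns a list of (seed_char, read_char) columns, with '-' marking a gap.
    """
    n, m = len(seed), len(read)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seed[i - 1] == read[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    cols, i, j = [], n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and seed[i - 1] == read[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            cols.append((seed[i - 1], read[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            cols.append((seed[i - 1], "-"))  # base missing from the read
            i -= 1
        else:
            cols.append(("-", read[j - 1]))  # extra base in the read
            j -= 1
    cols.reverse()
    return cols


def consensus(seed, reads):
    """Correct `seed` by majority vote over reads aligned to it (assumed simplification)."""
    # votes[i]: bases (or '-') aligned to seed position i; the seed votes for itself.
    votes = [Counter({base: 1}) for base in seed]
    # inserts[i]: strings inserted immediately before seed position i.
    inserts = [Counter() for _ in range(len(seed) + 1)]
    for read in reads:
        pos, pending = 0, ""
        for seed_char, read_char in align_to_seed(seed, read):
            if seed_char == "-":
                pending += read_char
                continue
            if pending:
                inserts[pos][pending] += 1
                pending = ""
            votes[pos][read_char] += 1
            pos += 1
        if pending:
            inserts[pos][pending] += 1
    total = len(reads) + 1  # all reads plus the seed
    out = []
    for i in range(len(seed) + 1):
        if inserts[i]:
            inserted, count = inserts[i].most_common(1)[0]
            if count * 2 > total:  # keep an insertion only with majority support
                out.append(inserted)
        if i < len(seed):
            base = votes[i].most_common(1)[0][0]
            if base != "-":  # a majority gap deletes the seed base
                out.append(base)
    return "".join(out)


if __name__ == "__main__":
    seed = "ACGTTCGTAC"  # toy seed read with one substitution error
    reads = ["ACGTACGTAC", "ACGTACGAC", "ACGTACGGTAC", "ACGTACGTAC"]
    print(consensus(seed, reads))
```

In this toy input the seed carries a substitution, one read carries a deletion, and one carries an insertion; because each error appears in only one sequence, the column vote recovers `ACGTACGTAC`. A graph-based or quality-aware consensus, as the abstract describes, would weigh the evidence differently.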
Address Menlo Park CA US