Title of invention: Systems and methods for using paired-end data in directed acyclic structure
Abstract: Methods of analyzing a transcriptome involve obtaining at least one pair of paired-end reads from a transcriptome of an organism; finding an alignment with an optimal score between a first read of the pair and a node in a directed acyclic data structure (the data structure has nodes representing RNA sequences such as exons or transcripts, and edges connecting pairs of nodes); identifying candidate paths that include the node connected to a downstream node by a path having a length substantially similar to an insert length of the pair of paired-end reads; and aligning the paired-end reads to the candidate paths to determine an optimal-scoring alignment.
Publication number: US9092402(B2) Publication date: 2015.07.28
Application number: US201414157979 Filing date: 2014.01.17
Applicant: Seven Bridges Genomics Inc. Inventors: Kural Deniz; Meyvis Nathan
Classification: G06F19/10; G06F19/22; G06F19/28 Main classification: G06F19/10
Agency: Brown Rudnick LLP Agent: Meyers Thomas C.; Brown Rudnick LLP
Principal claim: 1. A method of analyzing a transcriptome, the method comprising: obtaining from an annotated transcriptome database, using a computer comprising a processor coupled to the memory, a plurality of exons and introns from a genome; using the processor to transform the plurality of exons and introns into a directed acyclic data structure comprising nodes representing known RNA sequences and edges connecting the nodes; obtaining a pair of paired-end reads generated by sequencing a transcriptome of an organism; using the processor to transform the first read of the pair into an alignment with an optimal score between that first read of the pair and a node in the directed acyclic data structure; identifying, using the processor, candidate paths within the directed acyclic data structure that include the node connected to a downstream node by a path having a length substantially similar to an insert length of the pair of paired-end reads; excluding non-candidate paths from alignments involving the pair of paired-end reads; aligning, using the processor, the paired-end reads to the candidate paths to determine an optimal-scoring alignment by: calculating match scores between a second read of the pair and nodes in the candidate paths, and looking backwards to predecessor nodes in the candidate paths while not considering any nodes in the non-candidate paths to identify a back-trace through the candidate paths that gives an optimal score, wherein the back-trace that gives the optimal score corresponds to an optimal scoring alignment of the pair of paired-end reads to the candidate paths, and wherein the directed acyclic data structure held in the memory prior to obtaining the pair of paired-end reads includes at least one path that has a node that the second read of the pair aligns to but that is not included during the aligning step due to being excluded as a non-candidate path; and identifying an isoform of an RNA from the organism using the optimal scoring alignment of the paired-end reads.
Address: Cambridge MA US
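
Illustrative sketch (not part of the patent record): the candidate-path step named in the abstract and the principal claim can be pictured with a small Python example. It is a minimal sketch under stated assumptions, not the patented implementation. The function name candidate_paths, the tolerance parameter, the toy exon graph, and the definition of path length (sum of node lengths from the start node up to, but not including, the final node) are all hypothetical choices made for illustration; the record itself says only that the path length is "substantially similar" to the insert length.

```python
from typing import Dict, List, Tuple


def candidate_paths(
    edges: Dict[str, List[str]],
    lengths: Dict[str, int],
    start: str,
    insert_length: int,
    tolerance: int,
) -> List[Tuple[str, ...]]:
    """Enumerate DAG paths from `start` whose span is within `tolerance`
    of `insert_length`.

    Assumption of this sketch: the span of a path is the sum of node
    lengths from `start` up to, but not including, the path's last node.
    """
    results: List[Tuple[str, ...]] = []
    stack = [((start,), 0)]  # (path so far, span before the last node)
    while stack:
        path, span = stack.pop()
        # A path needs a downstream node, so a lone start node never qualifies.
        if len(path) > 1 and abs(span - insert_length) <= tolerance:
            results.append(path)
        next_span = span + lengths[path[-1]]
        if next_span > insert_length + tolerance:
            continue  # any extension would overshoot the insert length
        for nxt in edges.get(path[-1], []):
            stack.append((path + (nxt,), next_span))
    return sorted(results)


# Hypothetical toy graph: four exons with made-up lengths (in bases).
exon_lengths = {"A": 100, "B": 50, "C": 200, "D": 80}
exon_edges = {"A": ["B", "C"], "B": ["C", "D"], "C": ["D"]}

# First read assumed to align to exon A; insert length 150 with a tolerance of 10.
paths = candidate_paths(exon_edges, exon_lengths, "A", 150, 10)
print(paths)  # [('A', 'B', 'C'), ('A', 'B', 'D')]
```

In this toy graph the path A, C, D is never produced, so an aligner restricted to the returned paths would not consider it for the second read. That mirrors the exclusion of non-candidate paths described in the claim, although the real scoring and back-trace are outside the scope of this sketch.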