Title of invention: METHODS AND SYSTEMS FOR ALIGNING SEQUENCES IN THE PRESENCE OF REPEATING ELEMENTS
Abstract: The invention includes methods for aligning reads (e.g., nucleic acid reads) comprising repeating sequences, methods for building reference sequence constructs comprising repeating sequences, and systems that can be used to align reads comprising repeating sequences. The methods are scalable and can be used to align millions of reads to a construct thousands of bases long. The methods and systems can additionally account for variability within or near a repeating sequence due to genetic mutation.
Publication number: US2015199474(A1) Publication date: 2015.07.16
Application number: US201414517513 Filing date: 2014.10.17
Applicant: Seven Bridges Genomics Inc. Inventor: Kural Deniz
Classification: G06F19/22 Main classification: G06F19/22
Agency: Agent:
Main claim: 1. A system for aligning paired nucleic acid reads in a genetic sample having repetitive sequences, the system comprising a processor and memory, wherein the memory comprises instructions that, when executed, cause the processor to: obtain first and second nucleic acid reads known to be found within a predetermined distance in a sample, wherein the first or second read comprises at least a portion of a repetitive sequence, the repetitive sequence being at least 10 bp in length and repeated at least about 100 times, with 90% or greater identity, within a genome from which the sample is obtained; score sequence overlaps for the first and second reads against a reference sequence construct, the construct comprising at least two alternative sequences per position at multiple positions in the construct and including a plurality of the repetitive sequences, wherein greater overlap results in a higher score; and assign the first and second reads to a location in the construct such that the score for each read is maximized.
Address: Cambridge MA US
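
The scoring and assignment steps recited in the main claim can be illustrated with a toy example. The following is a minimal sketch only, not the claimed implementation, and it makes several simplifying assumptions that do not come from the record: the construct is modeled as a list in which each entry is a string of the bases allowed at that position (single-base alternatives only, no gaps), the overlap score is a plain count of matching positions, and the pair is placed by maximizing the summed score subject to a distance limit. The names `score_read`, `all_scores`, `place_pair`, and `max_distance` are hypothetical.

```python
def score_read(read, construct, offset):
    """Count read bases that match an allowed base at each construct position."""
    return sum(
        1 for k, base in enumerate(read) if base in construct[offset + k]
    )


def all_scores(read, construct):
    """Score the read at every offset where it fits entirely in the construct."""
    last = len(construct) - len(read)
    return [score_read(read, construct, off) for off in range(last + 1)]


def place_pair(read1, read2, construct, max_distance):
    """Choose offsets (i, j) with 0 <= j - i <= max_distance that maximize
    the summed overlap score. Returns (total_score, i, j), or None if no
    placement fits. Ties keep the first placement found."""
    s1 = all_scores(read1, construct)
    s2 = all_scores(read2, construct)
    best = None
    for i, a in enumerate(s1):
        upper = min(i + max_distance, len(s2) - 1)
        for j in range(i, upper + 1):
            total = a + s2[j]
            if best is None or total > best[0]:
                best = (total, i, j)
    return best


# Toy construct: unique flank "TTGA", three copies of the repeat "ACG",
# unique flank "CCAT". Two positions carry an alternative base.
construct = list("TTGA") + ["A", "C", "G"] * 3 + list("CCAT")
construct[2] = "GC"  # position 2 allows G or C
construct[8] = "CT"  # position 8 (inside the second repeat) allows C or T

# The repetitive read alone is ambiguous: it scores 3 at offsets 4, 7 and 10.
repeat_hits = [i for i, s in enumerate(all_scores("ACG", construct)) if s == 3]
print(repeat_hits)  # [4, 7, 10]

# Its mate is unique, so the distance constraint resolves the ambiguity.
print(place_pair("ACG", "CCAT", construct, max_distance=4))  # (7, 10, 13)
```

In this sketch the read "ACG" matches all three repeat copies equally well, and only the known proximity to its uniquely placed mate "CCAT" selects the third copy. A realistic system would need gapped alignment and multi-base alternative paths, which this toy model omits.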