Title: Compact next generation sequencing dataset and efficient sequence processing using same
Abstract: In a method operative on a genetic sequencing read comprising a base sequence acquired by processing a tissue sample, a compact text representation of the genetic sequencing read is generated. The compact text representation includes (1) a text string representing the base sequence and (2) a base quality text field identifying the longest sub-sequence of the base sequence whose bases all have base quality scores satisfying a base quality score threshold. The compact text representation of the genetic sequencing read is stored in a raw reads storage. To provide flexibility, the base quality text field may identify the longest sub-sequence for each of two or more different base quality score thresholds. During read alignment, offset boundaries for the genetic sequencing reads can be chosen efficiently from the content of the base quality text field.
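The abstract does not specify a quality encoding, the exact comparison used for "satisfy", or the syntax of the base quality text field, so the following is only a minimal sketch under stated assumptions. It assumes FASTQ-style Phred+33 quality strings, treats "satisfies the threshold" as score >= threshold, and uses a hypothetical field syntax `Q<threshold>:<start>-<end>` (0-based, end exclusive) joined by `;`. All function names (`longest_run`, `compact_record`, `parse_quality_field`, `alignment_window`) are illustrative, and clipping the read to the stored window is just one plausible reading of how "offset boundaries" might be chosen during alignment.

```python
from typing import Dict, List, Sequence, Tuple


def phred33_scores(qual: str) -> List[int]:
    """Decode a quality string, ASSUMING Phred+33 (Sanger/Illumina 1.8+) encoding."""
    return [ord(c) - 33 for c in qual]


def longest_run(scores: Sequence[int], threshold: int) -> Tuple[int, int]:
    """Return half-open (start, end) of the longest run of scores >= threshold.

    Single pass, O(n). Returns (0, 0) if no base meets the threshold;
    on ties the earliest run is kept.
    """
    best_start = best_end = 0
    run_start = None
    for i, q in enumerate(scores):
        if q >= threshold:
            if run_start is None:
                run_start = i
            if i + 1 - run_start > best_end - best_start:
                best_start, best_end = run_start, i + 1
        else:
            run_start = None
    return best_start, best_end


def compact_record(bases: str, qual: str, thresholds: Sequence[int]) -> str:
    """Build a HYPOTHETICAL compact line: bases, a tab, then the quality field.

    The quality field stores only the longest qualifying sub-sequence per
    threshold instead of one score per base.
    """
    if len(bases) != len(qual):
        raise ValueError("bases and quality string must have equal length")
    scores = phred33_scores(qual)
    parts = []
    for t in sorted(thresholds):
        start, end = longest_run(scores, t)
        parts.append(f"Q{t}:{start}-{end}")
    return bases + "\t" + ";".join(parts)


def parse_quality_field(field: str) -> Dict[int, Tuple[int, int]]:
    """Invert the hypothetical field syntax into {threshold: (start, end)}."""
    result: Dict[int, Tuple[int, int]] = {}
    for item in filter(None, field.split(";")):
        label, span = item.split(":")
        start, end = span.split("-")
        result[int(label[1:])] = (int(start), int(end))
    return result


def alignment_window(record: str, threshold: int) -> str:
    """Clip the read to the stored offsets for one threshold (assumed use)."""
    bases, field = record.split("\t")
    start, end = parse_quality_field(field)[threshold]
    return bases[start:end]
```

For example, with bases `ACGTACGTAC` and the Phred+33 string `??+DDDD5II` (scores 30, 30, 10, 35, 35, 35, 35, 20, 40, 40), `compact_record("ACGTACGTAC", "??+DDDD5II", [20, 30])` yields `"ACGTACGTAC\tQ20:3-10;Q30:3-7"`, and `alignment_window(...)` with threshold 30 returns `"TACG"`. Because each threshold costs one linear scan and contributes only two integers to the stored record, supporting several thresholds keeps the per-read overhead small compared with storing a full per-base quality string.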
Publication number: EP2634717 (A2)    Publication date: 2013.09.04
Application number: EP20120160812    Filing date: 2012.03.22
Applicant: KONINKLIJKE PHILIPS ELECTRONICS N.V.
Classification: G06F19/22; C12Q1/68    Main classification: G06F19/22