Abstract |
In a method operative on a genetic sequencing read comprising a base sequence acquired by processing a tissue sample, a compact text representation of the genetic sequencing read is generated. The compact text representation includes (1) a text string representing the base sequence and (2) a base quality text field identifying the longest sub-sequence of the base sequence for which the base quality scores of the bases of the sub-sequence satisfy a base quality score threshold. The compact text representation of the genetic sequencing read is then stored in a raw reads storage. To provide flexibility, the base quality text field may identify the longest sub-sequence for each of two or more different base quality score thresholds. During reads alignment, offset boundaries for the genetic sequencing reads can be chosen efficiently using the content of the base quality text field. |
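
The generation step described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the abstract does not specify the layout of the base quality text field, so the `Q<threshold>:<start>-<end>` layout, the tab separator, the half-open 0-based offsets, and the function names `longest_passing_run` and `compact_read` are all hypothetical. The sketch also assumes that a score "satisfies" a threshold when it is greater than or equal to it, and that scores are already decoded to integers.

```python
from typing import Sequence, Tuple


def longest_passing_run(quals: Sequence[int], threshold: int) -> Tuple[int, int]:
    """Return half-open (start, end) offsets of the longest contiguous run
    of bases whose quality score is >= threshold (assumed meaning of
    "satisfy"). Ties go to the earliest run; (0, 0) means no base passes."""
    best_start, best_end = 0, 0
    run_start = None  # start offset of the run currently being scanned
    for i, q in enumerate(quals):
        if q >= threshold:
            if run_start is None:
                run_start = i
            # Strict comparison keeps the earliest run when lengths tie.
            if (i + 1 - run_start) > (best_end - best_start):
                best_start, best_end = run_start, i + 1
        else:
            run_start = None
    return best_start, best_end


def compact_read(bases: str, quals: Sequence[int], thresholds: Sequence[int]) -> str:
    """Build a compact text line: the base string, a tab, then one
    hypothetical 'Q<threshold>:<start>-<end>' entry per threshold."""
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    entries = []
    for t in thresholds:
        start, end = longest_passing_run(quals, t)
        entries.append(f"Q{t}:{start}-{end}")
    return bases + "\t" + ";".join(entries)


# Illustrative data only (not taken from the source).
example = compact_read(
    "ACGTACGTAC",
    [10, 25, 32, 35, 31, 22, 8, 30, 33, 12],
    [20, 30],
)
```

With these illustrative scores, the field records offsets 1-6 for threshold 20 and 2-5 for threshold 30. Under the same assumptions, an aligner could read those offsets directly from the stored field to pick trimming boundaries, rather than rescanning per-base quality scores.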