Abstract |
Short, fixed-length source sub-sequences are extracted from a collection of source sequences derived from a sample whose biological signature is to be determined. The extracted sub-sequences are compiled to determine the frequency of each within the collection. Overlaps between the sub-sequences are used to find a chain of overlaps leading from one or more sub-sequences equivalent to a pre-flanking reference marker sequence to one or more sub-sequences equivalent to a post-flanking reference marker sequence, where the two reference marker sequences flank a region containing a repetitive sequence. When the chain contains multiple instances of one or more of the short, fixed-length source sub-sequences, thereby defining a cycle, the sequences in the collection derived from the sample are examined to find one or more sequences that span the cycle, and at least one of the following is performed: (i) the lengths of the spanning sequences are used to determine the length of the cycle; and (ii) the number of repeat motif copies within each spanning sequence is counted. |
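The steps in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names, the k-mer length of 5, the frequency threshold `min_count`, the toy reads, and the `GATA` motif are all assumptions made for illustration. It also substitutes a simple reachability test for the chain search: it looks for a cycle among the sub-sequences that lie between the two flanks, rather than building an explicit chain.

```python
from collections import Counter

BASES = "ACGT"

def kmer_counts(reads, k):
    """Frequency of every length-k sub-sequence across the collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def successors(kmer, kmers):
    """k-mers whose first k-1 bases equal the last k-1 bases of `kmer`."""
    return [kmer[1:] + b for b in BASES if kmer[1:] + b in kmers]

def predecessors(kmer, kmers):
    """k-mers whose last k-1 bases equal the first k-1 bases of `kmer`."""
    return [b + kmer[:-1] for b in BASES if b + kmer[:-1] in kmers]

def reachable(start, kmers, step):
    """All k-mers reachable from `start` by repeatedly applying `step`."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in step(stack.pop(), kmers):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def has_cycle_between(pre, post, kmers):
    """True if a k-mer lying on some pre-to-post chain can overlap back to itself."""
    between = reachable(pre, kmers, successors) & reachable(post, kmers, predecessors)
    return any(
        node in reachable(nxt, kmers, successors)
        for node in between
        for nxt in successors(node, kmers)
    )

def spanning_regions(reads, pre, post):
    """Sequence between the flanks for each read that contains both flanks."""
    regions = []
    for read in reads:
        i, j = read.find(pre), read.rfind(post)
        if i != -1 and j >= i + len(pre):
            regions.append(read[i + len(pre):j])
    return regions

def count_motif(region, motif):
    """Length, in copies, of the longest tandem run of a non-empty `motif`."""
    best = 0
    for start in range(len(region)):
        n = 0
        while region.startswith(motif, start + n * len(motif)):
            n += 1
        best = max(best, n)
    return best

# Toy data (hypothetical): two spanning reads and one read that lies inside the repeat.
PRE, POST, MOTIF, K = "ACGTT", "CCAGT", "GATA", 5
reads = [
    "TT" + PRE + MOTIF * 6 + POST + "AA",
    PRE + MOTIF * 6 + POST,
    MOTIF * 3,
]
counts = kmer_counts(reads, K)
min_count = 2  # assumed threshold for discarding rare, possibly erroneous k-mers
kmers = {s for s, n in counts.items() if n >= min_count}

if has_cycle_between(PRE, POST, kmers):
    regions = spanning_regions(reads, PRE, POST)
    cycle_lengths = [len(r) for r in regions]              # option (i)
    copy_numbers = [count_motif(r, MOTIF) for r in regions]  # option (ii)
    print(cycle_lengths, copy_numbers)
```

The cycle arises because the flank-to-flank chain must pass through the same repeat k-mers more than once, so the overlap graph alone cannot say how many copies are present. Only the reads that contain both flanks resolve that ambiguity, which is why the third read contributes to the k-mer frequencies but not to the length or copy-number estimates.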