Title of Invention BIOLOGICAL SEQUENCE TANDEM REPEAT CHARACTERIZATION
Abstract Short fixed-length source sub-sequences are extracted from a collection of source sequences derived from a sample for which the biological signature is to be determined. The extracted short fixed-length source sub-sequences are compiled to determine the frequency of each within the collection. Overlaps between the short fixed-length source sub-sequences are used to find a chain of overlaps from one or more sub-sequences equivalent to a pre-flanking reference marker sequence to one or more sub-sequences equivalent to a post-flanking reference marker sequence, wherein the reference marker sequences flank a region containing a repetitive sequence region. In response to the chain containing multiple instances of one or more of the short fixed-length source sub-sequences, thereby defining a cycle, the sequences from the collection derived from the sample are examined to find one or more sequences that span the cycle, and at least one of the following is performed: (i) the lengths of the spanning sequences are used to determine the length of the cycle; and (ii) the number of repeat motif copies within each spanning sequence is counted.
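The steps in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the reads, the flanking markers `PRE` and `POST`, the motif `MOTIF`, the sub-sequence length `K`, and every function name are hypothetical, and the flanking markers are assumed to be exactly `K` bases long so that each corresponds to a single sub-sequence. The chain of overlaps is approximated by reachability in a graph whose edges join sub-sequences that overlap by `K - 1` bases.

```python
from collections import Counter

K = 5                      # assumed sub-sequence (k-mer) length
PRE, POST = "ACGTT", "GGCTA"   # hypothetical flanking markers, each K long
MOTIF = "CAG"              # hypothetical repeat motif

# Toy reads: one spans the whole repeat, two cover it only partially.
reads = [
    "ACGTTCAGCAGCAGCAGGGCTA",
    "TTCAGCAGCAGCAG",
    "CAGCAGGGCTA",
]

def count_kmers(seqs, k):
    """Compile the frequency of every length-k sub-sequence in the reads."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def successors(counts, kmer):
    """Observed k-mers whose first k-1 bases equal the last k-1 of kmer."""
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in counts]

def reachable(counts, start):
    """All k-mers reachable from start via one or more overlap steps."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in successors(counts, node):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def cycle_kmers(counts, pre, post):
    """k-mers lying between pre and post that can reach themselves."""
    found = set()
    for node in reachable(counts, pre) | {pre}:
        onward = reachable(counts, node)
        if node in onward and post in onward:
            found.add(node)
    return found

def spanning_regions(seqs, pre, post):
    """Sequence between the flanks for each read containing both flanks."""
    regions = []
    for seq in seqs:
        i = seq.find(pre)
        if i == -1:
            continue
        j = seq.find(post, i + len(pre))
        if j != -1:
            regions.append(seq[i + len(pre):j])
    return regions

counts = count_kmers(reads, K)
cycle = cycle_kmers(counts, PRE, POST)
regions = spanning_regions(reads, PRE, POST)

if cycle:  # a cycle means the chain alone cannot give the repeat length
    for region in regions:
        # (i) length of the spanned repeat, (ii) number of motif copies
        print(len(region), region.count(MOTIF))
```

Only the first toy read contains both flanks, so only that read contributes a length and a copy count. The overlap graph reveals that a repeat exists, while the spanning read is what resolves how many copies it contains.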
Publication No. US2016103953(A1) Publication Date 2016.04.14
Application No. US201414511702 Filing Date 2014.10.10
Applicant International Business Machines Corporation Inventors Conway Thomas C.; Wyres Kelly L.
Classification G06F19/22 Main Classification G06F19/22
Agency Agent
Main Claim
Address Armonk NY US