Title: BIOLOGICAL SEQUENCE VARIANT CHARACTERIZATION
Abstract: Short fixed-length sub-sequences, termed reference sub-sequences, are extracted from a collection of reference sequences, and an index is constructed recording which reference sub-sequences occur in which reference sequences. Short fixed-length sub-sequences of the same length, termed source sub-sequences, are extracted from a collection of source sequences derived from the sample whose signature is to be determined, and these source sub-sequences are compiled to determine the frequency of each within the collection. The presence or absence of source sub-sequences, combined with the index, is used to infer the presence or absence of reference sequences from the reference collection.
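The three steps in the abstract map naturally onto a k-mer index. The following is a minimal Python sketch under stated assumptions, not the patented implementation: sequences are plain strings, the "short fixed-length sub-sequences" are taken to be overlapping k-mers of a chosen length k, and a reference is called present when a hypothetical threshold fraction (`min_fraction`) of its distinct k-mers is observed in the sample. All function names and the threshold rule are illustrative.

```python
from collections import Counter, defaultdict
from typing import Iterator, Mapping


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping sub-sequence of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_reference_index(references: Mapping[str, str], k: int) -> dict[str, set[str]]:
    """Map each reference k-mer to the set of reference IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for ref_id, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer].add(ref_id)
    return dict(index)


def count_source_kmers(sources: list[str], k: int) -> Counter:
    """Compile the frequency of each k-mer across all source sequences (e.g. reads)."""
    counts: Counter = Counter()
    for seq in sources:
        counts.update(kmers(seq, k))
    return counts


def infer_present(index: Mapping[str, set[str]], counts: Mapping[str, int],
                  min_fraction: float = 0.9) -> set[str]:
    """Call a reference present if enough of its distinct k-mers occur in the sample.

    Assumption: the fraction-of-k-mers rule and the min_fraction value are
    illustrative choices, not taken from the source.
    """
    total: Counter = Counter()  # distinct k-mers per reference
    seen: Counter = Counter()   # of those, how many were observed
    for kmer, ref_ids in index.items():
        observed = counts.get(kmer, 0) > 0
        for ref_id in ref_ids:
            total[ref_id] += 1
            if observed:
                seen[ref_id] += 1
    return {r for r in total if seen[r] / total[r] >= min_fraction}


# Toy example: two references and two reads drawn from refA.
refs = {"refA": "ACGTACGTGG", "refB": "TTTTGGGGCC"}
reads = ["ACGTACG", "TACGTGG"]
index = build_reference_index(refs, k=4)
counts = count_source_kmers(reads, k=4)
print(infer_present(index, counts))  # {'refA'}
```

Here every distinct 4-mer of refA appears in the reads and none of refB's do, so only refA is reported. A hash map from k-mer to reference IDs keeps each lookup constant-time on average, which is one common way to realize such an index.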
Publication number: US2016103956(A1); Publication date: 2016.04.14
Application number: US201514793379; Filing date: 2015.07.07
Applicant: International Business Machines Corporation; Inventors: Conway Thomas C.; Wyres Kelly L.
Classification: G06F19/22; Main classification: G06F19/22
Main claim: 1. A method for determining a signature from biological sequence data, comprising: extracting short fixed length sub-sequences, defined as reference sub-sequences, from a collection of reference sequences, and constructing an index showing which short fixed length reference sub-sequence occurs in which reference sequences; extracting short fixed length sub-sequences, the same length as the reference sub-sequences and defined as source sub-sequences, from a collection of source sequences derived from a sample for which the signature is to be determined, and compiling the short fixed length source sub-sequences to determine the frequency of each within the collection; and using the presence or absence of source sub-sequences in combination with the index to infer the presence or absence of reference sequences from the reference collection; wherein one or more of the above steps are performed in accordance with a processor and a memory.
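The claim leaves open exactly how the index and the compiled frequencies are combined to infer presence. One plausible refinement, sketched here as an assumption rather than as the claimed method, uses only k-mers that occur in a single reference (so closely related variants can be told apart) and uses the frequencies to ignore k-mers seen fewer than a hypothetical `min_count` times, which discounts likely sequencing errors. The index and counts below are a hand-built toy example for two hypothetical variants, ACGTT and ACGAT, with k = 3.

```python
from typing import Mapping


def infer_by_unique_kmers(index: Mapping[str, set[str]], counts: Mapping[str, int],
                          min_count: int = 2, min_fraction: float = 0.8) -> dict[str, bool]:
    """Presence/absence call per reference using only reference-unique k-mers.

    Assumptions (illustrative, not from the claim): a k-mer counts as observed
    only if seen at least min_count times, and a reference is called present
    when at least min_fraction of its unique k-mers are observed.
    """
    unique_total: dict[str, int] = {}
    unique_seen: dict[str, int] = {}
    for kmer, ref_ids in index.items():
        if len(ref_ids) != 1:
            continue  # shared k-mers cannot discriminate between references
        (ref_id,) = ref_ids
        unique_total[ref_id] = unique_total.get(ref_id, 0) + 1
        if counts.get(kmer, 0) >= min_count:
            unique_seen[ref_id] = unique_seen.get(ref_id, 0) + 1
    refs = sorted({r for ids in index.values() for r in ids})
    # A reference with no unique k-mers cannot be distinguished, so it is called absent.
    return {
        r: unique_total.get(r, 0) > 0
        and unique_seen.get(r, 0) / unique_total[r] >= min_fraction
        for r in refs
    }


# Toy index for variants v1 = "ACGTT" and v2 = "ACGAT" (k = 3); they differ by one base.
index = {
    "ACG": {"v1", "v2"},
    "CGT": {"v1"},
    "GTT": {"v1"},
    "CGA": {"v2"},
    "GAT": {"v2"},
}
# Compiled source k-mer frequencies; "CGA" seen once is treated as a likely error.
counts = {"ACG": 10, "CGT": 9, "GTT": 8, "CGA": 1}
print(infer_by_unique_kmers(index, counts))  # {'v1': True, 'v2': False}
```

The shared k-mer ACG is skipped because it is evidence for both variants equally; the decision rests on the single-base difference that the unique k-mers capture.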
Address: Armonk, NY, US