Title of invention MULTISEQUENCE DATA REPRESENTATION
Abstract Genetic sequence data occurring in genome sequences is represented in a defined storage scheme for efficient access to the sequence information. A described replet-sequence matrix data structure allows the sequence information to be compressed and accessed efficiently. The data structure allows the ontology to change dynamically: the replet-information table can evolve by adding, updating, and removing replets, and the set of replets present in the table represents the ontology at that moment. The data structure enables the sequence information to be processed in parallel, and also enables multiple views of the sequence data to exist along with replet-specific information.
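The evolving replet-information table mentioned in the abstract can be pictured with a minimal sketch. Everything named here (`RepletTable`, `add`, `update`, `remove`, `ontology`, the `info` annotations) is hypothetical and chosen for illustration only; the record does not specify the table's layout. The sketch assumes that "updating" changes a replet's annotations rather than its subsequence.

```python
from typing import Dict, FrozenSet


class RepletTable:
    """Hypothetical replet-information table: replet id -> subsequence and annotations."""

    def __init__(self) -> None:
        self._text: Dict[str, str] = {}
        self._info: Dict[str, dict] = {}

    def add(self, replet_id: str, text: str, **info) -> None:
        # Adding a replet extends the current ontology.
        if replet_id in self._text:
            raise KeyError(f"replet {replet_id!r} already present")
        self._text[replet_id] = text
        self._info[replet_id] = dict(info)

    def update(self, replet_id: str, **info) -> None:
        # Assumption: an update touches only replet-specific annotations.
        self._info[replet_id].update(info)

    def remove(self, replet_id: str) -> None:
        # Removing a replet shrinks the current ontology.
        del self._text[replet_id]
        del self._info[replet_id]

    def ontology(self) -> FrozenSet[str]:
        # The set of replets present right now is the ontology "at that moment".
        return frozenset(self._text)


table = RepletTable()
table.add("r1", "ACGT", kind="repeat")
table.add("r2", "TAG", kind="stop codon")
table.update("r1", kind="tandem repeat")
table.remove("r2")
```

After these calls the ontology consists of `r1` alone, which is the sense in which the table, rather than a fixed schema, defines the ontology.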
Publication No. US2016012080(A1) Publication date 2016.01.14
Application No. US201514791855 Filing date 2015.07.06
Applicant International Business Machines Corporation Inventor Hussan Jagir R
Classification G06F17/30 Main classification G06F17/30
Agency Agent
Main claim 1. A computer system-implemented method for storing and presenting sequence data, comprising: determining, by a computer system, for a genome-related sequence, whether specified replets have matching subsequences of the sequence; generating and storing in a non-transitory computer readable storage medium, by the computer system, a match-set data structure having respective entries for ones of the replets having matching subsequences, each entry comprising a first and second position parameter, the first position parameter of each match-set data structure entry denoting a location in the sequence and the second position parameter of each match-set data structure entry denoting an offset from the location; forming and storing in a non-transitory computer readable storage medium, by the computer system, a backbone sequence from unmatched regions of the sequence; and updating, by the computer system, the first and second position parameters of the entries in the match-set data structure for a selected at least first one of the replets that has a matching subsequence, wherein the selected at least first one of the replets has a position within the sequence and wherein the updating is responsive to the position of the selected at least first one of the replets, the updating being performed to make match-set entries valid for non-chosen replets.
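The matching and backbone steps of the claim can be illustrated with a small sketch. It is not the claimed method: it assumes exact string matching, a greedy left-to-right scan that prefers longer replets, and one particular reading of the two position parameters (`location` is the start of the match in the original sequence, `offset` is the distance from that start to the end of the match). The names `MatchEntry`, `encode`, and `decode` are hypothetical, and the claim's final updating step is not modelled.

```python
from typing import Dict, List, NamedTuple, Tuple


class MatchEntry(NamedTuple):
    replet_id: str
    location: int  # assumed: start index of the match in the original sequence
    offset: int    # assumed: distance from location to the end of the match


def encode(sequence: str, replets: Dict[str, str]) -> Tuple[List[MatchEntry], str]:
    """Split a sequence into match-set entries and a backbone of unmatched bases."""
    # Try longer replets first; ties are broken by replet id for determinism.
    ordered = sorted(replets.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    entries: List[MatchEntry] = []
    backbone: List[str] = []
    i = 0
    while i < len(sequence):
        for replet_id, text in ordered:
            if text and sequence.startswith(text, i):
                entries.append(MatchEntry(replet_id, i, len(text)))
                i += len(text)
                break
        else:
            # No replet matches here, so this base belongs to the backbone.
            backbone.append(sequence[i])
            i += 1
    return entries, "".join(backbone)


def decode(entries: List[MatchEntry], backbone: str, replets: Dict[str, str]) -> str:
    """Rebuild the original sequence from the match set and the backbone."""
    out: List[str] = []
    pos = 0  # next position to fill in the reconstructed sequence
    b = 0    # next unread position in the backbone
    for entry in sorted(entries, key=lambda e: e.location):
        gap = entry.location - pos  # unmatched bases preceding this match
        out.append(backbone[b:b + gap])
        b += gap
        out.append(replets[entry.replet_id])
        pos = entry.location + entry.offset
    out.append(backbone[b:])
    return "".join(out)


replets = {"r1": "ACGT", "r2": "TAG"}
sequence = "GGACGTACGTTTAGCC"
entries, backbone = encode(sequence, replets)
restored = decode(entries, backbone, replets)
```

Under these assumptions the 16-base example shrinks to a 5-base backbone plus three match-set entries, and `decode` restores the original sequence from them.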
Address Armonk NY US