Title of Invention: Sequence database search with sequence search trees
Abstract: A method and system for generating and searching a tree-structured index of window vectors that represent database sequences comprise a window vector generation module, a tree-structured index generation module, a query sequence partitioning module, and a retrieval component. The window vector generation module partitions each database sequence into a plurality of overlapping windows. Each window has a fixed length W, measured in nucleotides, and the offset between consecutive windows is set by a parameter Delta. The window vector generation module then maps each database sequence window to a window vector that records the frequency of each k-tuple in that window. The tree-structured index generation module builds a tree-structured index from these database sequence window vectors. The query sequence partitioning module partitions a query sequence into a plurality of windows and maps each query sequence window to a query sequence window vector. Each query sequence window vector is then compared against the tree-structured index to locate the database sequences that are similar to the query sequence, and the list of those similar database sequences is returned as the search result.
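The windowing and k-tuple counting steps in the abstract can be illustrated with a minimal sketch. It assumes a DNA alphabet of A, C, G, T, uses L1 (Manhattan) distance between count vectors, and uses a hypothetical `max_dist` threshold to decide similarity; none of these choices come from the patent. It also deliberately replaces the tree-structured index with a flat list scanned linearly, because the abstract does not describe how the tree is built. All function names here are hypothetical.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: plain DNA alphabet


def windows(seq: str, w: int, delta: int) -> list[tuple[int, str]]:
    """Split seq into windows of length w starting every delta positions.

    Windows overlap whenever delta < w; a trailing stretch shorter than w is dropped.
    """
    if w <= 0 or delta <= 0:
        raise ValueError("w and delta must be positive")
    return [(start, seq[start:start + w]) for start in range(0, len(seq) - w + 1, delta)]


def window_vector(window: str, k: int) -> tuple[int, ...]:
    """Count every k-tuple over ALPHABET in the window, in lexicographic order.

    k-tuples containing other symbols (e.g. 'N') are counted but never read out,
    so they are effectively ignored.
    """
    counts = Counter(window[i:i + k] for i in range(len(window) - k + 1))
    return tuple(counts["".join(t)] for t in product(ALPHABET, repeat=k))


def l1(u: tuple[int, ...], v: tuple[int, ...]) -> int:
    """Manhattan distance between two equal-length count vectors (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def build_index(db: dict[str, str], w: int, delta: int, k: int) -> list[tuple[str, int, tuple[int, ...]]]:
    """Flat list of (sequence id, window start, window vector).

    Stand-in for the patent's tree-structured index, not a reproduction of it.
    """
    index = []
    for seq_id, seq in db.items():
        for start, win in windows(seq, w, delta):
            index.append((seq_id, start, window_vector(win, k)))
    return index


def search(index, query: str, w: int, delta: int, k: int, max_dist: int) -> list[str]:
    """Return ids of database sequences with at least one window within max_dist of a query window."""
    hits = set()
    for _, qwin in windows(query, w, delta):
        qvec = window_vector(qwin, k)
        for seq_id, _, vec in index:  # linear scan where a tree lookup would go
            if l1(qvec, vec) <= max_dist:
                hits.add(seq_id)
    return sorted(hits)
```

For example, with `db = {"seq1": "ACGTACGTACGT", "seq2": "TTTTTTTTTTTT"}`, `W = 8`, `Delta = 4`, and `k = 2`, each database sequence yields two windows, and `search(build_index(db, 8, 4, 2), "ACGTACGT", 8, 4, 2, 2)` returns `["seq1"]`. A tree-structured index would serve the same role as the flat list but would prune most comparisons instead of checking every window vector.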
Publication No.: US6633817 (B1)   Publication Date: 2003.10.14
Application No.: US19990474929   Application Date: 1999.12.29
Applicant: INCYTE GENOMICS, INC.   Inventors: WALKER MICHAEL G.; WANG JAMES Z.; GILADI ELDAR Y.
Classification: G06F17/30; G06F19/00; (IPC1-7): G01N33/48; G01V3/00; G01G23/01   Main Classification: G06F17/30