Title of invention EFFICIENT GENOMIC READ ALIGNMENT IN AN IN-MEMORY DATABASE
Abstract A high-performance, low-cost, gapped read alignment algorithm is disclosed that produces high-quality alignments of a complete human genome in a few minutes. On a low-cost workstation, the algorithm is more than an order of magnitude faster than previous approaches. The results are obtained via careful algorithm engineering of the seeding-based approach. The use of non-hashed seeds in combination with techniques from search engine ranking achieves fast, cache-efficient processing. The algorithm can also be efficiently parallelized. Integration into an in-memory database (IMDB) infrastructure leads to low overhead for data management and further analysis.
Publication number US2014214334(A1) Publication date 2014.07.31
Application number US201414165123 Filing date 2014.01.27
Applicant Hasso-Plattner-Institut fuer Softwaresystemtechnik GmbH Inventors Plattner Hasso; Schapranow Matthieu-Patrick; Ziegler Emanuel
Classification G06F19/22 Main classification G06F19/22
Agency Agent
Main claim 1. A computer-based system for processing nucleotide sequence data, which are provided as reads, wherein the system has an interface for importing the nucleotide sequence data from a sequencer machine (M), comprising: a platform layer for holding process logic and an in-memory database system (IMDB) for processing nucleotide sequence data, wherein the platform layer comprises: a worker framework with a plurality of workers, wherein each worker is running on a node of a cluster and wherein the workers are processing in parallel, wherein all results and intermediate results are stored in the in-memory database (IMDB), and with an alignment coordinator, which is adapted to provide the in-memory database system (IMDB) with a modified alignment functionality.
Address Potsdam DE