Title: Efficient query processing in columnar databases using bloom filters
Abstract: A bloom filter is generated for efficient query processing for unsorted data in a column of a columnar database. Bloom filters represented as bitmaps are generated for data blocks storing data for a column of a columnar database table. An indication of a query directed toward the column is received and the bloom filter for each data block is examined to determine which ones of the data blocks do not need to be read in order to service the query for the select data. Data is then read from the data blocks storing data for the column excepting the ones which do not need to be read.
Publication number: US8972337(B1)
Publication date: 2015.03.03
Application number: US201313773476
Filing date: 2013.02.21
Applicant: Amazon Technologies, Inc.
Inventor: Gupta Anurag Windlass
Classification: G06F17/00; G06F17/30
Main classification: G06F17/00
Agency: Meyertons, Hood, Kivlin, Kowert & Goetzel, P.C.
Agent: Kowert Robert C.; Meyertons, Hood, Kivlin, Kowert & Goetzel, P.C.
Principal claim: 1. A distributed data warehouse system, comprising: a plurality of nodes, wherein at least some nodes of the plurality of nodes each comprise: storage for a columnar database table, wherein said storage comprises a plurality of data blocks; a bloom filter generator, configured to: generate a bloom filter for each of one or more data blocks storing data for a column of the columnar database table, wherein each bloom filter is represented as a bitmap, wherein different patterns of set bits in the bitmap indicate data values not stored in the data block; a read module; a query engine, configured to: receive an indication of a query directed to the column of the columnar database table for select data; evaluate the indication of the query to determine one or more predicate data values that identify the select data; in response to receiving and evaluating the indication of the query: analyze the bitmap representing the bloom filter for the one or more predicate data values for each of the one or more data blocks to determine particular ones of the one or more data blocks which do not need to be read in order to service the query for the select data; and direct the read module to read the one or more data blocks storing data for the column excepting the particular ones of the one or more data blocks which do not need to be read.
Address: Reno NV US
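
Illustrative sketch (not part of the patent record): the block-skipping scheme described in the abstract and claim can be pictured roughly as follows. This is a minimal single-process sketch under stated assumptions, not the patented implementation. The names `BlockBloomFilter`, `build_filters`, `blocks_to_read`, and `query` are hypothetical, the bitmap size and the salted SHA-256 hashing are arbitrary choices made for illustration, and only equality predicates are handled.

```python
import hashlib


class BlockBloomFilter:
    """Bloom filter for one data block, stored as a bitmap (hypothetical helper)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are illustrative assumptions, not values from the patent.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bitmap = bytearray((num_bits + 7) // 8)

    def _positions(self, value):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        data = repr(value).encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bitmap[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        # False means the value is definitely absent from the block;
        # True means it may be present (false positives are possible).
        return all(
            self.bitmap[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )


def build_filters(blocks):
    """Generate one bloom filter per data block of a column."""
    filters = []
    for block in blocks:
        bloom = BlockBloomFilter()
        for value in block:
            bloom.add(value)
        filters.append(bloom)
    return filters


def blocks_to_read(filters, predicate_values):
    """Return indexes of blocks that cannot be ruled out for the predicate values."""
    return [
        index
        for index, bloom in enumerate(filters)
        if any(bloom.might_contain(v) for v in predicate_values)
    ]


def query(blocks, filters, predicate_values):
    """Read only the blocks that may hold matching values, then filter exactly."""
    wanted = set(predicate_values)
    rows = []
    for index in blocks_to_read(filters, predicate_values):
        rows.extend(v for v in blocks[index] if v in wanted)
    return rows


if __name__ == "__main__":
    column_blocks = [[5, 17, 3], [42, 8], [17, 99]]  # unsorted column data
    column_filters = build_filters(column_blocks)
    print(blocks_to_read(column_filters, [17]))
    print(query(column_blocks, column_filters, [17]))
```

Because a bloom filter never reports a stored value as absent, skipping a block is always safe. A false positive only costs an unnecessary block read, which the exact comparison in `query` then filters out.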