Title of invention: MUTATIONS IN A COLUMN STORE
Abstract: Columnar storage provides many performance and space-saving benefits for analytic workloads, but previous mechanisms for handling single-row update transactions in column stores suffer from poor performance. A columnar data layout facilitates both low-latency random access and high-throughput analytical access, simplifying Hadoop architectures for use cases involving real-time data. In disclosed embodiments, mutations within a single row are executed atomically across columns and do not necessarily include the entirety of a row. This allows for faster updates without the overhead of reading or rewriting larger columns.
Publication No.: US2016328429(A1) Publication date: 2016.11.10
Application No.: US201615149128 Filing date: 2016.05.07
Applicant: Cloudera, Inc. Inventor: Lipcon Todd
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Main claim: 1. A system facilitating low-latency random access capabilities together with high-throughput analytical access capabilities in connection with a request for processing the stored data, the system comprising: a database table distributing data partitioned into a plurality of horizontal tablets, each horizontal tablet in the plurality of horizontal tablets storing the data in a plurality of rows; the database table including a plurality of columns arranged according to a pre-defined schema; a column in the plurality of columns including a primary key column that stores a key uniquely identifying each row in the plurality of rows by mapping each row to exclusively a single tablet in the plurality of tablets, wherein each tablet in the plurality of tablets comprises: a plurality of DiskRowSets for storing the data, each DiskRowSet in the plurality of DiskRowSets including: a base data module existing in disk and storing a subset of rows in the plurality of rows according to a column-organized representation based upon writing each column in the plurality of columns as a single contiguous block, a Bloom filter of the set of keys included in the primary key column for detecting membership of the set of keys in the each DiskRowSet, a delta store module existing in memory and maintaining a mapping for mutating the subset of rows included in the each DiskRowSet, and a single MemRowSet existing in memory and implemented as a concurrent Binary tree (B-tree), the single MemRowSet receiving new data to be inserted into the database table, buffering the new data as a recently-inserted row, and flushing the recently-inserted row to a DiskRowSet in the plurality of DiskRowSets.
Address: Palo Alto, CA, US
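
Illustrative sketch (not part of the patent record): the tablet layout recited in the main claim, together with the partial-row mutations described in the abstract, can be pictured with a small single-threaded Python model. Everything here is an assumption made for illustration. The class and method names (BloomFilter, DiskRowSet, Tablet, insert, update, read, flush), the flush threshold, and the hash scheme are hypothetical; a plain dict stands in for the concurrent B-tree; in-memory lists stand in for on-disk column blocks; and routing of keys to tablets is omitted. It shows only how an update can touch one column through the delta store while the base column data stays unchanged.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter over primary keys: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class DiskRowSet:
    """Column-organized base data (never rewritten here) plus a delta store of mutations."""

    def __init__(self, schema, rows):
        self.schema = schema  # assumption: schema[0] is the primary key column
        # Each column is held as one contiguous list, standing in for a contiguous block on disk.
        self.columns = {col: [row[col] for row in rows] for col in schema}
        self.row_index = {row[schema[0]]: i for i, row in enumerate(rows)}
        self.bloom = BloomFilter()
        for key in self.row_index:
            self.bloom.add(key)
        self.deltas = {}  # row offset -> {column: new value}

    def update(self, key, changes):
        if not self.bloom.might_contain(key):
            return False  # key is definitely not in this DiskRowSet
        idx = self.row_index.get(key)
        if idx is None:
            return False  # Bloom filter false positive
        # Build the merged delta first, then install it with one assignment, so a
        # multi-column change becomes visible all at once in this single-threaded model.
        self.deltas[idx] = {**self.deltas.get(idx, {}), **changes}
        return True

    def read(self, key):
        if not self.bloom.might_contain(key):
            return None
        idx = self.row_index.get(key)
        if idx is None:
            return None
        row = {col: self.columns[col][idx] for col in self.schema}
        row.update(self.deltas.get(idx, {}))  # apply mutations on top of the base data
        return row


class Tablet:
    """One MemRowSet for recent inserts plus any number of DiskRowSets."""

    def __init__(self, schema, flush_threshold=2):
        self.schema = schema
        self.flush_threshold = flush_threshold  # hypothetical flush policy
        self.mem_row_set = {}  # stand-in for the concurrent B-tree
        self.disk_row_sets = []

    def insert(self, row):
        key = row[self.schema[0]]
        if self.read(key) is not None:
            raise KeyError(f"duplicate primary key: {key!r}")
        self.mem_row_set[key] = dict(row)
        if len(self.mem_row_set) >= self.flush_threshold:
            self.flush()

    def flush(self):
        rows = [self.mem_row_set[k] for k in sorted(self.mem_row_set)]
        self.disk_row_sets.append(DiskRowSet(self.schema, rows))
        self.mem_row_set = {}

    def update(self, key, changes):
        if key in self.mem_row_set:
            self.mem_row_set[key] = {**self.mem_row_set[key], **changes}
            return True
        return any(drs.update(key, changes) for drs in self.disk_row_sets)

    def read(self, key):
        if key in self.mem_row_set:
            return dict(self.mem_row_set[key])
        for drs in self.disk_row_sets:
            row = drs.read(key)
            if row is not None:
                return row
        return None


tablet = Tablet(schema=["id", "name", "score"], flush_threshold=2)
tablet.insert({"id": 1, "name": "a", "score": 10})
tablet.insert({"id": 2, "name": "b", "score": 20})  # second insert triggers a flush
tablet.update(1, {"score": 11})  # mutates one column; "name" is neither read nor rewritten
print(tablet.read(1))
```

In this model the update leaves the flushed "score" column as written and records only the changed cell in the delta store; a read merges the two. A real implementation would additionally need concurrency control, persistence, and compaction, none of which this sketch attempts.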