Title of Invention |
FLASH OPTIMIZED COLUMNAR DATA LAYOUT AND DATA ACCESS ALGORITHMS FOR BIG DATA QUERY ENGINES |
Abstract |
A technique relates to a flash-optimized data layout of a dataset for queries. Selection columns are stored in flash memory according to a selection optimized layout, which is configured to optimize predicate matching and data skipping. For each selection column, the selection optimized layout is formed by storing a selection column dictionary filled with the unique data values of that column, kept in sorted order. Row position designations are then stored for each row position at which a unique data value occurs in the column, without duplicating storage of any unique data value that occurs more than once in the column. |
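The per-column layout described in the abstract can be sketched in Python. This is a minimal in-memory sketch, not the patented on-flash format: the function name `build_selection_layout` is hypothetical, and it assumes each row position designation is simply a list of row indices, since the abstract does not fix an encoding (bitmaps or compressed position lists would be equally plausible).

```python
from typing import Any, Dict, List, Sequence, Tuple


def build_selection_layout(column: Sequence[Any]) -> Tuple[List[Any], List[List[int]]]:
    """Hypothetical helper: build a dictionary-encoded layout for one selection column.

    Returns (dictionary, positions): `dictionary` holds each distinct value once,
    in sorted order; `positions[i]` lists the row positions where dictionary[i] occurs.
    """
    rows_by_value: Dict[Any, List[int]] = {}
    for row, value in enumerate(column):
        # A repeated value is not stored again; it only contributes another row position.
        rows_by_value.setdefault(value, []).append(row)
    dictionary = sorted(rows_by_value)
    positions = [rows_by_value[v] for v in dictionary]
    return dictionary, positions


city = ["Paris", "Austin", "Paris", "Berlin", "Austin"]
dictionary, positions = build_selection_layout(city)
print(dictionary)  # ['Austin', 'Berlin', 'Paris']
print(positions)   # [[1, 4], [3], [0, 2]]
```

Each string is stored once regardless of how often it repeats, while the row positions preserve enough information to reconstruct the original column.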
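Application Publication Number |
US2015363167(A1) |
Application Publication Date |
2015.12.17 |
Application Number |
US201414305179 |
Filing Date |
2014.06.16 |
Applicant |
International Business Machines Corporation |
Inventor |
Kaushik Rini |
Classification Numbers |
G06F7/24;G06F17/30;G06F12/02 |
Main Classification Number |
G06F7/24 |
Agency |
|
Agent |
|
Independent Claim |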
1. A method for flash-optimized data layout of a dataset for queries, the method comprising:
storing, by a processor, selection columns in flash memory according to a selection optimized layout, the selection optimized layout being configured to optimize predicate matching and data skipping;
wherein the selection optimized layout, for each selection column, is formed by:
storing a selection column dictionary filled with unique data values in a given selection column, the unique data values stored in sorted order in the selection column dictionary; and
storing row position designations corresponding to each row position at which the unique data values are present within the given selection column, without duplicating storage of any of the unique data values that occur more than once in the given selection column. |
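The claim states that the sorted dictionary serves predicate matching and data skipping but does not say how predicates are evaluated. One plausible reading, shown here as a minimal sketch under assumptions: a range predicate `lo <= value <= hi` is resolved by binary search over the sorted dictionary, and only the row position lists of the matching entries are read. The names `build_selection_layout` and `match_range`, and the list-of-row-indices encoding, are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Any, Dict, List, Sequence, Tuple


def build_selection_layout(column: Sequence[Any]) -> Tuple[List[Any], List[List[int]]]:
    """Hypothetical helper: sorted dictionary of distinct values plus, per entry,
    the ascending row positions where that value occurs."""
    rows_by_value: Dict[Any, List[int]] = {}
    for row, value in enumerate(column):
        rows_by_value.setdefault(value, []).append(row)
    dictionary = sorted(rows_by_value)
    return dictionary, [rows_by_value[v] for v in dictionary]


def match_range(dictionary: List[Any], positions: List[List[int]], lo: Any, hi: Any) -> List[int]:
    """Return the sorted row positions whose value v satisfies lo <= v <= hi."""
    # Binary search is valid only because the dictionary is kept in sorted order.
    start = bisect_left(dictionary, lo)   # index of first entry >= lo
    end = bisect_right(dictionary, hi)    # one past the last entry <= hi
    # Entries outside [start, end), and their position lists, are never read:
    # this is the data-skipping effect of the sorted dictionary.
    rows: List[int] = []
    for i in range(start, end):
        rows.extend(positions[i])
    return sorted(rows)


price = [30, 10, 20, 10, 40]
dictionary, positions = build_selection_layout(price)
print(match_range(dictionary, positions, 15, 30))  # [0, 2]
```

The predicate is tested against each distinct value at most once (here, via two binary searches) rather than once per row, which is where a dictionary layout can save work when values repeat heavily.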
Address |
Armonk NY US |