Title Functionality of decomposition data skew in asymmetric massively parallel processing databases
Abstract Database queries are optimized through the functionality of decomposition data skew in an asymmetric massively parallel processing database system. A table having data skew is restructured by (1) storing original data values of a distribution key in a special switch column added to the table, (2) replacing the original data values of the distribution key with modified data values such as randomly generated data values, and (3) partitioning the rows across the nodes of the asymmetric massively parallel processing database system based on the distribution key. The original data values that are stored and replaced may only comprise a subset of the original data values that cause data skew in the table. Data skew is reduced, which improves performance, yet the original data values remain available, which reduces the impact on collocated joins.
Publication number US9355127(B2) Publication date 2016.05.31
Application number US201213650863 Filing date 2012.10.12
Applicant International Business Machines Corporation Inventors Gaza Lukasz;Gruszecki Artur M.;Kazalski Tomasz;Milka Grzegorz S.;Skibski Konrad Krzysztof;Stradomski Tomasz;Yanayt Natalya A.
Classification G06F17/30 Main classification G06F17/30
Agency Gates & Cooper LLP Agent Gates & Cooper LLP
Main claim 1. A method of restructuring a table having data skew in a computer system, the computer system storing data from a database in partitions on one or more nodes of the computer system, the method comprising: determining whether original data values of a distribution key column of the table include frequent data values that cause data skew in the table; after the original data values of the distribution key column have been determined to include the frequent data values, copying only the original data values of the distribution key column that comprise the frequent data values to a switch column added to the table; after the original data values of the distribution key column that comprise the frequent data values have been copied to the switch column, replacing only the original data values in the distribution key column that comprise the frequent data values with modified data values that reduce the data skew in the table during partitioning, wherein the original data values that are copied and replaced comprise a subset of the original data values and the subset of the original data values comprises one or more of the frequent data values that cause the data skew in the table; after the original data values in the distribution key column that comprise the frequent data values have been replaced, partitioning the rows of the table across the nodes of the computer system using the distribution key column with the modified data values; and performing database operations other than the partitioning using the original data values, but not the modified data values.
Address Armonk NY US
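
The steps recited in claim 1 can be illustrated with a small in-memory sketch. This is not the patented implementation. It assumes rows are plain Python dicts, that a value counts as "frequent" when its share of rows exceeds a hypothetical `threshold`, that nodes are chosen by a CRC32 hash of the key, and that `None` never occurs as a real key value. The names `SWITCH`, `node_for`, `restructure`, and `original_key` are invented for illustration only.

```python
import random
import zlib
from collections import Counter

SWITCH = "switch_col"  # hypothetical name for the added switch column


def node_for(value, num_nodes):
    """Deterministically map a distribution key value to a node index."""
    return zlib.crc32(repr(value).encode("utf-8")) % num_nodes


def restructure(rows, key, num_nodes, threshold=0.2, seed=0):
    """Return one list of rows per node, with frequent key values decomposed.

    threshold is an assumed cutoff: a key value is treated as frequent when
    it appears in more than this fraction of the rows.
    """
    rng = random.Random(seed)

    # Determine which original key values are frequent enough to cause skew.
    counts = Counter(row[key] for row in rows)
    frequent = {v for v, n in counts.items() if n / len(rows) > threshold}

    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        new_row = dict(row)
        new_row[SWITCH] = None
        if row[key] in frequent:
            # Copy only the frequent original value to the switch column,
            # then replace it in the key column with a random value.
            new_row[SWITCH] = row[key]
            new_row[key] = rng.randrange(2**31)
        # Partition on the (possibly modified) distribution key.
        partitions[node_for(new_row[key], num_nodes)].append(new_row)
    return partitions


def original_key(row, key):
    """Value to use for operations other than partitioning (e.g. joins)."""
    return row[key] if row[SWITCH] is None else row[SWITCH]
```

As a usage example, calling `restructure(rows, "cust", 4)` on a table where most rows share one `cust` value spreads those rows over the four partitions, while `original_key(row, "cust")` still returns the original value for filters and joins. Rows whose key is not frequent keep their original key and an empty switch column, so they stay on the node their key hashes to, which is the property the abstract relies on to limit the impact on collocated joins.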