Title of invention: AVOIDANCE OF INTERMEDIATE DATA SKEW IN A MASSIVE PARALLEL PROCESSING ENVIRONMENT
Abstract: A computer-implemented method for minimizing join operation processing time within a database system, based on the estimated joined table spread of the database system, is provided. The computer-implemented method includes estimating the value distribution of data in a joined table, wherein the joined table is the result of a join operation between two instances of tables of a database system. The computer-implemented method further includes determining boundaries for partitioning at least one range of attributes of the estimated value distribution, wherein the boundaries for partitioning the at least one range of attributes of the estimated value distribution correspond to a same number of rows of the joined table. The computer-implemented method further includes determining at least one assignment of the determined partition of the at least one range of attributes to processing units of the database system.
Publication number: US2015186465(A1) Publication date: 2015.07.02
Application number: US201314144893 Filing date: 2013.12.31
Applicant: International Business Machines Corporation Inventors: Gaza Lukasz;GRUSZECKI ARTUR M.;KAZALSKI TOMASZ;MILKA GRZEGORZ S.;SKIBSKI KONRAD K.;STRADOMSKI TOMASZ
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Principal claim: 1. A computer-implemented method for minimizing join operation processing time within a database system based on estimated joined table spread of the database system, the computer implemented method comprising the steps of: estimating, by one or more processors, value distribution of data in a joined table, wherein the joined table is a result of join operation between two instances of tables of a database system; determining, by the one or more processors, boundaries for partitioning at least one range of attributes of the estimated value distribution, wherein the boundaries for partitioning at least one range of attributes of the estimated value distribution corresponds to a same number of rows of the joined table; and determining, by the one or more processors, at least one assignment of the determined partition of the at least one range of attributes to processing units of the database system.
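The three steps of the claim (estimate the joined table's value distribution, choose range boundaries that hold about the same number of joined rows, assign the ranges to processing units) can be illustrated with a minimal Python sketch. This is only an illustration under stated assumptions, not the patented implementation: the function names `estimate_join_distribution` and `assign_ranges_to_units` are hypothetical, the join is assumed to be an equi-join on a single key, and exact key counts stand in for whatever statistics a real system would use for the estimate.

```python
from collections import Counter


def estimate_join_distribution(left_keys, right_keys):
    """Estimate joined-table rows per join-key value (assumed equi-join).

    Each key value contributes count_left * count_right rows to the result.
    """
    left, right = Counter(left_keys), Counter(right_keys)
    return {key: left[key] * right[key] for key in left if key in right}


def assign_ranges_to_units(distribution, num_units):
    """Map each key to a processing unit so that contiguous key ranges
    hold roughly equal shares of the estimated joined rows."""
    total = sum(distribution.values())
    if total == 0:
        return {}
    assignment = {}
    rows_before = 0  # estimated joined rows contributed by all smaller keys
    for key in sorted(distribution):
        # The unit is the equal-sized slice of the cumulative row count
        # in which this key's rows begin (hypothetical boundary rule).
        assignment[key] = min(num_units - 1, rows_before * num_units // total)
        rows_before += distribution[key]
    return assignment


# Hypothetical sample data: key 1 is heavily skewed in the left table.
dist = estimate_join_distribution([1, 1, 1, 1, 2, 3, 4, 5], [1, 1, 2, 3, 4, 5, 5])
units = assign_ranges_to_units(dist, 2)
```

In this sample, key 1 alone accounts for 8 of the 13 estimated joined rows, so it forms its own range on one unit while keys 2 through 5 share the other. Partitioning by the joined row count, rather than by the number of distinct keys, is what keeps the intermediate result from piling up on a single unit.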
Address: Armonk, NY, US