Title: OPTIMIZING AN ORDER OF EXECUTION OF MULTIPLE JOIN OPERATIONS
Abstract: A computer-implemented method, system, and/or computer program product optimizes an order of execution of column join operations. A first partitioning of a first data column splits the first data column into first subsets of rows. A second partitioning of a second data column splits the second data column into second subsets of rows. First value frequency information indicates a frequency of attribute values within a subset of rows of the first data column processed by a respective processing unit. Second value frequency information indicates a frequency of attribute values within a subset of rows of the second data column processed by the respective processing unit. Cardinalities of sub-tables derived by a respective joining of the subsets of rows of the first and second data columns are estimated based on the first and second value frequency information. An order of execution of multiple join operations is then optimized based on the estimated cardinalities of the sub-tables.
Publication number: US2014156635(A1) Publication date: 2014.06.05
Application number: US201314076598 Filing date: 2013.11.11
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventors: GROCHOWSKI MAREK;GRUSZECKI ARTUR M.;KAZALSKI TOMASZ;MILKA GRZEGORZ S.;SKIBSKI KONRAD K.;STRADOMSKI TOMASZ
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent:
Principal claim: 1. A method for optimizing an order of execution of multiple join operations based on at least a first data column and a second data column in a database system having multiple processing units, the method comprising:
providing, by one or more processors, at least a first partitioning of the first data column, wherein said at least the first partitioning splits the first data column into a plurality of first subsets of rows, each of the first subsets of rows being correlated with a processing unit from the multiple processing units;
providing, by one or more processors, at least a second partitioning of the second data column, wherein said at least the second partitioning splits the second data column into a plurality of second subsets of rows, each of the second subsets of rows being correlated with a processing unit from the multiple processing units;
providing, by one or more processors, at least a first value frequency information for each processing unit from the multiple processing units, the first value frequency information indicating a frequency of attribute values within a subset of rows of the first data column processed by a respective processing unit from the multiple processing units;
providing, by one or more processors, at least a second value frequency information for each processing unit from the multiple processing units, the second value frequency information indicating a frequency of attribute values within a subset of rows of the second data column processed by the respective processing unit from the multiple processing units;
estimating, by one or more processors, cardinalities of sub-tables derived by a respective joining of the subset of rows of the first data column and the subset of rows of the second data column which are processed by a same processing unit from the multiple processing units, wherein estimated cardinalities of the sub-tables are based on the first and second value frequency information of the respective processing unit from the multiple processing units; and
optimizing, by one or more processors, an order of execution of multiple join operations based on the estimated cardinalities of the sub-tables.
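The claimed steps can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes equi-joins, exact per-value counts as the "value frequency information", and a modulo-hash partitioning that places equal values on the same processing unit. The function names (`partition`, `value_frequencies`, `estimate_join_cardinality`, `order_joins`), the sample columns, and the rule of running the smallest estimated join first are all hypothetical choices made for illustration.

```python
from collections import Counter


def partition(column, num_units):
    """Split a column into one subset of rows per processing unit.

    Assumption: rows are assigned by hash(value) % num_units, so equal
    values from different columns land on the same unit.
    """
    subsets = [[] for _ in range(num_units)]
    for value in column:
        subsets[hash(value) % num_units].append(value)
    return subsets


def value_frequencies(subsets):
    """Per-unit value frequency information: one Counter per subset."""
    return [Counter(subset) for subset in subsets]


def estimate_unit_cardinality(freq_a, freq_b):
    """Estimated rows of the sub-table joined on a single unit.

    For an equi-join, each value contributes count_a * count_b rows.
    """
    return sum(count * freq_b[value] for value, count in freq_a.items())


def estimate_join_cardinality(freqs_a, freqs_b):
    """Sum the per-unit sub-table estimates for one join."""
    return sum(
        estimate_unit_cardinality(a, b) for a, b in zip(freqs_a, freqs_b)
    )


def order_joins(candidate_joins, columns, num_units):
    """Order joins by ascending estimated cardinality (a simple heuristic)."""
    freqs = {
        name: value_frequencies(partition(col, num_units))
        for name, col in columns.items()
    }
    estimates = {
        (left, right): estimate_join_cardinality(freqs[left], freqs[right])
        for left, right in candidate_joins
    }
    plan = sorted(candidate_joins, key=lambda join: estimates[join])
    return plan, estimates


# Hypothetical sample data: three join-key columns with integer values.
columns = {
    "orders.cust": [1, 1, 2, 3, 3, 3],
    "customers.id": [1, 2, 3, 4],
    "payments.cust": [3, 3, 3, 3, 1],
}
joins = [
    ("orders.cust", "customers.id"),
    ("orders.cust", "payments.cust"),
    ("customers.id", "payments.cust"),
]

plan, estimates = order_joins(joins, columns, num_units=2)
for join in plan:
    print(join, estimates[join])
```

With this sample data the estimates are 5, 6, and 14 rows, so the sketch schedules the `customers.id`/`payments.cust` join first. A real optimizer would likely use compact histograms rather than exact counts and would weigh more than cardinality alone.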
Address: Armonk NY US