Title of invention: System, method and computer-readable medium for optimization of multiple-parallel join operations on skewed data
Abstract: Techniques that facilitate management of skewed data during a parallel multiple join operation are provided. Portions of tables involved in the join operation can be distributed among a plurality of processing modules, and each of the processing modules can be provided with a list of skewed values of a join column of a larger table involved in the join operation. Each of the processing modules can scan the rows of first and second tables distributed to the processing modules and compare values of the join columns of both tables with the list of skewed values. One or more of the processing modules can then redistribute the skewed values.
Publication number: US9489427(B2)  Publication date: 2016.11.08
Application number: US201213466519  Filing date: 2012.05.08
Applicant: Teradata US, Inc.  Inventor: Xu Yu
Classification: G06F17/30  Main classification: G06F17/30
Agency:  Agent: Mahboubian Ramin; Campbell, Jr. Randy L.
Main claim: 1. A method of facilitating a multiple join operation in a processing system that includes a plurality of processing modules, wherein the join operation comprises a join on a column of a first table, a column of a second table, and a column of a third table, the method comprising:
distributing a respective set of rows of the first, second and third tables to each of the plurality of processing modules;
redistributing by each of the plurality of processing modules to at least another one of the plurality of processing modules: (i) each row of the respective set of rows of the first table that has a value of the column of the first table that does not match any one of skewed values of the column of the first table, (ii) each row of the respective set of rows of the second table that has a value of the column of the second table that does not match any of the skewed values of the column of the first table and (iii) one or more rows of the distributed respective set of rows of the third table involved in the join operation;
locally maintaining, by a first processing module of the plurality of processing modules, one or more of the distributed respective set of rows that each has a value of the column of the first table that matches one of the skewed values;
duplicating, by a second processing module of the plurality of processing modules, one or more of the distributed respective set of rows that each has a value of the column of the second table that matches one of the skewed values; and
processing, by the first processing module of the plurality of processing modules, any row of the respective set of rows that has a value of the column of the first table that matches any of the skewed values.
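The steps of the claim can be illustrated with a minimal single-process Python sketch. It is not the patented implementation. It rests on several assumptions that the record does not state: the processing modules are simulated as list indices, rows are `(key, payload)` tuples, all three tables join on the same key, "redistributing" means sending a row to module `hash(key) % n`, "duplicating" means copying a row to every module, and third-table rows with skewed keys are duplicated in the same way as second-table rows. The function name `skew_aware_join` is hypothetical.

```python
from collections import defaultdict
from itertools import product


def skew_aware_join(r_parts, s_parts, t_parts, skewed_values):
    """Simulated three-table equi-join with skew handling.

    r_parts[m], s_parts[m], t_parts[m] hold the (key, payload) rows that were
    initially distributed to module m. skewed_values lists the skewed keys of
    the first (larger) table. All of this layout is an illustrative assumption.
    """
    n = len(r_parts)
    skewed = set(skewed_values)

    def target(key):
        # Assumed redistribution rule: hash the join key to a module.
        return hash(key) % n

    # Rows each module holds after the redistribution phase.
    r_in = [[] for _ in range(n)]
    s_in = [[] for _ in range(n)]
    t_in = [[] for _ in range(n)]

    for m in range(n):
        for row in r_parts[m]:
            if row[0] in skewed:
                r_in[m].append(row)               # skewed: maintain locally
            else:
                r_in[target(row[0])].append(row)  # non-skewed: redistribute
        for parts, inbox in ((s_parts, s_in), (t_parts, t_in)):
            for row in parts[m]:
                if row[0] in skewed:
                    for dest in range(n):         # skewed: duplicate to all
                        inbox[dest].append(row)
                else:
                    inbox[target(row[0])].append(row)

    # Each module joins the rows it now holds; results are concatenated.
    result = []
    for m in range(n):
        s_index, t_index = defaultdict(list), defaultdict(list)
        for key, payload in s_in[m]:
            s_index[key].append(payload)
        for key, payload in t_in[m]:
            t_index[key].append(payload)
        for key, r_payload in r_in[m]:
            matches = product(s_index.get(key, []), t_index.get(key, []))
            for s_payload, t_payload in matches:
                result.append((key, r_payload, s_payload, t_payload))
    return result


# Key 1 is skewed in the first table: four of its six rows carry it.
r_parts = [[(1, "r0"), (1, "r1"), (2, "r2")], [(1, "r3"), (3, "r4")], [(1, "r5")]]
s_parts = [[(1, "s0")], [(2, "s1"), (3, "s2")], [(4, "s3")]]
t_parts = [[(3, "t0")], [(1, "t1"), (1, "t2")], [(2, "t3")]]
joined = skew_aware_join(r_parts, s_parts, t_parts, skewed_values=[1])
```

Because the skewed first-table rows never move, no single module receives all rows for key 1, while the (assumed small) matching rows of the other tables are copied to every module so that each local join still finds its partners.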
Address: Dayton, OH, US