Title of Invention |
SKEW-AWARE STORAGE AND QUERY EXECUTION ON DISTRIBUTED DATABASE SYSTEMS |
Abstract |
Distributing rows of data in a distributed table distributed across a plurality of nodes. A method includes identifying skewed rows of a first table to be distributed in a distributed database system. The skewed rows include a common data value in a column such that the skewed rows are skewed, according to a predetermined skew factor, with respect to other rows in the first table not having the common data value. Non-skewed rows of the first table that are not skewed according to the skew factor are identified. The skewed rows of the first table are distributed across nodes in a non-deterministic fashion. The non-skewed rows of the first table are distributed across nodes in a deterministic fashion. The rows of the first table distributed across the nodes, whether distributed in a deterministic fashion or non-deterministic fashion, are stored in a single table at each of the nodes. |
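The distribution scheme described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. The function name `distribute_rows`, the skew test (a value counts as skewed when its row count exceeds `skew_factor` times the mean row count per distinct value), the CRC32 hash for deterministic placement, and uniform random placement for the non-deterministic case are all assumptions made for illustration; the source only says that a predetermined skew factor is used.

```python
import random
import zlib
from collections import Counter


def distribute_rows(rows, column, num_nodes, skew_factor, rng=None):
    """Place rows on nodes; returns (per-node row lists, set of skewed values).

    Assumption (not from the source): a value is "skewed" when its row
    count exceeds skew_factor * (mean row count per distinct value).
    """
    rng = rng or random.Random()
    nodes = [[] for _ in range(num_nodes)]  # one table per node
    counts = Counter(row[column] for row in rows)
    if not counts:
        return nodes, set()

    mean_count = len(rows) / len(counts)
    skewed = {v for v, c in counts.items() if c > skew_factor * mean_count}

    for row in rows:
        value = row[column]
        if value in skewed:
            # Non-deterministic placement: spread the hot value over all nodes.
            target = rng.randrange(num_nodes)
        else:
            # Deterministic placement: same value always maps to the same node.
            target = zlib.crc32(repr(value).encode("utf-8")) % num_nodes
        # Skewed and non-skewed rows share the same per-node table.
        nodes[target].append(row)
    return nodes, skewed


if __name__ == "__main__":
    sample = [{"k": "hot"}] * 90 + [{"k": f"c{i}"} for i in range(10)]
    placed, hot = distribute_rows(sample, "k", num_nodes=4, skew_factor=2)
    print("skewed values:", hot)
    print("rows per node:", [len(n) for n in placed])
```

In this sketch, rows holding a skewed value end up spread over several nodes instead of piling onto the single node a hash would select, while every other value stays co-located on one node.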
Publication Number |
US2014379692(A1) |
Publication Date |
2014.12.25 |
Application Number |
US201313922098 |
Filing Date |
2013.06.19 |
Applicant |
Microsoft Corporation |
Inventors |
Teletia Nikhil;Halverson Alan Dale;Shankar Srinath;Naughton Jeffrey |
Classification |
G06F17/30 |
Main Classification |
G06F17/30 |
Agency |
|
Agent |
|
Independent Claim |
1. In a distributed computing environment a method of distributing rows of data in a distributed table distributed across a plurality of nodes, the method comprising:
identifying skewed rows of a first table, the first table to be distributed in a distributed database system, the skewed rows comprising a common data value in a column such that the skewed rows are skewed, according to a predetermined skew factor, with respect to other rows in the first table not having the common data value; identifying non-skewed rows of the first table that are not skewed according to the skew factor; distributing the skewed rows of the first table across nodes in a non-deterministic fashion; distributing the non-skewed rows of the first table across nodes in a deterministic fashion; and wherein the rows of the first table distributed across the nodes, whether distributed in a deterministic fashion or non-deterministic fashion, are stored in a single table at each of the nodes. |
Address |
Redmond WA US |