Title: SKEW-AWARE STORAGE AND QUERY EXECUTION ON DISTRIBUTED DATABASE SYSTEMS
Abstract: Distributing rows of data in a table distributed across a plurality of nodes. A method includes identifying skewed rows of a first table to be distributed in a distributed database system. The skewed rows share a common data value in a column, such that they are skewed, according to a predetermined skew factor, with respect to the other rows in the first table that do not have the common data value. Non-skewed rows of the first table, which are not skewed according to the skew factor, are also identified. The skewed rows of the first table are distributed across the nodes in a non-deterministic fashion. The non-skewed rows are distributed across the nodes in a deterministic fashion. At each node, the rows of the first table are stored in a single table, whether they were distributed deterministically or non-deterministically.
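The first step of the abstract, separating skewed from non-skewed rows, can be sketched as follows. The record does not say how the skew factor is applied. This sketch assumes a hypothetical criterion: a column value is treated as skewed when its row count exceeds `skew_factor` times the mean row count per distinct value. The names `find_skewed_values` and `split_rows` are illustrative only.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Set, Tuple


def find_skewed_values(keys: Sequence[Hashable], skew_factor: float) -> Set[Hashable]:
    """Return the key values considered skewed.

    Hypothetical criterion (an assumption, not taken from the patent):
    a value is skewed if it occurs more than skew_factor times as often
    as the average distinct value in the column.
    """
    counts = Counter(keys)
    if not counts:
        return set()
    mean_count = len(keys) / len(counts)
    return {value for value, n in counts.items() if n > skew_factor * mean_count}


def split_rows(
    rows: Sequence[Tuple], key_index: int, skew_factor: float
) -> Tuple[List[Tuple], List[Tuple]]:
    """Partition rows into (skewed, non_skewed) by the value in column key_index."""
    skewed_values = find_skewed_values([row[key_index] for row in rows], skew_factor)
    skewed = [row for row in rows if row[key_index] in skewed_values]
    non_skewed = [row for row in rows if row[key_index] not in skewed_values]
    return skewed, non_skewed
```

Take 8 rows keyed `"A"` plus one row each for `"B"`, `"C"` and `"D"`. The mean count per distinct value is 11 / 4 = 2.75. With `skew_factor = 2.0` the threshold is 5.5, so only the 8 `"A"` rows are classified as skewed.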
Publication number: US2014379692 (A1)
Publication date: 2014.12.25
Application number: US201313922098
Filing date: 2013.06.19
Applicant: Microsoft Corporation
Inventors: Teletia Nikhil; Halverson Alan Dale; Shankar Srinath; Naughton Jeffrey
Classification: G06F17/30
Main classification: G06F17/30
Main claim: 1. In a distributed computing environment, a method of distributing rows of data in a distributed table distributed across a plurality of nodes, the method comprising: identifying skewed rows of a first table, the first table to be distributed in a distributed database system, the skewed rows comprising a common data value in a column such that the skewed rows are skewed, according to a predetermined skew factor, with respect to other rows in the first table not having the common data value; identifying non-skewed rows of the first table that are not skewed according to the skew factor; distributing the skewed rows of the first table across nodes in a non-deterministic fashion; distributing the non-skewed rows of the first table across nodes in a deterministic fashion; and wherein the rows of the first table distributed across the nodes, whether distributed in a deterministic fashion or non-deterministic fashion, are stored in a single table at each of the nodes.
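The distribution steps of claim 1 can be illustrated with a minimal sketch. The claim does not name specific placement functions, so the following choices are assumptions:

- **Deterministic placement:** non-skewed rows go to `crc32(key) mod N`. A stable checksum is used because Python's built-in `hash` of strings varies between processes.
- **Non-deterministic placement:** skewed rows go to a uniformly random node.
- **Node tables:** each node's single table is modelled as a Python list.

The function names are hypothetical.

```python
import random
import zlib
from typing import Hashable, List, Optional, Sequence, Set, Tuple

Row = Tuple  # hypothetical: a row is a tuple of column values


def node_for_key(key: Hashable, num_nodes: int) -> int:
    """Deterministic placement: the same key always maps to the same node."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_nodes


def distribute(
    rows: Sequence[Row],
    key_index: int,
    skewed_values: Set[Hashable],
    num_nodes: int,
    rng: Optional[random.Random] = None,
) -> List[List[Row]]:
    """Place each row on a node; returns one table (list of rows) per node."""
    rng = rng or random.Random()
    tables: List[List[Row]] = [[] for _ in range(num_nodes)]
    for row in rows:
        key = row[key_index]
        if key in skewed_values:
            # Skewed rows: spread at random so no single node receives them all.
            node = rng.randrange(num_nodes)
        else:
            # Non-skewed rows: hash placement, so equal keys are co-located.
            node = node_for_key(key, num_nodes)
        # Both kinds of rows land in the same per-node table.
        tables[node].append(row)
    return tables
```

With this design, rows sharing a non-skewed key always end up on one node. The rows of a heavily repeated key are instead spread over all nodes, which avoids overloading the node that hashing would otherwise assign them to.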
Address: Redmond, WA, US