Title of invention: Systems and methods for redistributing data in a relational database
Abstract: Systems and methods for redistributing data in a relational database are disclosed. In one embodiment, the database includes a plurality of rows of data distributed across a plurality of slices of a table in the database. The database system is configured to distribute the rows of data across the slices according to a first function based on one or more columns of the table. The database system monitors at least one database statistic indicative of variation in a distribution of the rows of data across the slices and detects a redistribution condition based on the at least one monitored database statistic. The database system is further configured to respond to the detected redistribution condition by redistributing the rows of data across the slices according to a second function based on a different number of columns than the first function.
Publication number: US9477741(B2) | Publication date: 2016.10.25
Application number: US201314034327 | Filing date: 2013.09.23
Applicant: Clustrix, Inc. | Inventors: Frantz Jason;Tsarev Sergei;Gale Jim;Smith Scott
Classification: G06F17/30 | Main classification: G06F17/30
Agency: Knobbe, Martens, Olson & Bear LLP | Agent: Knobbe, Martens, Olson & Bear LLP
Principal claim: 1. A method of redistributing data in a distributed database comprising a plurality of rows of data distributed across a plurality of slices of a table in the database, the method comprising: distributing the rows of data across the slices unevenly according to a first hash function value of at least one column of each of the rows of the table; monitoring at least one database statistic wherein the at least one database statistic comprises at least one of: a hash range occupancy ratio for each slice in said plurality of slices, a list of hot values for each column of each slice in said plurality of slices, a number of distinct values for each column of each slice in said plurality of slices, probabilistic data distribution of each column in each slice in said plurality of slices, or a quantile distribution of values for each column of each slice in said plurality of slices; detecting a redistribution condition based on the at least one monitored database statistic; and responding to the detected redistribution condition by redistributing the rows of data across the slices according to a second hash function wherein the second hash function is based on a different number of column values of the table than the first hash function; wherein the method is performed by one or more computing devices.
Address: San Francisco CA US
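
Illustrative sketch (not part of the patent record): the flow described in the principal claim can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation. The names `slice_for`, `distribute`, `skew_ratio`, `needs_redistribution`, and `redistribute`, the SHA-256 based hash, the modulo slice assignment, and the 1.5 threshold are all hypothetical choices made for illustration. In particular, `skew_ratio` is a simplified stand-in for the statistics listed in the claim (such as the hash range occupancy ratio), whose exact definitions this record does not give.

```python
import hashlib


def slice_for(row, columns, num_slices):
    """Map a row to a slice index by hashing the chosen column values (assumed scheme)."""
    key = "\x1f".join(str(row[c]) for c in columns).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


def distribute(rows, columns, num_slices):
    """Place every row into one of num_slices lists according to the hash of `columns`."""
    slices = [[] for _ in range(num_slices)]
    for row in rows:
        slices[slice_for(row, columns, num_slices)].append(row)
    return slices


def skew_ratio(slices):
    """Rows in the fullest slice divided by the ideal even share (1.0 means perfectly even)."""
    total = sum(len(s) for s in slices)
    if total == 0:
        return 1.0
    return max(len(s) for s in slices) / (total / len(slices))


def needs_redistribution(slices, threshold=1.5):
    """Hypothetical redistribution condition: the fullest slice exceeds the threshold."""
    return skew_ratio(slices) > threshold


def redistribute(slices, new_columns):
    """Re-hash all rows on a different set of columns, keeping the slice count."""
    rows = [row for s in slices for row in s]
    return distribute(rows, new_columns, len(slices))


# One hot customer: hashing on customer_id alone sends every row to a single slice.
rows = [{"customer_id": 1, "order_id": i} for i in range(1000)]
slices = distribute(rows, ("customer_id",), 4)
if needs_redistribution(slices):
    # The second hash uses two columns instead of one, which spreads the hot value.
    slices = redistribute(slices, ("customer_id", "order_id"))
```

Hashing on more columns trades locality for balance: rows of the same customer no longer share a slice, so a lookup by `customer_id` alone may need to visit several slices in this simplified model.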