Title of invention: METHOD FOR REBALANCING DATA PARTITIONS
Abstract: Embodiments of the present invention disclose a computer program product for rebalancing partitioned data based, at least in part, on limit key extrapolation in a database and one or more characteristics of the plurality of database partitions. Responsive to a determination that an upper limit key value of the last loaded record is greater than an upper limit key value of an empty partition, the computer redefines the upper limit key value of the empty partition using an extrapolated upper limit key value that is based, at least in part, on a range of limit key values. The computer updates one or more characteristics of the database, wherein the one or more characteristics include one or both of a) an average number of records per partition, and b) an average number of unique limit key values per partition.
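For a numeric limit key, the extrapolation described in the abstract and detailed in the principal claim can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `redefine_empty_limits` and its parameters are hypothetical, and leaving the old limits unchanged when the last loaded key does not exceed them is an assumption, since the record only states what happens when it does.

```python
def redefine_empty_limits(last_loaded_key, old_limits, range_max):
    """Sketch of limit key extrapolation for numeric keys (hypothetical helper).

    last_loaded_key: limit key value of the last record loaded.
    old_limits: pre-rebalance upper limit keys of the trailing empty partitions.
    range_max: greatest limit key value in the defined range.
    """
    empty_count = len(old_limits)
    if empty_count == 0:
        return []
    # Assumption: if the old definition still lies above the loaded data,
    # it is kept as-is.
    if last_loaded_key <= old_limits[0]:
        return list(old_limits)
    # Remaining range divided evenly among the empty partitions.
    step = (range_max - last_loaded_key) / empty_count
    # Each empty partition's upper limit is the previous limit plus the step.
    return [last_loaded_key + step * (i + 1) for i in range(empty_count)]


print(redefine_empty_limits(700, [500, 600, 1000], 1000))
```

With a last loaded key of 700, three empty partitions, and a range ending at 1000, the step is 100, so the sketch yields upper limits of 800, 900, and 1000.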
Publication number: US2017039262(A1)  Publication date: 2017.02.09
Application number: US201615338482  Filing date: 2016.10.31
Applicant: International Business Machines Corporation  Inventors: Ng Ka Chun; Roberts Haakon
Classification: G06F17/30  Main classification: G06F17/30
Agency:  Agent:
Principal claim: 1. A computer program product for rebalancing partitioned data in a database, the computer program product comprising: one or more computer-readable storage media and program instructions stored on the one or more computer-readable storage media, the program instructions comprising:
program instructions to initiate a rebalance of a first set of data records included in a plurality of database partitions by unloading the first set of data records, wherein the rebalance of the first set of data records is improved based, at least in part, on one or more characteristics of the plurality of database partitions that are determined during the rebalance of the first set of data records;
program instructions to determine a first grouping, comprising a first sub-set of data records in the first set of data records, with a single limit key value;
program instructions to calculate the average number of records per partition by dividing a total number of data records in the first set of data records by a total number of database partitions in the plurality of database partitions;
program instructions to determine whether a first partition is filled based, at least in part, on a first loaded grouping increasing the number of data records loaded into the first partition, such that the number of data records loaded into the first partition is at least equal to the average number of records per partition;
program instructions to respond to a determination that the first partition is filled by updating the average number of records per partition, wherein the updated average number of records per partition is determined by dividing a number of data records remaining to be loaded by a number of database partitions remaining to be filled;
program instructions to determine a second grouping comprising a sub-set of data records in the first set of data records with a single limit key value;
program instructions to determine a total number of unique limit key values included in the second grouping by:
    unloading the first set of data records using one or more of: a) an index scan of a unique index of the plurality of database partitions, b) a table space scan, and c) a sort of the first set of data records by ascending limit key value;
    counting the total number of unique limit key values using one of: a) a counter using an index key during unloading, b) a hashing algorithm during unloading, and c) a counter during re-reading of the first set of data records subsequent to unloading and sorting by ascending limit key values;
    calculating the average number of unique limit key values per partition by dividing the total number of unique limit key values in the first set of data records by a total number of database partitions in the plurality of database partitions;
program instructions to determine whether a second partition is filled based, at least in part, on a second loaded grouping increasing the number of unique limit key values loaded into the second partition, such that the number of unique limit key values loaded into the second partition is at least equal to the average number of unique limit key values per partition;
program instructions to respond to a determination that the second partition is filled by updating the average number of unique limit key values per partition, wherein the updated average number of unique limit key values per partition is determined by dividing a number of unique limit key values remaining to be loaded by a number of database partitions remaining to be filled;
program instructions to determine if a first empty partition remains after loading the first set of data records;
program instructions to respond to a determination that a first empty partition remains after loading the first set of data records by performing limit key extrapolation for the first empty partition by:
    defining at least one database partition of the plurality of database partitions by an upper limit key value based on a limit key value of a last loaded data record;
    determining a range of limit key values for the plurality of database partitions from limit key definitions in a database;
    determining if the limit key of the last loaded record is greater than a first upper limit key value of the first empty partition, wherein the first upper limit key value of the first empty partition was previously defined by the upper limit key value of a corresponding partition before a rebalance;
    responding to a determination that the upper limit key value of the last loaded record is greater than the first upper limit key value of the first empty partition by redefining the first upper limit key value of the first empty partition by an extrapolated upper limit key value based, at least in part, on the range of limit key values;
    calculating a remaining range of limit key values by subtracting a greatest limit key value of a reloaded set of data records from a greatest limit key value in the range of limit key values for the plurality of database partitions;
    calculating an arithmetic average by dividing the remaining range of limit key values by the number of empty partitions, wherein the arithmetic average is based, at least in part, on a limit key column data type, and wherein the limit key column data type is one of: a) a numeric limit key column data type, wherein the arithmetic average is determined by dividing the remaining range of limit key values by the number of empty partitions, b) a date/timestamp limit key column data type, wherein the arithmetic average is determined by converting the remaining range of limit key values from date/timestamp values into a number of days, and then dividing by the number of empty partitions, or c) a character limit key column data type, wherein the arithmetic average is determined by converting the remaining range of limit key values from character values to floating point values, dividing by the number of empty partitions, and then converting a resulting floating point value to a character value;
    calculating the first upper limit key value of the first empty partition by adding the calculated arithmetic average to the upper limit key value of the last loaded record;
    calculating a second upper limit key value of a second empty partition by adding the calculated arithmetic average to the first upper limit key value of the first empty partition; and
program instructions to load a second set of data records into the plurality of database partitions based, at least in part, on the one or more characteristics of the plurality of database partitions, wherein the one or more characteristics are used to direct the rebalancing of the first set of data records and the second set of data records.
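The record-count part of the claim (load groupings that share a single limit key value, treat a partition as filled once it reaches the average, then recompute the average from what remains) can be sketched as below. This is an illustrative reading under stated assumptions, not the claimed implementation: the function name `rebalance` is hypothetical, records are represented only by their limit key values, and the data is sorted in memory rather than unloaded by an index or table space scan.

```python
from itertools import groupby


def rebalance(keys, num_partitions):
    """Distribute records across partitions (hypothetical sketch).

    keys: the limit key value of each unloaded record, one entry per record.
    Returns a list of num_partitions lists; trailing lists may be empty.
    """
    ordered = sorted(keys)            # stands in for unload + sort by limit key
    remaining = len(ordered)          # records not yet loaded
    partitions = [[]]
    # Initial average: total records / total partitions.
    target = remaining / num_partitions

    # Each group holds all records with one limit key value; a group is
    # never split across partitions.
    for _, group in groupby(ordered):
        group = list(group)
        partitions[-1].extend(group)
        remaining -= len(group)
        parts_left = num_partitions - len(partitions)
        # Partition is filled once it holds at least the current average.
        if len(partitions[-1]) >= target and parts_left > 0 and remaining > 0:
            # Updated average: records remaining / partitions remaining.
            target = remaining / parts_left
            partitions.append([])

    # Partitions that received no data stay empty; these are the candidates
    # for limit key extrapolation.
    partitions.extend([] for _ in range(num_partitions - len(partitions)))
    return partitions


print(rebalance([1, 1, 1, 1, 1, 2, 3, 4, 5, 6, 7, 8], 3))
```

In this example the first partition takes all five records with key 1 (above the initial average of 4), so the average is recomputed as 7 / 2 = 3.5 for the remaining two partitions, which then receive four and three records. The same loop shape would apply to the unique-key variant of the claim by counting distinct limit key values instead of records.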
Address: Armonk NY US