Abstract |
A method, apparatus, and article of manufacture for random sampling of rows stored in a table, wherein the table has a plurality of partitions. A row count is determined for each partition of the table, and the total number of rows in the table is determined from the per-partition row counts. A proportional allocation of a sample size is computed for each partition based on its row count and the total number of rows. A sample set of rows of the sample size is then retrieved from the table, wherein each partition contributes its proportional allocation of rows to the sample set. Preferably, the sampling is performed in a parallel processing database system in which each processing unit manages a partition of the table, so that some of the above steps can be performed in parallel by the processing units.
|
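The steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names `proportional_allocation` and `sample_rows` are hypothetical, partitions are modeled as in-memory lists, and the largest-remainder rounding used to make the per-partition allocations sum exactly to the sample size is an assumption, since the abstract does not say how fractional shares are resolved. The loop over partitions is sequential here, whereas in a parallel system each processing unit could presumably draw its own share independently.

```python
import random
from typing import List, Optional, Sequence


def proportional_allocation(row_counts: Sequence[int], sample_size: int) -> List[int]:
    """Split sample_size across partitions in proportion to their row counts.

    Rounding uses the largest-remainder rule (an assumption of this sketch)
    so that the allocations always sum to exactly sample_size.
    """
    total = sum(row_counts)  # total rows in the table
    if not 0 <= sample_size <= total:
        raise ValueError("sample_size must be between 0 and the total row count")
    if total == 0:
        return [0] * len(row_counts)

    # Integer floor of each partition's exact share, plus the leftover remainder.
    base = [(count * sample_size) // total for count in row_counts]
    remainders = [(count * sample_size) % total for count in row_counts]

    # Hand the rows still unallocated to the partitions with the largest remainders.
    shortfall = sample_size - sum(base)
    by_remainder = sorted(range(len(row_counts)), key=lambda i: (-remainders[i], i))
    for i in by_remainder[:shortfall]:
        base[i] += 1
    return base


def sample_rows(partitions: Sequence[Sequence], sample_size: int,
                seed: Optional[int] = None) -> list:
    """Draw sample_size rows, each partition contributing its allocated share."""
    rng = random.Random(seed)
    row_counts = [len(rows) for rows in partitions]  # per-partition row counts
    allocation = proportional_allocation(row_counts, sample_size)

    sample = []
    for rows, share in zip(partitions, allocation):
        # Sample without replacement inside each partition.
        sample.extend(rng.sample(list(rows), share))
    return sample
```

For example, with partitions holding 500, 300, and 200 rows and a sample size of 10, `proportional_allocation([500, 300, 200], 10)` returns `[5, 3, 2]`, so the three partitions contribute 5, 3, and 2 rows respectively.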