Title of invention: Parallel bootstrap aggregating in a data warehouse appliance
Abstract: A method of bootstrap sampling a dataset is described. With a process node, a series of random integers is generated. An assignment map is created. The assignment map includes a row identifier for each row of data of the dataset. A plurality of bootstrap sample identifiers defined by the series are assigned to at least one row identifier. An output table is created from the assignment map. Rows of the output table include each instance of the bootstrap sample identifiers, the row identifier assigned with the bootstrap sample identifier, and data of the row.
Publication number: US9613113(B2) Publication date: 2017.04.04
Application number: US201414230671 Filing date: 2014.03.31
Applicant: International Business Machines Corporation Inventors: Dygas Sylwester A.; Iwanowski Michal T.; Plonski Piotr; Rokicki Mariusz
Classification: G06F7/00; G06F17/30 Main classification: G06F7/00
Agency: Agent: Dobson Scott S.
Main claim: 1. A process node for creating bootstrap samples from a dataset, wherein the process node is one of a plurality of process nodes in a data warehouse appliance, the process node comprising: a processor; and a memory communicatively coupled to the processor, wherein the memory is encoded with instructions and wherein the instructions when executed by the processor include: receive a random seed from a host server; generate, with the process node, a series of random integers using the random seed, wherein the random seed is configured to cause each of the plurality of process nodes to generate the same series of random integers when received by each of the plurality of process nodes; create, with the process node, an assignment map, the assignment map includes a row identifier for each row of the dataset; assign, with the process node, a plurality of bootstrap sample identifiers defined by the series of random integers to at least one row identifier in the assignment map; and create, with the process node and based on the assignment map, an output table, each row of the output table includes an instance of a bootstrap sample identifier, the row identifier assigned with the instance of the bootstrap sample identifier, and data from the row of the data set associated with the row identifier.
Address: Armonk NY US
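
Illustrative sketch (not part of the patent record): the steps recited in the main claim can be pictured roughly as follows. This is one possible reading, not the patented implementation. It assumes that row identifiers are the integers 0..total_rows-1, that each bootstrap sample draws total_rows rows with replacement, and that the position of a draw in the shared series determines its bootstrap sample identifier. The names draw_series and bootstrap_on_node are hypothetical.

```python
import random


def draw_series(seed, total_rows, num_samples):
    """Generate the shared series of random integers.

    Every node that receives the same seed produces the same series
    (assumption: all nodes use this same generator and parameters).
    """
    rng = random.Random(seed)
    return [rng.randrange(total_rows) for _ in range(num_samples * total_rows)]


def bootstrap_on_node(local_rows, seed, total_rows, num_samples):
    """Build this node's part of the output table.

    local_rows maps row identifier -> row data for the rows stored on
    this node only (hypothetical layout).
    """
    series = draw_series(seed, total_rows, num_samples)

    # Assignment map: one entry per local row identifier, holding the
    # bootstrap sample identifiers assigned to that row.
    assignment_map = {row_id: [] for row_id in local_rows}
    for position, drawn_row_id in enumerate(series):
        if drawn_row_id in assignment_map:
            # Assumed convention: draws are grouped into consecutive
            # blocks of total_rows, one block per bootstrap sample.
            sample_id = position // total_rows
            assignment_map[drawn_row_id].append(sample_id)

    # Output table: one row per (sample identifier, row identifier)
    # instance, carrying the row's data.
    return [
        (sample_id, row_id, local_rows[row_id])
        for row_id, sample_ids in assignment_map.items()
        for sample_id in sample_ids
    ]
```

Because each node derives the same series from the same seed, a node can decide locally which of its own rows belong to which bootstrap sample; under the assumptions above, the union of the per-node tables matches what a single node holding the whole dataset would produce.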