Title of invention PARALLEL BOOTSTRAP AGGREGATING IN A DATA WAREHOUSE APPLIANCE
Abstract A method of bootstrap sampling a dataset is described. With a process node, a series of random integers is generated. An assignment map is created. The assignment map includes a row identifier for each row of data of the dataset. A plurality of bootstrap sample identifiers defined by the series are assigned to at least one row identifier. An output table is created from the assignment map. Rows of the output table include each instance of the bootstrap sample identifiers, the row identifier assigned with the bootstrap sample identifier, and data of the row.
Publication number US2017124173(A1) Publication date 2017.05.04
Application number US201715403208 Filing date 2017.01.11
Applicant International Business Machines Corporation Inventors Dygas Sylwester A.;Iwanowski Michal T.;Plonski Piotr;Rokicki Mariusz
Classification G06F17/30;H04L29/08 Main classification G06F17/30
Agency Agent
Principal claim 1. A process node for creating bootstrap samples from a dataset, wherein the process node is one of a plurality of process nodes in a data warehouse appliance, the process node comprising: a processor; and a memory communicatively coupled to the processor, wherein the memory is encoded with instructions and wherein the instructions when executed by the processor include: generate, with the process node, a series of random integers; create, with the process node, an assignment map, the assignment map includes a row identifier for each row of the dataset; assign, with the process node, a plurality of bootstrap sample identifiers defined by the series of random integers to at least one row identifier in the assignment map; and create, with the process node and based on the assignment map, an output table, each row of the output table includes an instance of a bootstrap sample identifier, the row identifier assigned with the instance of the bootstrap sample identifier, and data from the row of the data set associated with the row identifier.
Address Armonk NY US
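
Illustrative sketch (not part of the patent record). The steps recited in the abstract and claim 1 (generate random integers, build an assignment map keyed by row identifier, assign bootstrap sample identifiers, emit an output table) can be pictured on a single process node roughly as follows. This is one possible reading only: the function name bootstrap_output_table, its parameters, and the choice to let each random integer select a row with replacement are assumptions made for illustration, and the record does not state how the series of integers defines the sample identifiers. Distribution across multiple process nodes is not modeled.

```python
import random


def bootstrap_output_table(rows, num_samples, sample_size=None, seed=None):
    """Hypothetical single-node sketch of the claimed steps.

    rows: list of (row_id, data) pairs with unique row_id values.
    Returns (assignment_map, output_table).
    """
    rng = random.Random(seed)
    n = len(rows)
    if n == 0:
        return {}, []
    if sample_size is None:
        sample_size = n  # classic bootstrap: each sample is as large as the dataset

    # Assignment map: one entry per row identifier of the dataset.
    row_ids = [row_id for row_id, _ in rows]
    assignment_map = {row_id: [] for row_id in row_ids}

    # Assumption: each random integer picks a row (with replacement), and the
    # bootstrap sample identifier being filled is attached to that row.
    for sample_id in range(num_samples):
        for _ in range(sample_size):
            picked = row_ids[rng.randrange(n)]
            assignment_map[picked].append(sample_id)

    # Output table: one row per (sample identifier, row identifier) instance,
    # carrying the data of the original row.
    data_by_id = dict(rows)
    output_table = [
        (sample_id, row_id, data_by_id[row_id])
        for row_id, sample_ids in assignment_map.items()
        for sample_id in sample_ids
    ]
    return assignment_map, output_table


if __name__ == "__main__":
    demo_rows = [(1, "a"), (2, "b"), (3, "c")]
    amap, table = bootstrap_output_table(demo_rows, num_samples=2, seed=0)
    for record in sorted(table):
        print(record)
```

A row drawn several times for the same sample appears several times in the output table, and a row never drawn keeps an empty list in the assignment map and contributes no output rows. Both follow from sampling with replacement.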