Main claim
1. A method for distributing processing of a dataset among two or more workers, said dataset comprising a number of data records, each data record having a unique key, the keys being represented as integer numbers, the data records being arranged in the order of increasing or decreasing key values, the method comprising the steps of:
splitting the dataset into one or more chunks, each chunk comprising a plurality of data records;
assigning each chunk of the dataset to a worker, and allowing each of the worker(s) to process the data records of the chunk assigned to it;
a further worker requesting access to the dataset;
identifying the largest chunk among the chunk(s) assigned to the worker(s) already processing data records of the dataset;
requesting the worker to which the identified chunk is assigned to split that chunk;
said worker selecting a split point and splitting the identified chunk at the selected split point into two new chunks;
said worker assigning one of the new chunks to itself and assigning the other new chunk to the further worker; and
allowing the workers to process the data records of the chunks assigned to them.
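The claimed steps can be illustrated with a minimal, single-process Python sketch. It is not the patented implementation; it assumes that a chunk is a half-open key range [lo, hi) over records sorted by increasing key, that "largest chunk" means the chunk with the most unprocessed records, and that the worker picks the median remaining key as the split point. The claim itself does not specify how the split point is chosen. The names `Chunk`, `Worker`, `Dataset`, `request_access`, `process` and `split` are hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    # Hypothetical representation: the records whose keys lie in [lo, hi).
    lo: int
    hi: int


class Worker:
    def __init__(self, name: str, keys: List[int], chunk: Chunk):
        self.name = name
        self.keys = keys  # shared list of unique keys, sorted in increasing order
        self.chunk = chunk

    def _bounds(self):
        # Index range of this worker's unprocessed records in the sorted key list.
        start = bisect.bisect_left(self.keys, self.chunk.lo)
        end = bisect.bisect_left(self.keys, self.chunk.hi)
        return start, end

    def remaining(self) -> int:
        start, end = self._bounds()
        return end - start

    def process(self, n: int) -> List[int]:
        # Process up to n records from the front of the chunk, in key order,
        # then shrink the chunk so it covers only unprocessed records.
        start, end = self._bounds()
        stop = min(start + n, end)
        done = self.keys[start:stop]
        self.chunk.lo = self.keys[stop] if stop < end else self.chunk.hi
        return done

    def split(self) -> Chunk:
        # Select a split point (assumption: the median unprocessed key),
        # keep the lower new chunk, and return the upper one for the further worker.
        start, end = self._bounds()
        if end - start < 2:
            raise ValueError("chunk too small to split")
        split_key = self.keys[start + (end - start) // 2]
        upper = Chunk(split_key, self.chunk.hi)
        self.chunk = Chunk(self.chunk.lo, split_key)
        return upper


class Dataset:
    def __init__(self, keys: List[int]):
        self.keys = sorted(keys)
        self.workers: List[Worker] = []

    def request_access(self, name: str) -> Worker:
        if not self.workers:
            # First worker: a single chunk covering the whole dataset.
            chunk = Chunk(self.keys[0], self.keys[-1] + 1) if self.keys else Chunk(0, 0)
        else:
            # Further worker: identify the largest chunk and ask its owner to split it.
            owner = max(self.workers, key=lambda w: w.remaining())
            chunk = owner.split()
        worker = Worker(name, self.keys, chunk)
        self.workers.append(worker)
        return worker


ds = Dataset(list(range(100)))
a = ds.request_access("a")   # a owns keys 0..99
a.process(10)                # a has processed keys 0..9
b = ds.request_access("b")   # a keeps [10, 55), b receives [55, 100)
c = ds.request_access("c")   # tie at 45 records; a is split: a keeps [10, 32), c gets [32, 55)
```

Because the keys are integers and the records are ordered, a chunk can be described by two key bounds, and its size can be found by binary search without scanning it. A dataset ordered by decreasing keys would need the comparisons reversed.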