Title METHOD AND A SYSTEM FOR DISTRIBUTED PROCESSING OF A DATASET
Abstract When a new worker requests access to a dataset, the largest chunk of the dataset is identified and split into two new chunks by the worker having the chunk assigned to it. The chunk is split in such a manner that both workers have enough unprocessed data records, and collisions among the workers processing the data records are avoided. Finding the split point may be an iterative process.
Publication number US2014279883(A1) Publication date 2014.09.18
Application number US201414213248 Filing date 2014.03.14
Applicant Sitecore A/S Inventor KOSTENKO Dmytro
Classification G06F17/30 Main classification G06F17/30
Agency Agent
Independent claim 1. A method for distributing processing of a dataset among two or more workers, said dataset comprising a number of data records, each data record having a unique key, the keys being represented as integer numbers, the data records being arranged in the order of increasing or decreasing key values, the method comprising the steps of: splitting the dataset into one or more chunks, each chunk comprising a plurality of data records, and assigning each chunk of the dataset to a worker, and allowing each of the worker(s) to process the data records of the chunk assigned to it, a further worker requesting access to the dataset, identifying the largest chunk among the chunk(s) assigned to the worker(s) already processing data records of the dataset, and requesting the worker having the identified chunk assigned to it to split the chunk, said worker selecting a split point, said worker splitting the identified chunk into two new chunks, at the selected split point, and assigning one of the new chunks to itself, and assigning the other of the new chunks to the further worker, and allowing the workers to process data records of the chunks assigned to them.
Address Kobenhavn V DK
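
Illustrative sketch (not part of the patent record). The steps of claim 1 can be pictured with a minimal in-memory model in Python. Everything named here is a hypothetical assumption made for illustration: the `Chunk` class, the `cursor` field, the `split_chunk` and `admit_worker` functions, measuring the "largest" chunk by its unprocessed key range, and choosing the midpoint of that range as the split point. The record itself does not specify how the split point is selected, only that the selection may be iterative, and this sketch does not model the iterative search or any concurrency control.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Half-open key range [start, end) assigned to one worker (hypothetical model)."""
    start: int   # first key of the chunk
    end: int     # one past the last key of the chunk
    owner: str   # worker the chunk is assigned to
    cursor: int  # next key the owner will process

    def remaining(self) -> int:
        """Size of the key range the owner has not processed yet."""
        return self.end - self.cursor


def split_chunk(chunk: Chunk, new_owner: str) -> Chunk:
    """Split performed by the chunk's owner: keep the lower part, hand over the upper part."""
    # Assumption: the split point is the midpoint of the unprocessed key range.
    split_point = chunk.cursor + chunk.remaining() // 2
    if split_point <= chunk.cursor:
        raise ValueError("not enough unprocessed records to split")
    new_chunk = Chunk(split_point, chunk.end, new_owner, split_point)
    chunk.end = split_point  # the owner now holds [start, split_point)
    return new_chunk


def admit_worker(chunks: list[Chunk], new_owner: str) -> Chunk:
    """Handle a further worker's request: split the largest chunk and assign one part to it."""
    # Assumption: "largest" means the largest unprocessed key range.
    largest = max(chunks, key=Chunk.remaining)
    new_chunk = split_chunk(largest, new_owner)
    chunks.append(new_chunk)
    return new_chunk


# Usage: worker w1 holds keys [0, 1000) and has already processed keys below 200.
chunks = [Chunk(0, 1000, "w1", 200)]
second = admit_worker(chunks, "w2")
print(second)     # w2 receives the upper half of the unprocessed range
print(chunks[0])  # w1 keeps the lower half
```

Because the two new chunks are disjoint key ranges and the split point lies above the owner's current position, neither worker touches a record assigned to the other, which is the collision-avoidance property the abstract describes. Keys need not be contiguous, so the key range only approximates the number of records in a chunk.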