Title Optimization of map-reduce shuffle performance through shuffler I/O pipeline actions and planning
Abstract A shuffler receives information associated with partition segments of map task outputs and a pipeline policy for a job running on a computing device. The shuffler transmits to an operating system of the computing device a request to lock partition segments of the map task outputs and transmits an advisement to keep or load partition segments of map task outputs in the memory of the computing device. The shuffler creates a pipeline based on the pipeline policy, wherein the pipeline includes partition segments locked in the memory and partition segments advised to keep or load in the memory, of the computing device for the job, and the shuffler selects the partition segments locked in the memory, followed by partition segments advised to keep or load in the memory, as a preferential order of partition segments to shuffle.
Publication No. US9389994(B2) Publication Date 2016.07.12
Application No. US201314090282 Filing Date 2013.11.26
Applicant International Business Machines Corporation Inventors Hu Zhenhua;Ma Hao Hai;Tang Wentao;Xu Qiang
Classification G06F9/46;G06F12/00;G06F9/54 Main Classification G06F9/46
Agency Agent Simek Daniel R.;Carpenter Maeve M.
Principal Claim 1. A computer program product for optimizing a MapReduce shuffle, the computer program product comprising: a computer-readable storage medium having computer-readable program code embodied therewith, the computer-readable program code executable by a processor, to: receive information associated with partition segments of map task outputs and a pipeline policy for a job running on a computing device; transmit to an operating system of the computing device a request to lock partition segments of the map task outputs in a memory of the computing device; transmit to the operating system of the computing device an advisement to keep or load partition segments of map task outputs in the memory of the computing device, based on a capacity of the memory of the computing device, wherein partition segments of the map task outputs requested to be locked in the memory of the computing device are different from the partition segments of map task outputs advised to keep or load in the memory of the computing device; create a pipeline based on the pipeline policy, wherein the pipeline includes partition segments of map task outputs locked in the memory of the computing device and partition segments of map task outputs advised to keep or load in the memory of the computing device for the job; and select the partition segments locked in the memory, followed by the partition segments advised to keep or load in the memory, as a preferential order of partition segments to shuffle.
Address Armonk NY US
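
The preferential ordering recited in the claim (locked segments first, then advised segments) can be illustrated with a minimal sketch. This is not the patented implementation: it only simulates the planning step in memory and issues no real operating-system calls. The names `Segment`, `Residency`, `plan_pipeline`, `lock_budget`, and `advise_budget` are hypothetical, and the greedy budget rule used to split segments between the two disjoint groups is an assumption standing in for the "pipeline policy" and the memory-capacity check.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Residency(IntEnum):
    """Hypothetical labels for how a segment was presented to the OS."""
    LOCKED = 0   # a lock request was made for this segment
    ADVISED = 1  # the OS was advised to keep or load this segment
    NONE = 2     # no request made; the segment may reside only on disk


@dataclass
class Segment:
    name: str
    size: int  # bytes (illustrative)
    residency: Residency = Residency.NONE


def plan_pipeline(segments: List[Segment],
                  lock_budget: int,
                  advise_budget: int) -> List[Segment]:
    """Assign each segment to exactly one group, then order the pipeline.

    Assumed policy: greedily lock segments while they fit in lock_budget,
    otherwise advise them while they fit in advise_budget. A segment is
    never both locked and advised, matching the claim's requirement that
    the two groups differ.
    """
    remaining_lock = lock_budget
    remaining_advise = advise_budget
    for seg in segments:
        if seg.size <= remaining_lock:
            seg.residency = Residency.LOCKED
            remaining_lock -= seg.size
        elif seg.size <= remaining_advise:
            seg.residency = Residency.ADVISED
            remaining_advise -= seg.size
        else:
            seg.residency = Residency.NONE
    # sorted() is stable, so segments keep their arrival order within
    # each group: locked first, then advised, then everything else.
    return sorted(segments, key=lambda s: s.residency)


if __name__ == "__main__":
    demo = [Segment("a", 70), Segment("b", 40),
            Segment("c", 30), Segment("d", 20)]
    for seg in plan_pipeline(demo, lock_budget=64, advise_budget=80):
        print(seg.name, seg.residency.name)
```

In a real shuffler the lock and advise steps would presumably map to operating-system facilities such as `mlock` and `madvise` or `posix_fadvise`; the record above does not name specific calls, so that mapping is an assumption as well.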