Title of invention: OPTIMIZING EXECUTION AND RESOURCE USAGE IN LARGE SCALE COMPUTING
Abstract: A method for tuning workflow settings in a distributed computing workflow comprising sequential interdependent jobs includes pairing a terminal stage of a first job and a leading stage of a second, sequential job to form an optimization pair, in which data segments output by the terminal stage of the first job comprise data input for the leading stage of the second job. The performance of the optimization pair is tuned by determining, with a computational processor, an estimated minimum execution time for the optimization pair and increasing the minimum execution time to generate an increased execution time. The method further includes calculating a minimum number of data segments that still permits execution of the optimization pair within the increased execution time.
Publication number: US2015355951(A1) Publication date: 2015.12.10
Application number: US201514825987 Filing date: 2015.08.13
Applicant: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P. Inventors: Cherkasova Ludmila;Zhang Zhuoyao
Classification: G06F9/50 Main classification: G06F9/50
Agency: Agent:
Principal claim: 1. A method for tuning workflow settings of a MapReduce workflow comprising a series of sequential MapReduce jobs executed on a distributed computing system, each MapReduce job comprising a map task and a reduce task, the method comprising: pairing a reduce stage of a first MapReduce job and a map stage of a second, sequential MapReduce job to form an optimization pair, in which data output of the reduce task of the first MapReduce job is data input for the map task of the second MapReduce job; determining, with a computational processor, an estimated minimum execution time and corresponding number of reduce tasks to execute the optimization pair within the estimated minimum execution time; tuning performance of the optimization pair by selecting an increased execution time with a minimum number of reduce tasks; and executing, on the distributed computing system, the optimization pair to produce the minimum number of reduce tasks.
Address: Houston TX US
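
Illustrative sketch: the tuning step described in the abstract and principal claim (estimate a minimum execution time for the optimization pair, relax it, then pick the fewest reduce tasks that still fit the relaxed time) might look roughly like the following. This is only a sketch under stated assumptions, not the applicant's implementation: the record does not say how execution time is estimated, so `estimate_time` is a hypothetical caller-supplied model, and `tune_reduce_tasks`, `slack`, `max_reduce_tasks`, and `toy_model` are names and parameters invented here for illustration.

```python
from typing import Callable, Tuple


def tune_reduce_tasks(
    estimate_time: Callable[[int], float],
    max_reduce_tasks: int,
    slack: float,
) -> Tuple[int, float, int]:
    """Return (tuned_tasks, time_budget, fastest_tasks) for one optimization pair.

    estimate_time(n) is a hypothetical model giving the estimated execution
    time of the pair when the first job runs n reduce tasks.
    slack is the allowed relative increase over the minimum time (0.10 = 10%).
    """
    if max_reduce_tasks < 1:
        raise ValueError("max_reduce_tasks must be at least 1")
    if slack < 0:
        raise ValueError("slack must be non-negative")

    candidates = range(1, max_reduce_tasks + 1)
    times = {n: estimate_time(n) for n in candidates}

    # Step 1: estimated minimum execution time and the task count achieving it.
    fastest_tasks = min(candidates, key=times.__getitem__)
    min_time = times[fastest_tasks]

    # Step 2: increase the minimum execution time by the allowed slack.
    time_budget = min_time * (1.0 + slack)

    # Step 3: smallest number of reduce tasks that still fits the budget.
    tuned_tasks = min(n for n in candidates if times[n] <= time_budget)
    return tuned_tasks, time_budget, fastest_tasks


def toy_model(n: int) -> float:
    """Made-up cost model: parallel work shrinks with n, per-task overhead grows."""
    return 1000.0 / n + 2.0 * n


if __name__ == "__main__":
    tuned, budget, fastest = tune_reduce_tasks(toy_model, max_reduce_tasks=100, slack=0.10)
    print(f"fastest setting: {fastest} reduce tasks")
    print(f"time budget: {budget:.1f}")
    print(f"tuned setting: {tuned} reduce tasks")
```

With the made-up `toy_model`, accepting a 10% longer execution time lowers the reduce-task count from 22 to 15, which is the kind of time-for-resources trade-off the abstract describes. The exhaustive scan over candidate counts is a simplification chosen for clarity.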