Abstract
A method, non-transitory computer-readable medium, and apparatus for estimating a completion time for a MapReduce job are disclosed. For example, the method builds a general MapReduce performance model; computes one or more performance characteristics of each of one or more benchmark workloads in a known processing system; computes one or more performance characteristics of the MapReduce job in the known processing system; selects a subset of the benchmark workloads whose performance characteristics are similar to those of the MapReduce job; targets a cluster of processing nodes in a distributed processing system; computes one or more performance characteristics of the subset of benchmark workloads in the cluster of processing nodes; and estimates the completion time for the MapReduce job by comparing the subset's performance characteristics in the cluster with its performance characteristics in the known processing system.
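The selection step can be pictured as a nearest-neighbour search over characteristic vectors. The sketch below is illustrative only: it assumes each workload is summarized by hypothetical per-phase costs (seconds per GB of input for "map", "shuffle", and "reduce") measured on the known processing system, and it uses plain Euclidean distance as the similarity measure. The patent text does not specify the characteristics or the similarity metric; all names and numbers here are made up for the example.

```python
import math
from typing import Dict, List

# Hypothetical phase names; the actual characteristics used by the method are not specified.
PHASES = ("map", "shuffle", "reduce")

# Hypothetical per-phase costs (seconds per GB) measured on the known processing system.
KNOWN_BENCHMARKS: Dict[str, Dict[str, float]] = {
    "sort":      {"map": 4.0, "shuffle": 6.0, "reduce": 5.0},
    "wordcount": {"map": 9.0, "shuffle": 1.0, "reduce": 1.5},
    "grep":      {"map": 3.0, "shuffle": 0.2, "reduce": 0.1},
    "join":      {"map": 5.0, "shuffle": 5.5, "reduce": 6.0},
}
JOB: Dict[str, float] = {"map": 4.5, "shuffle": 5.8, "reduce": 5.5}


def distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Euclidean distance between two characteristic vectors over PHASES."""
    return math.sqrt(sum((a[p] - b[p]) ** 2 for p in PHASES))


def select_similar(job: Dict[str, float],
                   benchmarks: Dict[str, Dict[str, float]],
                   k: int = 2) -> List[str]:
    """Return the names of the k benchmarks closest to the job's characteristics."""
    ranked = sorted(benchmarks, key=lambda name: distance(job, benchmarks[name]))
    return ranked[:k]


if __name__ == "__main__":
    print(select_similar(JOB, KNOWN_BENCHMARKS))  # ['sort', 'join']
```

Because all characteristics share one unit here, no normalization is applied; with mixed units (e.g. bytes and seconds) the vectors would need scaling first.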
Main Claim
1. A method for estimating a completion time for a MapReduce job, comprising:
building, by a processor, a general MapReduce performance model;
computing, by the processor, one or more performance characteristics of each one of one or more benchmark workloads in accordance with the general MapReduce performance model in a known processing system;
computing, by the processor, one or more performance characteristics of the MapReduce job in accordance with the general MapReduce performance model in the known processing system;
selecting, by the processor, a subset of the one or more benchmark workloads that have performance characteristics similar to the one or more performance characteristics of the MapReduce job;
targeting, by the processor, a cluster of processing nodes in a distributed processing system having one or more unknown hardware configurations;
computing, by the processor, one or more performance characteristics of the subset of the one or more benchmark workloads in the cluster of processing nodes; and
estimating, by the processor, the completion time for the MapReduce job based upon a comparative analysis of the one or more performance characteristics of the subset of the one or more benchmark workloads in the cluster of processing nodes and the one or more performance characteristics of the subset of the one or more benchmark workloads in the known processing system.
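The claim leaves the form of the "comparative analysis" open. One simple reading, offered here only as an assumption, is per-phase scaling: for each selected benchmark, divide its phase duration on the target cluster by its duration on the known processing system, average those ratios across the subset, apply them to the job's phase durations on the known system, and sum the result. The phase names, the averaging rule, and all numbers below are hypothetical and are not taken from the patent.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical profile: phase name -> duration in seconds under the performance model.
Profile = Dict[str, float]

# Hypothetical measurements for the selected benchmark subset.
BENCH_KNOWN: List[Profile] = [
    {"map": 100.0, "shuffle": 60.0, "reduce": 40.0},
    {"map": 80.0, "shuffle": 50.0, "reduce": 30.0},
]
BENCH_CLUSTER: List[Profile] = [
    {"map": 50.0, "shuffle": 90.0, "reduce": 20.0},
    {"map": 40.0, "shuffle": 75.0, "reduce": 15.0},
]
# Hypothetical job profile measured on the known processing system.
JOB_KNOWN: Profile = {"map": 120.0, "shuffle": 40.0, "reduce": 60.0}


def phase_ratios(known: List[Profile], cluster: List[Profile]) -> Profile:
    """Average cluster/known duration ratio per phase (>1 means slower on the cluster)."""
    if len(known) != len(cluster) or not known:
        raise ValueError("need matching, non-empty benchmark profiles")
    return {p: mean(c[p] / k[p] for k, c in zip(known, cluster)) for p in known[0]}


def estimate_completion_time(job_known: Profile,
                             bench_known: List[Profile],
                             bench_cluster: List[Profile]) -> float:
    """Scale each of the job's phase durations by the benchmark ratio and sum them."""
    ratios = phase_ratios(bench_known, bench_cluster)
    return sum(job_known[p] * ratios[p] for p in job_known)


if __name__ == "__main__":
    print(estimate_completion_time(JOB_KNOWN, BENCH_KNOWN, BENCH_CLUSTER))  # 150.0
```

In this example the cluster halves map and reduce time but makes shuffle 1.5 times slower, so the job's 220 s on the known system becomes an estimated 60 + 60 + 30 = 150 s. Scaling per phase rather than by a single overall ratio lets benchmarks with a different phase mix still inform each phase separately.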