Abstract |
A data processing method and apparatus. The method comprises: if the number of map tasks completed within a pre-set time period is greater than or equal to a pre-set starting threshold, acquiring the overheads of the unassigned partitions in the results output by the map tasks completed within and before the pre-set time period (S201); assigning, according to the overhead of each unassigned partition and the starting states of the reduce tasks, corresponding reduce tasks to the N partitions with the highest overheads (S202), where N is an integer greater than or equal to 1; and pulling the data of the N partitions to the reduce tasks corresponding to those partitions for processing (S203). The method can improve resource utilization, and assigning reduce tasks according to partition overheads makes the assignment more rational and improves the overall performance of a MapReduce system. |
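
For illustration only, steps S201 and S202 might be sketched as follows. This is a minimal sketch, not the claimed implementation: the names `ReduceTask` and `assign_partitions` are hypothetical, overheads are assumed to be plain numbers keyed by partition identifier, and the "starting state" is reduced to a single assumed rule (only reduce tasks that have not yet started may receive a partition, so N is the number of such tasks, capped by the number of unassigned partitions). Pulling the partition data (S203) is omitted.

```python
import heapq
from dataclasses import dataclass


@dataclass
class ReduceTask:
    task_id: str
    started: bool = False  # simplified "starting state" (assumption)


def assign_partitions(completed_in_window, start_threshold, overheads, reduce_tasks):
    """Return {partition_id: task_id} for the N costliest unassigned partitions."""
    # Precondition of S201: enough map tasks completed in the pre-set time period.
    if completed_in_window < start_threshold:
        return {}
    # Assumption: only reduce tasks that have not started can take a partition.
    idle = [task for task in reduce_tasks if not task.started]
    n = min(len(idle), len(overheads))
    # S202: select the N unassigned partitions with the highest overheads.
    top = heapq.nlargest(n, overheads.items(), key=lambda item: item[1])
    assignment = {}
    for (partition_id, _overhead), task in zip(top, idle):
        task.started = True  # the task is now bound to this partition
        assignment[partition_id] = task.task_id
    return assignment
```

In this sketch the costliest partition goes to the first idle reduce task in list order; a real scheduler would likely apply further criteria that the abstract does not state.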