Independent claim |
1. A computer-implemented method for monitoring resource consumption by individual jobs executing in a distributed computing environment that processes large data sets across clusters of server devices, comprising:
receiving, by a worker entry sensor instrumented in a worker application residing on a given server device in one of the clusters of server devices, a job request from a job manager located across a network remotely from the given server device, where the job request is one of a plurality of job requests processing a large data set in parallel;
extracting, by the worker entry sensor, identifying information for the job request from the job request received by the worker entry sensor;
determining, by a measurement agent residing on the given server device, metrics indicative of resource utilization by the worker application while the worker application is processing the job request;
determining, by the measurement agent, identifying information for the measurement agent;
generating, by the measurement agent, a measurement event, where the measurement event includes the identifying information for the job request, the identifying information for the measurement agent, and the metrics; and
sending, by the measurement agent, the measurement event to a monitoring node residing across the network remotely from the given server device. |
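
The claimed steps can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `MeasurementEvent`, `MeasurementAgent`, and `worker_entry_sensor`, the `job_id` and `task_id` fields of the job request, and the JSON event encoding are all assumptions made for illustration. Process CPU time and wall-clock time stand in for the resource-utilization metrics, and the network transport to the monitoring node is replaced by a caller-supplied `send` callable.

```python
import json
import os
import socket
import time
from dataclasses import asdict, dataclass


@dataclass
class MeasurementEvent:
    job_info: dict    # identifying information for the job request
    agent_info: dict  # identifying information for the measurement agent
    metrics: dict     # metrics indicative of resource utilization


class MeasurementAgent:
    """Hypothetical agent residing on the same server device as the worker."""

    def __init__(self, send):
        # `send` is an assumed callable that delivers bytes to the monitoring node.
        self.send = send
        # Identify the agent by host name and process id (an assumption).
        self.agent_info = {"host": socket.gethostname(), "pid": os.getpid()}

    def measure(self, job_info, work):
        # Sample process-wide CPU time and wall-clock time around the job.
        cpu_start = time.process_time()
        wall_start = time.perf_counter()
        result = work()
        metrics = {
            "cpu_seconds": time.process_time() - cpu_start,
            "wall_seconds": time.perf_counter() - wall_start,
        }
        # Combine job identity, agent identity, and metrics into one event.
        event = MeasurementEvent(job_info, self.agent_info, metrics)
        self.send(json.dumps(asdict(event)).encode("utf-8"))
        return result


def worker_entry_sensor(agent, handler):
    """Wrap the worker's request handler, i.e. instrument its entry point."""

    def instrumented(job_request):
        # Extract the identifying information from the received job request.
        job_info = {
            "job_id": job_request["job_id"],
            "task_id": job_request["task_id"],
        }
        return agent.measure(job_info, lambda: handler(job_request))

    return instrumented


# Usage: a list stands in for the remote monitoring node.
sent = []
agent = MeasurementAgent(send=sent.append)
handle = worker_entry_sensor(agent, lambda request: sum(request["payload"]))
result = handle({"job_id": "job-7", "task_id": 3, "payload": [1, 2, 3]})
```

Because the sensor wraps the handler, the worker's own logic needs no knowledge of monitoring. In a real deployment `send` would presumably write to a socket or message queue reachable by the monitoring node.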