Title of invention: Method And System For Resource Monitoring Of Large-Scale, Orchestrated, Multi Process Job Execution Environments
Abstract: A system and method for monitoring the process resource consumption of massively parallel job executions is disclosed. The described system uses byte code instrumentation to place sensors in methods that receive job execution requests. These sensors detect the start and end of job executions by the process they are deployed to and extract, from each detected job execution request, identification data that identifies the job request. This job identification data is used to tag resource utilization measures, which allows measured resource consumption to be assigned to specific job executions. The same job identification data is also used to tag transaction tracing data, so that each transaction executed during a job execution is linked to the job execution that triggered it. The generated job-specific measures and transaction traces may be used to identify resource-intensive job executions and the root cause of their resource consumption.
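Publication number: US2015339210(A1)  Publication date: 2015.11.26
Application number: US201514718547  Filing date: 2015.05.21
Applicant: Dynatrace LLC  Inventors: Kopp Michael; Gsenger Guenther
Classification: G06F11/34; G06F11/30; H04L12/24  Main classification: G06F11/34
Agency:  Agent: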
Main claim: 1. A computer-implemented method for monitoring resource consumption by individual jobs executing in a distributed computing environment that processes large data sets across clusters of server devices, comprising: receiving, by a worker entry sensor instrumented in a worker application, a job request from a job manager located across a network remotely from the given server device, where the job request is one of a plurality of job requests processing a large data set in parallel and the worker application resides on a given server device in the cluster of server devices; extracting, by the worker entry sensor, identifying information for the job request from the job request received by the entry sensor; determining, by a measurement agent residing on the given server device, metrics indicative of resource utilization by the worker application while the worker application is processing the job request; determining, by the measurement agent, identifying information for the measurement agent; generating, by the measurement agent, a measurement event, where the measurement event includes the identifying information for the job request, the identifying information for the measurement agent, and the performance metrics; and sending, by the measurement agent, the measurement event to a monitoring node residing across the network remotely from the given server device.
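The measurement event recited in the claim carries three parts: job identification, agent identification, and metrics. A rough sketch of that structure follows. It is not the claimed implementation: the class names, the host-and-PID agent identifier, the JSON encoding, and the `send` callback that stands in for the network link to the monitoring node are all assumptions made for illustration.

```python
import json
import os
import socket
import time
from dataclasses import asdict, dataclass
from typing import Callable, Dict


@dataclass
class MeasurementEvent:
    job_id: str                # identifying information for the job request
    agent_id: str              # identifying information for the measurement agent
    metrics: Dict[str, float]  # resource utilization while the job was processed


class MeasurementAgent:
    """Hypothetical agent that measures one job and reports one event."""

    def __init__(self, send: Callable[[str], None]):
        # Assumption: host name plus process ID is enough to identify the agent.
        self.agent_id = f"{socket.gethostname()}:{os.getpid()}"
        # Assumption: `send` delivers a serialized event to the monitoring node.
        self._send = send

    def report(self, job_id: str, work: Callable[[], object]) -> MeasurementEvent:
        cpu_start = time.process_time()
        wall_start = time.perf_counter()
        work()  # the worker application processes the job request
        metrics = {
            "cpu_seconds": time.process_time() - cpu_start,
            "wall_seconds": time.perf_counter() - wall_start,
        }
        event = MeasurementEvent(job_id, self.agent_id, metrics)
        self._send(json.dumps(asdict(event)))
        return event
```

For a local trial, a list's `append` method can play the role of the monitoring node: `sent = []; MeasurementAgent(sent.append).report("job-7", lambda: sum(range(10000)))` leaves one JSON-encoded event in `sent`.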
Address: Detroit MI US