Title of invention: Methods and apparatus for resource management in cluster computing
Abstract: Embodiments of an event-driven resource management technique may enable the management of cluster resources at a sub-computer level (e.g., at the thread level) and the decomposition of jobs at an atomic (task) level. A job queue may request a resource for a job from a resource manager, which may locate a resource in a resource list and grant the resource to the job queue. After the resource is granted, the job queue sends the job to the resource, on which the job may be partitioned into tasks and from which additional resources may be requested from the resource manager. The resource manager may locate additional resources in the list and grant the resources to the resource. The resource sends the tasks to the granted resources for execution. As resources complete their tasks, the resource manager is informed so that the status of the resources in the list can be updated.
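The request, grant, and completion cycle described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Resource`, `ResourceManager`, `request`, `release`, and `run_job` are hypothetical, a "resource" is modeled as a single thread slot, and tasks are distributed round-robin purely for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """One schedulable unit, e.g. a single thread slot on a cluster node."""
    name: str
    busy: bool = False
    completed: list = field(default_factory=list)

    def run(self, task):
        # Stand-in for real work: just record that the task was executed.
        self.completed.append(task)


class ResourceManager:
    """Keeps the resource list and grants free resources on request."""

    def __init__(self, resources):
        self.resources = list(resources)

    def request(self, count=1):
        """Grant up to `count` free resources and mark them busy."""
        granted = []
        for res in self.resources:
            if len(granted) >= count:
                break
            if not res.busy:
                res.busy = True
                granted.append(res)
        return granted

    def release(self, resource):
        """Completion event: mark the resource as available again."""
        resource.busy = False


def run_job(job_tasks, manager):
    """Send a job to one granted resource, which fans its tasks out."""
    first = manager.request(1)
    if not first:
        return False  # nothing free; the job would stay in the queue
    primary = first[0]
    # The primary resource asks the manager for additional resources.
    helpers = manager.request(max(0, len(job_tasks) - 1))
    workers = [primary] + helpers
    # Hypothetical policy: round-robin the tasks over the granted resources.
    for i, task in enumerate(job_tasks):
        workers[i % len(workers)].run(task)
    # Each resource reports completion so the list can be updated.
    for worker in workers:
        manager.release(worker)
    return True


pool = [Resource(f"node0-thread{i}") for i in range(3)]
manager = ResourceManager(pool)
ok = run_job(["t1", "t2", "t3", "t4"], manager)
```

In this sketch the manager never executes work itself; it only tracks availability, which mirrors the division of roles in the abstract.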
Publication number: US9262218(B2) Publication date: 2016.02.16
Application number: US201414165396 Filing date: 2014.01.27
Applicant: Adobe Systems Incorporated Inventors: Bostic Sandford P.; Reiser Stephen Paul; Bigney Andrey J.
Classification: G06F9/46; G06F9/50; G06F11/34; G06F11/30 Main classification: G06F9/46
Agency: Kilpatrick Townsend & Stockton LLP Agent: Kilpatrick Townsend & Stockton LLP
Main claim: 1. A method for tracking jobs performed by computing nodes of a cluster computing system, the method comprising: monitoring, by a management computer, a plurality of computing nodes and an availability of resources provided by the plurality of computing nodes in the cluster computing system; identifying, by the management computer, a first computer independent of the management computer, wherein the first computer is a first computing node of the plurality of computing nodes that is available for performing a first job submitted to a job queue; identifying, by the management computer, a second computer independent of the management computer, wherein the second computer is a second computing node of the plurality of computing nodes that is available for performing a second job submitted to the job queue; generating a first job state object specific to the first job for tracking a first job status of the first job and a second job state object specific to the second job for tracking a second job status of the second job, wherein each job state object is a stand-alone database file that includes job metadata and associated wrapper methods; transmitting, by a job scheduling system, the first job state object to the first computer and the second job state object to the second computer; updating, after completion of a task of the first job, the first job state object independently of any updates to the second job state object after completion of a task of the second job, wherein the first computer updates the first job state object and the second computer updates the second job state object; transmitting, by at least one of the first computer and the second computer, job metadata extracted from at least one of the first job state object and the second job state object in response to a job status query from the job scheduling system, wherein the job metadata is indicative of an updated job status for at least one of the first job and the second job; partitioning each of the first job and the second job into a plurality of tasks; partitioning each of the first job state object and the second job state object into a plurality of task state objects, wherein each of the plurality of task state objects is configured for tracking a respective one of the plurality of tasks; distributing the plurality of tasks and the plurality of task state objects to at least some computing nodes from the plurality of computing nodes; and updating, by each of the at least some computing nodes, a respective task state object subsequent to performing a respective task from the plurality of tasks.
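As an illustration of the job state object recited in the claim, the sketch below models each state object as its own SQLite file with a few wrapper methods, and partitions a job state object into per-task state objects. The claim only says "stand-alone database file"; the choice of SQLite, the `meta` table layout, and the names `StateObject`, `set`, `get`, `metadata`, and `partition` are all assumptions made for this example.

```python
import os
import sqlite3
import tempfile


class StateObject:
    """A stand-alone SQLite file holding metadata, plus wrapper methods."""

    def __init__(self, path):
        self.path = path
        self._execute(
            "CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)"
        )

    def _execute(self, sql, params=()):
        conn = sqlite3.connect(self.path)
        try:
            with conn:  # commits on success, rolls back on error
                return conn.execute(sql, params).fetchall()
        finally:
            conn.close()

    def set(self, key, value):
        """Store one metadata entry (values are kept as text)."""
        self._execute(
            "INSERT OR REPLACE INTO meta (key, value) VALUES (?, ?)",
            (key, str(value)),
        )

    def get(self, key, default=None):
        rows = self._execute("SELECT value FROM meta WHERE key = ?", (key,))
        return rows[0][0] if rows else default

    def metadata(self):
        """Extract all metadata, e.g. to answer a job status query."""
        return dict(self._execute("SELECT key, value FROM meta"))


def partition(job_state, task_names, directory):
    """Create one task state object per task, each in its own file."""
    job_id = job_state.get("job_id")
    task_states = []
    for name in task_names:
        ts = StateObject(os.path.join(directory, f"{job_id}_{name}.db"))
        ts.set("job_id", job_id)
        ts.set("task", name)
        ts.set("status", "pending")
        task_states.append(ts)
    job_state.set("task_count", len(task_names))
    return task_states


workdir = tempfile.mkdtemp()
job = StateObject(os.path.join(workdir, "job1.db"))
job.set("job_id", "job1")
job.set("status", "running")
tasks = partition(job, ["render", "encode"], workdir)
# The node that performed "render" updates only its own task state object.
tasks[0].set("status", "done")
```

Because every state object lives in a separate file, a node can update its own object without touching any other job's or task's state, which is the independence property the claim describes.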
Address: San Jose, CA, US