Title of invention: CONTENTION AND SELECTION OF CONTROLLING WORK COORDINATOR IN A DISTRIBUTED COMPUTING ENVIRONMENT
Abstract: A distributed work processing system for processing computational tasks is scalable and fault-tolerant without requiring centralized control. Worker processes running on worker hosts are organized into a logical group, and worker coordinators running on worker coordinator hosts coordinate the tasks assigned to worker processes. A task store might hold a collection of tasks to be performed by the logical group. A lock database can be used to lock the logical group so that it is coordinated by one worker coordinator process at a time. A membership store contains mappings of worker processes to logical groups, and an assignment store indicates which tasks are assigned to which workers. The worker coordinator process has a scanner process that handles unassigned tasks and removes duplicate assignments. If a worker coordinator does not see enough worker processes, it can instantiate more. If a worker process does not see a worker coordinator, it can instantiate one.
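The scanner behavior described in the abstract (handling unassigned tasks and removing duplicate assignments) can be illustrated with a minimal sketch. The abstract does not say which duplicate is kept or how an unassigned task is placed, so the policies here (keep the first assignment seen, give unassigned tasks to the least-loaded worker) are assumptions, and the function name `scan_assignments` and its arguments are hypothetical stand-ins for the task store, membership store, and assignment store.

```python
def scan_assignments(tasks, workers, assignments):
    """Return a task -> worker mapping with no duplicates and no gaps.

    tasks:       task ids held in the (hypothetical) task store
    workers:     worker ids in the logical group (membership store)
    assignments: (task, worker) pairs from the assignment store,
                 possibly containing duplicates for the same task
    """
    if not workers:
        raise ValueError("the logical group has no workers")

    known_tasks = set(tasks)
    live_workers = set(workers)
    result = {}

    # Deduplicate: assumed policy is to keep the first valid assignment
    # seen for each task and discard any later ones.
    for task, worker in assignments:
        if task in known_tasks and worker in live_workers and task not in result:
            result[task] = worker

    # Count how many tasks each worker currently holds.
    load = {worker: 0 for worker in workers}
    for worker in result.values():
        load[worker] += 1

    # Assign every unassigned task to the least-loaded worker
    # (assumed policy; ties are broken by worker id).
    for task in tasks:
        if task not in result:
            target = min(workers, key=lambda w: (load[w], w))
            result[task] = target
            load[target] += 1

    return result


if __name__ == "__main__":
    mapping = scan_assignments(
        tasks=["t1", "t2", "t3"],
        workers=["w1", "w2"],
        assignments=[("t1", "w1"), ("t1", "w2")],  # t1 is assigned twice
    )
    print(mapping)
```

In this sketch the duplicate assignment of `t1` to `w2` is dropped, and `t2` and `t3` are spread across the two workers by load.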
Publication number: US2017017527(A1) Publication date: 2017.01.19
Application number: US201615280233 Filing date: 2016.09.29
Applicant: Amazon Technologies, Inc. Inventors: Halim AndyGibb; Patil Swapneel
Classification: G06F9/52; G06F9/48 Main classification: G06F9/52
Agency: Agent:
Independent claim: 1. A computer-implemented method for managing distributed work processing, comprising: under control of one or more computer systems configured with executable instructions, executing a first worker coordinator on a first worker coordinator host of a plurality of worker coordinator hosts, wherein the first worker coordinator host is a computer system that executes program code and wherein executing the first worker coordinator includes requesting a lock record for a logical group, the logical group corresponding to a group of workers, each worker executed as a computer process on a worker host that is a computer system that executes program code; executing a second worker coordinator on a second worker coordinator host of the plurality of worker coordinator hosts, wherein the second worker coordinator host is a computer system that executes program code and wherein executing the second worker coordinator includes requesting the lock record for the logical group, and wherein the second worker coordinator host is independent of the first worker coordinator host; indicating, in the lock record, which of the first worker coordinator or the second worker coordinator is granted the lock record to be a controlling worker coordinator; determining, using a determining worker, that a worker coordinator is not active for the logical group, wherein the determining worker is a worker of the logical group that determined that no worker coordinator is active for the logical group; and invoking a worker coordinator using the determining worker.
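The steps of the claim (two coordinators contending for one lock record, the record naming the controlling coordinator, and a worker invoking a coordinator when none is active) can be sketched as follows. This is an in-memory illustration only, not the patented implementation. The claim does not state how a worker determines that no coordinator is active, so the lease-expiry check used here is an assumption, and the names `LockDatabase`, `request`, `active_holder`, and `worker_check` are hypothetical. A real lock database would also need an atomic conditional write, which this single-process sketch does not model.

```python
class LockRecord:
    """Lock record for one logical group: who holds it and until when."""

    def __init__(self, group):
        self.group = group
        self.holder = None      # id of the controlling worker coordinator
        self.expires_at = 0.0   # assumed lease expiry time, in seconds


class LockDatabase:
    """Hypothetical in-memory stand-in for the lock database."""

    def __init__(self):
        self._records = {}

    def request(self, group, coordinator_id, now, lease=10.0):
        """Request the lock record; return True if this coordinator holds it."""
        record = self._records.setdefault(group, LockRecord(group))
        free = record.holder is None or now >= record.expires_at
        if free or record.holder == coordinator_id:
            # Grant or renew: the record indicates the controlling coordinator.
            record.holder = coordinator_id
            record.expires_at = now + lease
        return record.holder == coordinator_id

    def active_holder(self, group, now):
        """Return the controlling coordinator id, or None if none is active."""
        record = self._records.get(group)
        if record is None or record.holder is None or now >= record.expires_at:
            return None
        return record.holder


def worker_check(db, group, now, spawn_coordinator):
    """Run by a worker: invoke a coordinator if none is active for the group.

    spawn_coordinator is a hypothetical callable that starts a coordinator
    and returns its id. Returns the new id, or None if nothing was invoked.
    """
    if db.active_holder(group, now) is not None:
        return None
    new_id = spawn_coordinator()
    db.request(group, new_id, now)
    return new_id


if __name__ == "__main__":
    db = LockDatabase()
    print(db.request("group-a", "coordinator-1", now=0.0))   # first requester
    print(db.request("group-a", "coordinator-2", now=1.0))   # contender
    print(worker_check(db, "group-a", 5.0, lambda: "coordinator-3"))
    print(worker_check(db, "group-a", 20.0, lambda: "coordinator-3"))
```

Passing `now` explicitly instead of reading the system clock keeps the sketch deterministic; a deployed system would use whatever time source or heartbeat mechanism its lock store provides.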
Address: Seattle, WA, US