Abstract |
Reliably making configuration changes to distributed systems, including receiving commands for multiple configuration changes, subdividing configuration changes into separate tasks, and performing those tasks at each node. A configuration element receives sets of configuration change commands, acknowledging them so the user need not wait before issuing additional commands. Tasks are determined, each including consistent changes to system configuration, and each including single-device tasklets. Each particular tasklet might be assigned to a particular single device, or to any single device in the system. Next tasks are performed when tasklets are complete. If tasklets are not timely performed due to nodes which are relatively unresponsive, those nodes are marked “failed.” When a failed node returns to responsiveness, it marks itself “recovering.” When a recovering node catches up, it marks itself “operational.” Updates by failed or recovering nodes are skipped while synchronizing with operational nodes. |
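The node life cycle described in the abstract (operational, failed, recovering, then operational again) can be illustrated with a minimal sketch. The class and function names here (`NodeState`, `Node`, `apply_update`) and the integer update counter are assumptions made for illustration; they do not come from the patent text, which does not specify an implementation.

```python
from enum import Enum


class NodeState(Enum):
    OPERATIONAL = "operational"
    FAILED = "failed"
    RECOVERING = "recovering"


class Node:
    """Hypothetical node that tracks its own state and the last update applied."""

    def __init__(self, name):
        self.name = name
        self.state = NodeState.OPERATIONAL
        self.applied = 0  # number of the most recent update applied

    def mark_failed(self):
        # Called when the node did not perform its tasklets in time.
        self.state = NodeState.FAILED

    def resume(self):
        # A failed node that becomes responsive marks itself "recovering".
        if self.state is NodeState.FAILED:
            self.state = NodeState.RECOVERING

    def catch_up(self, latest):
        # A recovering node that has caught up marks itself "operational".
        if self.state is NodeState.RECOVERING:
            self.applied = latest
            self.state = NodeState.OPERATIONAL


def apply_update(nodes, latest):
    """Apply update `latest` to operational nodes only.

    Failed and recovering nodes are skipped. Returns the names of the
    nodes that were updated.
    """
    updated = []
    for node in nodes:
        if node.state is NodeState.OPERATIONAL:
            node.applied = latest
            updated.append(node.name)
    return updated
```

In this sketch a skipped node is brought up to date only when it calls `catch_up`, which mirrors the abstract's statement that updates by failed or recovering nodes are skipped while operational nodes synchronize.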
Principal claim |
1. A method, including steps of:
receiving commands indicating one or more configuration changes associated with a distributed system including a plurality of nodes, each said configuration change including modification of at least one of: configuration information associated with said distributed system, configuration information associated with at least one said node;
determining one or more tasks to make configuration changes, each said task including a consistent change to said configuration information;
determining for each said task one or more tasklets to perform said task, each said tasklet performable at a single said node, and one or more target nodes at which to perform a copy of each said tasklet. |
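The decomposition recited in the claim (tasks made of single-node tasklets, each with one or more target nodes) might be modelled as follows. This is only a sketch under stated assumptions: the `Task` and `Tasklet` types, the `plan_targets` helper, and the rule of choosing the first listed node for an unpinned tasklet are all hypothetical and are not taken from the claim.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Tasklet:
    """A unit of work performable at a single node."""
    action: str
    # Explicit target nodes, one copy of the tasklet per target.
    # None means any single node in the system may perform it.
    targets: Optional[List[str]] = None


@dataclass
class Task:
    """A consistent change to configuration, made of tasklets."""
    name: str
    tasklets: List[Tasklet] = field(default_factory=list)


def plan_targets(task: Task, nodes: List[str]) -> List[Tuple[str, str]]:
    """Return (action, node) pairs, one per tasklet copy to perform."""
    plan = []
    for tasklet in task.tasklets:
        if tasklet.targets is None:
            # Assumed policy: an unpinned tasklet goes to the first node.
            plan.append((tasklet.action, nodes[0]))
        else:
            for node in tasklet.targets:
                plan.append((tasklet.action, node))
    return plan
```

For example, a task with one tasklet pinned to nodes `n1` and `n2` and one unpinned tasklet yields three tasklet copies when planned against nodes `n1`, `n2`, and `n3`.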