Title Dynamic replica failure detection and healing
Abstract Detecting replica faults within a replica group and dynamically scheduling replica healing operations are described. Status metadata for one or more replica groups may be accessed. Based, at least in part, on the status metadata, a number of available replicas for at least one replica group may be determined to be noncompliant with a healthy state definition for the replica group. One or more healing operations to restore the number of available replicas for the at least one replica group to the respective healthy state definition may be dynamically scheduled. In some embodiments, one or more resource constraints for performing healing operations and one or more resource requirements for each of the one or more healing operations may be used to order the one or more healing operations.
Publication number US9304815(B1) Publication date 2016.04.05
Application number US201313917317 Filing date 2013.06.13
Applicant Amazon Technologies, Inc. Inventors Vasanth Jai; Hunter, Jr. Barry Bailey; Muniswamy-Reddy Kiran-Kumar; Lutz David Alan; Wang Jian; MacCanti Maximiliano
Classification G06F9/48; G06F11/07; G06F3/06; G06F17/30 Main classification G06F9/48
Agency Meyertons, Hood, Kivlin, Kowert & Goetzel, P.C. Agent Kowert Robert C.; Meyertons, Hood, Kivlin, Kowert & Goetzel, P.C.
Main claim 1. A system, comprising: a plurality of computing nodes, each comprising at least one processor and memory, wherein the plurality of computing nodes are configured to implement a data storage service, wherein the data storage service comprises: one or more replica groups stored among the plurality of computing nodes, wherein each of the one or more replica groups maintains one or more replicas of data on behalf of one or more storage service clients, wherein each replica group of the one or more replica groups includes a respective healthy state definition for the replica group; a replica group status sweeper, configured to identify replica groups with a number of available replicas not compliant with the respective healthy state definition for the respective replica group, wherein said identification is based, at least in part, on status metadata for the respective replica group; and a dynamic heal scheduler, configured to schedule one or more replica healing operations to restore the number of available replicas for the identified replica groups to the respective healthy state definition for the identified replica groups based, at least in part, on one or more resource constraints for performing healing operations, wherein to schedule the one or more replica healing operations, the dynamic heal scheduler is further configured to determine an order in which the one or more replica healing operations are to be performed without exceeding the one or more resource constraints based, at least in part, on one or more resource requirements for each of the one or more replica healing operations.
Address Reno NV US
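
The two components recited in claim 1 can be illustrated with a minimal Python sketch. Nothing in it comes from the patent text itself: the class and function names (ReplicaGroup, HealOperation, sweep, schedule), the reading of a "healthy state definition" as a required replica count, the single integer resource budget, and the greedy first-fit ordering are all assumptions chosen for illustration. The claim only requires that the order respect the resource constraints given each operation's resource requirements.

```python
from dataclasses import dataclass


@dataclass
class ReplicaGroup:
    """Hypothetical status metadata for one replica group."""
    name: str
    available_replicas: int  # replicas currently reported as available
    healthy_replicas: int    # assumed healthy state definition: required count


@dataclass
class HealOperation:
    """Hypothetical healing operation with a single resource requirement."""
    group: str
    cost: int  # assumed unit, e.g. concurrent transfer slots


def sweep(groups):
    """Sweeper sketch: return groups whose available replica count
    falls short of their healthy state definition."""
    return [g for g in groups if g.available_replicas < g.healthy_replicas]


def schedule(ops, capacity):
    """Scheduler sketch: order operations into rounds so that the total
    cost of each round never exceeds the resource constraint `capacity`.

    Uses greedy first-fit over operations sorted by decreasing cost.
    This heuristic is an assumption, not the method claimed in the patent.
    """
    rounds = []
    for op in sorted(ops, key=lambda o: o.cost, reverse=True):
        if op.cost > capacity:
            raise ValueError(f"operation for {op.group} can never fit the budget")
        for batch in rounds:
            if sum(o.cost for o in batch) + op.cost <= capacity:
                batch.append(op)
                break
        else:
            # No existing round has room, so open a new one.
            rounds.append([op])
    return rounds


if __name__ == "__main__":
    groups = [
        ReplicaGroup("a", available_replicas=3, healthy_replicas=3),
        ReplicaGroup("b", available_replicas=1, healthy_replicas=3),
        ReplicaGroup("c", available_replicas=2, healthy_replicas=3),
    ]
    unhealthy = sweep(groups)
    # One heal operation per missing replica, each with an assumed cost of 2.
    ops = [
        HealOperation(g.name, cost=2)
        for g in unhealthy
        for _ in range(g.healthy_replicas - g.available_replicas)
    ]
    for i, batch in enumerate(schedule(ops, capacity=4), start=1):
        print(i, [op.group for op in batch])
```

In this sketch each round stands for a set of healing operations that may run concurrently. A real scheduler would likely track several resource dimensions (network, disk, node load) and re-plan as status metadata changes, which this example does not attempt.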