Title: System and method for providing high availability data
Abstract: A computer-implemented data processing system and method writes a first plurality of copies of a data set at a first plurality of hosts and reads a second plurality of copies of the data set at a second plurality of hosts. The first and second pluralities of copies may be overlapping and the first and second pluralities of hosts may be overlapping. A hashing function may be used to select the first and second pluralities of hosts. Version histories for each of the first copies of the data set may also be written at the first plurality of hosts and read at the second plurality of hosts. The version histories for the second copies of the data set may be compared, and causal relationships between the second copies of the data set may be evaluated based on those version histories.
Publication No.: US9223841(B2) Publication date: 2015.12.29
Application No.: US201012767759 Filing date: 2010.04.26
Applicant: Amazon Technologies, Inc. Inventors: Vosshall Peter S.; deCandia Giuseppe; Hastorun Deniz; Lakshman Avinash; Pilchin Alex; Rosero Ivan D.
Classification: G06F17/30; G06F11/20 Main classification: G06F17/30
Agency: Meyertons, Hood, Kivlin, Kowert & Goetzel, P.C. Agent: Kowert Robert C.; Meyertons, Hood, Kivlin, Kowert & Goetzel, P.C.
Main claim: 1. A computer-implemented data storage system comprising: host mapping logic configured to map responsibility for storing a plurality of data sets to individual ones of a plurality of hosts which cooperate to implement a data storage system; a hardware processor; data set replication logic executed by the hardware processor configured to execute instructions stored in memory, the data set replication logic configured to: obtain a first version of a data set to be written; select a first subset of the plurality of hosts to write the first version of the data set; write a first plurality of copies of the first version of a data set at the first subset of the plurality of hosts, wherein the first plurality of copies respectively include a version history of the first version of the data set; obtain a second version of the data set to be written, wherein the second version of the data set comprises one or more updates to the data set inconsistent with at least a portion of the first version of the data set; select a second subset of the plurality of hosts to write the second version of the data set, wherein the first and second subsets of the plurality of hosts include at least one different host; write a second plurality of copies of the second version of the data set at the second subset of the plurality of hosts, wherein the second plurality of copies respectively include another version history of the second version of the data set; data set retrieval logic executed by a hardware processor configured to execute instructions stored in memory, the data set retrieval logic configured to be responsive to a request to provide a single copy of the data set by reading a third plurality of copies of the data set at a third subset of the plurality of hosts, wherein the third plurality of copies include at least one copy of the first version of the data set and at least one copy of the second version of the data set, wherein the third subset of the plurality of hosts has at least one host in common with the first subset of the plurality of hosts and at least one host in common with the second subset of the plurality of hosts, and wherein the at least one host in common with the first subset of the plurality of hosts is not a member of the second subset of the plurality of hosts; and an evaluation component configured to provide a single copy of the data set by: reading the third plurality of copies of the data set; and evaluating the version history and the other version history to reconcile the first version of the data set and the second version of the data set read from the third plurality of copies into the single copy of the data set; wherein the evaluation component is configured to be invoked after the third plurality of copies of the data set is read.
Address: Reno NV US
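
Illustrative sketch (not part of the patent record): the abstract and claim 1 describe selecting hosts with a hashing function, storing a version history with each copy, and evaluating those histories after a read to reconcile versions. The Python below is one way this could look under stated assumptions. The record does not specify the hashing scheme or the form of a version history, so the MD5-based hash ring, the vector-clock style history (a mapping from host name to update counter), and the names `select_hosts`, `descends`, and `reconcile` are all hypothetical choices made for illustration.

```python
import hashlib
from bisect import bisect_right


def _position(name: str) -> int:
    """Map a string to a point on a hash ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


def select_hosts(key: str, hosts: list[str], n: int) -> list[str]:
    """Pick up to n distinct hosts by walking clockwise from the key's position."""
    ring = sorted(hosts, key=_position)
    points = [_position(h) for h in ring]
    start = bisect_right(points, _position(key)) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(min(n, len(ring)))]


def descends(a: dict[str, int], b: dict[str, int]) -> bool:
    """True if history a records every update that history b records."""
    return all(a.get(host, 0) >= count for host, count in b.items())


def reconcile(
    copies: list[tuple[str, dict[str, int]]]
) -> list[tuple[str, dict[str, int]]]:
    """Drop copies whose history is a strict ancestor of another copy's history.

    Copies that remain are causally unrelated, that is, conflicting versions.
    """
    survivors = []
    for value, history in copies:
        superseded = any(
            descends(other, history) and not descends(history, other)
            for _, other in copies
        )
        if not superseded and (value, history) not in survivors:
            survivors.append((value, history))
    return survivors


# Hypothetical usage: host names and the key are made up for this example.
hosts = ["host-a", "host-b", "host-c", "host-d", "host-e"]
write_set = select_hosts("cart:42", hosts, 3)

older = ("v1", {"host-a": 1})
newer = ("v2", {"host-a": 2})
sibling = ("v3", {"host-a": 1, "host-b": 1})

print(reconcile([older, newer]))    # only the newer copy remains
print(reconcile([newer, sibling]))  # both remain: neither history descends from the other
```

When more than one copy survives, the histories alone cannot order the versions. Merging them into the single copy that claim 1 requires would depend on the data set's semantics, which this sketch leaves out.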