Title: Block restore ordering in a streaming restore system
Abstract: A distributed data warehouse system may maintain data blocks on behalf of clients, and may store primary and secondary copies of each data block on different disks or nodes in a cluster. The warehouse system may back up data blocks in a remote key-value backup storage system. In response to a failure, or to a query targeting data that was lost or corrupted, a restore operation may retrieve data blocks from backup storage using their unique identifiers as keys, while incoming queries continue to be serviced. The order in which data blocks are restored may depend on the relative likelihood that they will be accessed in the near future (e.g., based on how recently or frequently they were accessed, written, or backed up; the values of one or more access counters associated with each data block; or how recently a database table containing data in each data block was loaded).
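The ordering idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the `BlockStats` fields, the `restore_order` function, and in particular the scoring rule (an access counter decayed by a recency half-life). It shows only one plausible way to combine "how recently" and "how frequently" into a single restore priority.

```python
from dataclasses import dataclass


@dataclass
class BlockStats:
    # Hypothetical per-block metadata; field names are illustrative only.
    block_id: str       # unique identifier, also the key in backup storage
    last_access: float  # time of the most recent access, in seconds
    access_count: int   # value of an access counter for the block


def restore_order(stats, now, half_life=3600.0):
    """Return block ids sorted from most to least likely to be accessed soon.

    Assumed heuristic: the access counter is weighted by an exponential
    recency decay, so a block's score halves every `half_life` seconds
    since it was last accessed.
    """
    def score(s):
        age = max(0.0, now - s.last_access)
        return s.access_count * 0.5 ** (age / half_life)

    return [s.block_id for s in sorted(stats, key=score, reverse=True)]
```

In this sketch a block read four times a moment ago outranks one read six times an hour ago, because the older block's count is halved. A real system could equally weight write time, backup time, or table load time, as the abstract lists.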
Publication No.: US9449040(B2) Publication date: 2016.09.20
Application No.: US201313792914 Filing date: 2013.03.11
Applicant: Amazon Technologies, Inc. Inventor: Gupta Anurag Windlass
Classification: G06F17/30 Main classification: G06F17/30
Agency: Meyertons, Hood, Kivlin, Kowert & Goetzel, P.C. Agent: Kowert Robert C.; Meyertons, Hood, Kivlin, Kowert & Goetzel, P.C.
Main claim: 1. A method, comprising: performing, by one or more computers: storing columnar data of a database table in a plurality of physical data blocks in a distributed data storage system on behalf of one or more clients, wherein the distributed data storage system comprises a cluster of one or more nodes, each of which comprises one or more disks on which physical data blocks are stored, and wherein each of the plurality of physical data blocks is associated with a respective unique identifier; storing a copy of each of the plurality of physical data blocks in a remote key-value durable backup storage system, wherein for each of the plurality of physical data blocks, the respective unique identifier serves as a key to access the data block in the remote key-value durable backup storage system; detecting a failure in the distributed data storage system affecting two or more physical data blocks of the plurality of physical data blocks in which the columnar data was stored; in response to said detecting, automatically restoring the two or more physical data blocks, wherein said restoring comprises: analyzing a plurality of previously executed queries to determine a pattern of access to the columnar data stored in the distributed data storage system; determining a priority order in which to restore the two or more physical data blocks based, at least in part, on the pattern of access, the priority order indicating a relative likelihood of subsequent access to each of the two or more physical data blocks; retrieving a copy of one of the two or more physical data blocks having the highest priority from the remote key-value durable backup storage system, wherein said retrieving comprises using the respective unique identifier associated with the one of the two or more physical data blocks as a key to access the copy of the one of the two or more physical data blocks in the key-value durable backup storage system; writing a primary copy of the retrieved copy of the physical data block to a given disk on a given node in the distributed data storage system; and initiating replication of the retrieved copy of the physical data block on one or more disks in the distributed data storage system other than the given disk.
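The restore steps recited in the claim can be sketched end to end as follows. This is an illustrative toy and not the claimed implementation. It assumes that the backup store and the disks are plain in-memory dictionaries, that the query log is a list of the block ids each past query touched, that the "pattern of access" is a simple hit count, and that primaries are placed round-robin. The name `restore_blocks` and all of its parameters are hypothetical. Replication is performed inline here, whereas the claim only requires that it be initiated.

```python
from collections import Counter


def restore_blocks(lost_ids, query_log, backup, disks, replicas=1):
    """Restore lost blocks in priority order and return (order, placement).

    lost_ids:  ids of blocks affected by the failure
    query_log: list of lists; block ids touched by each past query (assumed format)
    backup:    dict mapping block id -> bytes, standing in for the key-value store
    disks:     dict mapping disk name -> {block id: bytes}
    """
    # Analyze previously executed queries: count how often each block was read.
    hits = Counter(block for query in query_log for block in query)

    # Priority order: most frequently accessed first, ties broken by id.
    order = sorted(lost_ids, key=lambda b: (-hits[b], b))

    names = sorted(disks)
    placement = {}
    for i, block_id in enumerate(order):
        data = backup[block_id]                 # the unique id is the lookup key
        primary = names[i % len(names)]         # assumed round-robin placement
        disks[primary][block_id] = data         # write the primary copy
        others = [n for n in names if n != primary][:replicas]
        for name in others:                     # replicate on other disks
            disks[name][block_id] = data
        placement[block_id] = (primary, others)
    return order, placement
```

Because `order` is computed before any retrieval, the block most likely to be queried next is fetched and written first, which is the point of restoring in priority order while queries continue to arrive.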
Address: Reno NV US