Title: DYNAMIC PARTITIONING TECHNIQUES FOR DATA STREAMS
Abstract: A partitioning policy, comprising an indication of an initial mapping of data records of a stream to a plurality of partitions, is selected to distribute data records of a data stream among a plurality of nodes of a stream management service. Data ingestion nodes and storage nodes are configured according to the initial mapping. In response to a determination that a triggering criterion for dynamically repartitioning the data stream has been met, a modified mapping is generated, and a different set of ingestion and storage nodes is configured. For at least some time during which arriving data records are stored in accordance with the modified mapping, data records stored at the first set of storage nodes in accordance with the initial mapping are retained.
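The retention behavior described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `Stream` class, the `repartition` method, the in-memory lists standing in for storage nodes, and the choice of an MD5 hash modulo the partition count as the mapping function. The sketch only shows that records written under the initial mapping stay where they are while new records follow the modified mapping.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Assumed mapping: hash the partition key, take it modulo the partition count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


class Stream:
    """Toy stream that keeps one set of storage 'nodes' per mapping version."""

    def __init__(self, num_partitions: int):
        # Each mapping version owns its own storage nodes (plain lists here).
        self.versions = [self._new_nodes(num_partitions)]

    @staticmethod
    def _new_nodes(num_partitions: int):
        return [[] for _ in range(num_partitions)]

    def put(self, key: str, data: str) -> int:
        nodes = self.versions[-1]  # new records always use the newest mapping
        index = partition_for(key, len(nodes))
        nodes[index].append((key, data))
        return index

    def repartition(self, num_partitions: int) -> None:
        # Configure a new node set; earlier sets are retained, not migrated.
        self.versions.append(self._new_nodes(num_partitions))

    def read_all(self, key: str):
        # Readers consult every retained mapping version, oldest first.
        out = []
        for nodes in self.versions:
            node = nodes[partition_for(key, len(nodes))]
            out.extend(d for k, d in node if k == key)
        return out
```

In this sketch a reader that wants the full history for a key walks the mapping versions in order, which is one simple way to keep older records reachable without copying them to the new nodes.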
Publication No.: US2015134796(A1) Publication date: 2015.05.14
Application No.: US201314077171 Filing date: 2013.11.11
Applicant: AMAZON TECHNOLOGIES, INC. Inventors: THEIMER MARVIN MICHAEL; GHARE GAURAV D.; DUNAGAN JOHN DAVID; BURGESS GREG; XIONG YING
Classification: H04L12/24 Main classification: H04L12/24
Agency: Agent:
Main claim: 1. A system, comprising: one or more computing devices configured to: determine a partitioning policy to be applied to distribute data records of a data stream among a plurality of nodes of a multi-tenant stream management service, wherein the partitioning policy comprises an initial mapping of data records to a plurality of partitions based at least in part on one or more attribute values associated with the data records; identify, using the initial mapping, a first partition of which a particular data record of the data stream is to be designated a member, based at least in part on a particular attribute value; generate, corresponding to the particular data record, a sequence number indicative of a position of the particular data record within a record acquisition sequence at an ingestion node of the stream management service, wherein the ingestion node is selected based at least in part on the initial mapping; store a plurality of data records of the first partition at a data storage location of the stream management service in an order based at least in part on respective sequence numbers of the plurality of data records, wherein the data storage location is selected based at least in part on the initial mapping; and in response to a determination that a triggering criterion for repartitioning the data stream has been met, generate a modified mapping of data records to partitions, initiate usage of the modified mapping without scheduling a pause in data record acquisitions of the data stream; and select, for another data record with the particular attribute value, wherein the other data record is received subsequent to an initiation of usage of the modified mapping, at least one of: (a) a different ingestion node of the stream management service or (b) a different data storage location of the stream management service.
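The steps recited in the claim can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the claimed implementation: the `Mapping` and `StreamService` classes, the node and location naming scheme, the SHA-256-based partition function, and the use of an explicit `repartition` call in place of a real triggering criterion are all hypothetical.

```python
import bisect
import hashlib


class Mapping:
    """Hypothetical mapping: attribute value -> partition -> (ingestion node, storage location)."""

    def __init__(self, num_partitions: int, prefix: str):
        self.num_partitions = num_partitions
        self.prefix = prefix  # distinguishes node sets of different mappings

    def partition(self, attribute_value: str) -> int:
        digest = hashlib.sha256(attribute_value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_partitions

    def ingestion_node(self, partition: int) -> str:
        return f"{self.prefix}-ingest-{partition}"

    def storage_location(self, partition: int) -> str:
        return f"{self.prefix}-store-{partition}"


class StreamService:
    def __init__(self, initial: Mapping):
        self.mapping = initial
        self.next_seq = {}  # ingestion node -> next sequence number
        self.storage = {}   # storage location -> records sorted by sequence number

    def put(self, attribute_value: str, payload: str):
        mapping = self.mapping  # snapshot of the mapping currently in use
        part = mapping.partition(attribute_value)
        node = mapping.ingestion_node(part)
        location = mapping.storage_location(part)
        # Sequence number = position in this ingestion node's acquisition sequence.
        seq = self.next_seq.get(node, 0)
        self.next_seq[node] = seq + 1
        # Keep records at the location ordered by sequence number.
        bisect.insort(self.storage.setdefault(location, []), (seq, payload))
        return node, location, seq

    def repartition(self, modified: Mapping) -> None:
        # Stand-in for "triggering criterion met": swap the mapping in place.
        # The next put() uses it; acquisition is never paused.
        self.mapping = modified
```

A short usage example: after `svc = StreamService(Mapping(2, "v1"))`, two calls to `svc.put("user-7", ...)` land on the same ingestion node and storage location with consecutive sequence numbers. After `svc.repartition(Mapping(4, "v2"))`, the next record with the same attribute value is routed to a different node and location, while the earlier records remain at the original location.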
Address: Reno, NV, US