发明名称 Dynamic shuffle reconfiguration
摘要 A method includes receiving a request to perform a shuffle operation on a data stream, the request including a set of initial key ranges: generating a shuffler configuration that assigns a shuffler from a set of shufflers to each of the initial key ranges; initiating the set of shufflers to perform the shuffle operation on the data stream; analyzing metadata statistics to determine whether a shuffler configuration update event occurs, the metadata statistics produced by the set of shufflers during the shuffle operation and indicating load statistics for each shuffler in the set of shufflers; and upon occurrence of the shuffler configuration update event and during the shuffle operation, altering the shuffler configuration based at least in part on the metadata statistics to produce an assignment of shufflers to key ranges that is different from the assignment of shufflers to the initial key ranges.
申请公布号 US9483509(B2) 申请公布日期 2016.11.01
申请号 US201314044529 申请日期 2013.10.02
申请人 Google Inc. 发明人 Balikov Alexander Gourkov;Dvorsky Marian;Zhao Yonggang
分类号 G06F17/30;G06F9/50 主分类号 G06F17/30
代理机构 Fish & Richardson P.C. 代理人 Fish & Richardson P.C.
主权项 1. A computer-implemented method performed by a data processing apparatus, the method comprising: responsive to receiving a request to perform a shuffle operation on a data stream, the shuffle operation being an operation that groups keyed records in the data stream by key, the request including a set of initial key ranges, each initial key range corresponding to a portion of the data stream: generating a shuffler configuration that assigns a shuffler from a set of shufflers to each of the initial key ranges, each shuffler configured to receive a portion of the data stream associated with an assigned key range from one or more writers and provide the portion of the data stream to one or more readers;initiating the set of shufflers to perform the shuffle operation on the data stream;analyzing metadata statistics to identify a shuffler configuration update event, the metadata statistics produced by the set of shufflers during the shuffle operation and indicating load statistics for each shuffler in the set of shufflers; andupon identification of the shuffler configuration update event and during the shuffle operation, altering the shuffler configuration based at least in part on the metadata statistics to produce an assignment of shufflers to key ranges that is different from the assignment of shufflers to the initial key ranges.
地址 Mountain View CA US