Title: Distributed storage of aggregated data
Abstract: Techniques are described for managing aggregation of data in a distributed manner, such as for a particular client based on specified configuration information. The described techniques may include storing aggregated data values for an OLAP cube or other data structure in a distributed manner, such as in some situations in a distributed hash table. The aggregated data values to be stored may be generated in various manners, such as by performing multi-stage data manipulation operations. For example, a map-reduce architecture may be used, with a first stage involving the use of one or more specified map functions to be performed, and with at least a second stage involving the use of one or more specified reduce functions to be performed.
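The two-stage generation described in the abstract can be illustrated with a small, single-process sketch. It is not the patented implementation: the `ALL` roll-up marker, the function names, the sample records, and the choice of summation as the reduce function are all assumptions made for illustration. The map stage emits one (cell, value) pair for every combination of kept and rolled-up dimensions, and the reduce stage combines the values that share a cell.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # hypothetical marker: "aggregated over every value of this dimension"


def map_record(record, dimensions, measure):
    """Map stage (assumed): emit (cell, value) for each subset of rolled-up dimensions."""
    values = [record[d] for d in dimensions]
    for r in range(len(dimensions) + 1):
        for rolled_up in combinations(range(len(dimensions)), r):
            cell = tuple(ALL if i in rolled_up else v for i, v in enumerate(values))
            yield cell, record[measure]


def reduce_cell(values):
    """Reduce stage (assumed): combine all values emitted for one cell by summing."""
    return sum(values)


def build_cube(records, dimensions, measure):
    """Run map, group by cell (the shuffle step), then reduce each group."""
    grouped = defaultdict(list)
    for record in records:
        for cell, value in map_record(record, dimensions, measure):
            grouped[cell].append(value)
    return {cell: reduce_cell(vals) for cell, vals in grouped.items()}


# Illustrative input data, not taken from the patent.
records = [
    {"region": "US", "product": "book", "sales": 10},
    {"region": "US", "product": "toy", "sales": 5},
    {"region": "EU", "product": "book", "sales": 7},
]
cube = build_cube(records, ["region", "product"], "sales")
```

In this sketch each cube cell is keyed by a tuple of dimension category values, so `cube[("US", "*")]` holds the total for the US across all products. A real deployment would run the map and reduce functions on separate computing nodes; the in-memory grouping here only stands in for that distribution.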
Publication number: US8938416(B1)
Publication date: 2015.01.20
Application number: US201213350653
Filing date: 2012.01.13
Applicant: Amazon Technologies, Inc.
Inventors: Cole Richard J.; Mock Alan D.
Classification: G06F17/00; G06F17/30
Main classification: G06F17/00
Agency: Seed IP Law Group PLLC
Agent: Seed IP Law Group PLLC
Main claim: 1. A computer-implemented method comprising:
generating, by one or more configured computing nodes of a data aggregation service, an OLAP (“online analytical processing”) cube having a plurality of aggregated data values, wherein each of the aggregated data values is associated with a combination of multiple dimension category values for multiple dimensions of the OLAP cube, and wherein the generating of the OLAP cube is performed using configuration information specified by a client of the data aggregation service and includes performing multiple stages having distinct indicated activities; and
after the generating of the OLAP cube, initiating, by the one or more configured computing nodes, storing the plurality of aggregated data values for the OLAP cube in a distributed hash table having a plurality of storage locations on multiple storage nodes that are each available to store information for a key-value pair by, for each of the plurality of aggregated data values:
determining a hash key value associated with the aggregated data value based at least in part on the combination of multiple dimension category values associated with the aggregated data value;
determining a storage location within the distributed hash table by using the determined hash key value as input to a hash function, the determined storage location being one of the plurality of storage locations and being on one of the multiple storage nodes; and
providing the aggregated data value for storage in the determined storage location on the one storage node as part of the OLAP cube.
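The three per-value storage steps recited in the claim (determine a hash key from the dimension category values, hash that key to a storage location on a storage node, provide the value for storage there) can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: the class and function names, the `cube_id|dim=value` key layout, the use of SHA-256, and the simple modulo placement are all hypothetical choices, and the claim does not specify any of them. Modulo placement is used here for brevity; it is not consistent hashing.

```python
import hashlib


class StorageNode:
    """Hypothetical storage node holding several storage locations."""

    def __init__(self, name):
        self.name = name
        self.slots = {}  # storage location index -> {hash key: value}


class DistributedHashTable:
    """In-memory stand-in for a DHT spread over multiple storage nodes."""

    def __init__(self, node_names, locations_per_node=4):
        self.nodes = [StorageNode(n) for n in node_names]
        self.locations_per_node = locations_per_node
        self.num_locations = len(self.nodes) * locations_per_node

    def locate(self, hash_key):
        # Step 2: use the hash key as input to a hash function to pick a location.
        digest = hashlib.sha256(hash_key.encode("utf-8")).digest()
        location = int.from_bytes(digest[:8], "big") % self.num_locations
        node = self.nodes[location // self.locations_per_node]
        return node, location

    def put(self, hash_key, value):
        # Step 3: provide the value for storage at the determined location.
        node, location = self.locate(hash_key)
        node.slots.setdefault(location, {})[hash_key] = value
        return node.name, location

    def get(self, hash_key):
        node, location = self.locate(hash_key)
        return node.slots[location][hash_key]


def make_hash_key(cube_id, cell):
    # Step 1: derive the key from the combination of dimension category values.
    return cube_id + "|" + "|".join(f"{dim}={val}" for dim, val in cell)


# Illustrative cube cells, not taken from the patent.
cube = {
    (("region", "US"), ("product", "book")): 10,
    (("region", "US"), ("product", "*")): 15,
}
dht = DistributedHashTable(["node-a", "node-b", "node-c"])
placements = {
    cell: dht.put(make_hash_key("sales_cube", cell), value)
    for cell, value in cube.items()
}
```

Because the key is derived only from the cube identifier and the dimension category values, any reader that knows a cell's coordinates can recompute the key and go directly to the node that holds the value, for example `dht.get(make_hash_key("sales_cube", (("region", "US"), ("product", "book"))))`.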
Address: Reno, NV, US