Title of invention |
Distributed data stream processing method and system |
Abstract |
Embodiments of the present application relate to a distributed data stream processing method, a distributed data stream processing device, a computer program product for processing a raw data stream, and a distributed data stream processing system. A distributed data stream processing method is provided. The method includes dividing a raw data stream into a real-time data stream and historical data streams, processing the real-time data stream and the historical data streams in parallel, separately generating respective results of the processing of the real-time data stream and the historical data streams, and integrating the generated processing results. |
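The split, parallel processing, and integration steps named in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the record fields `ts` and `user`, the timestamp `cutoff` used to separate real-time from historical data, the per-user count standing in for the processing step, and the use of a single historical stream rather than several.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_stream(records, cutoff):
    """Divide records into a real-time part (ts >= cutoff) and a historical part.

    Splitting on a timestamp cutoff is an assumption of this sketch.
    """
    realtime = [r for r in records if r["ts"] >= cutoff]
    historical = [r for r in records if r["ts"] < cutoff]
    return realtime, historical


def count_by_user(records):
    """Hypothetical processing step: count events per user."""
    return Counter(r["user"] for r in records)


def process_stream(records, cutoff):
    """Process both parts in parallel, then integrate the two results."""
    realtime, historical = split_stream(records, cutoff)
    with ThreadPoolExecutor(max_workers=2) as pool:
        rt_future = pool.submit(count_by_user, realtime)
        hist_future = pool.submit(count_by_user, historical)
        rt_result = rt_future.result()      # result of the real-time stream
        hist_result = hist_future.result()  # result of the historical stream
    # Integration here is a simple sum of the two separately generated results.
    return rt_result + hist_result
```

The two results are produced separately and only combined at the end, which mirrors the order of steps in the abstract; a real system would likely use separate processes or machines rather than two threads.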
Publication number |
US9250963(B2) |
Publication date |
2016.02.02 |
Application number |
US201213681271 |
Filing date |
2012.11.19 |
Applicant |
Alibaba Group Holding Limited |
Inventors |
Zhang Xu;Yang Zhixiong;Xu Jia;Deng Zhonghua |
Classification |
G06F9/46;G06F9/50 |
Main classification |
G06F9/46 |
Agency |
Van Pelt, Yi & James LLP |
Agent |
Van Pelt, Yi & James LLP |
Principal claim |
1. A distributed data stream processing method, the method comprising:
determining a number of a plurality of division modules based on flow volume of a raw data stream;
dividing the raw data stream into a real-time data stream and historical data streams based on the plurality of division modules;
processing the real-time data stream and the historical data streams in parallel, wherein one data block of the real-time data stream is processed in parallel with another data block of the real-time data stream, and wherein the processing of the real-time data stream comprises:
    dividing the real-time data stream into a plurality of data blocks based on a first dimension;
    dividing each data block into a plurality of data sub-blocks based on a second dimension;
    determining a number of a plurality of functional modules within a functional module group and a number of a plurality of functional module groups based on a number of the plurality of data sub-blocks and resources available to be used to process the plurality of data sub-blocks;
    processing the plurality of data sub-blocks in parallel, wherein one data sub-block is sent to a first functional module of the plurality of functional modules to be processed and another data sub-block is sent to a second functional module of the plurality of functional modules to be processed, wherein the processing the plurality of data sub-blocks in parallel comprises:
        transmitting a first data sub-block and a second data sub-block to the first functional module of a first functional group, the first and second data sub-blocks relating to a first user; and
        transmitting a third data sub-block and a fourth data sub-block to the second functional module of the first functional group, the third and fourth data sub-blocks relating to a second user; and
    aggregating the results of the processing of the plurality of data sub-blocks;
separately generating respective results of the processing of the real-time data stream and the historical data streams; and
integrating the respective generated processing results. |
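The real-time branch of the claim (two-level division, sizing the functional modules, per-user routing, and aggregation) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the first dimension is taken to be a time window and the second a `(user, kind)` pair; the field names `ts`, `user`, and `kind`, all function names, and the per-user count are hypothetical. The sketch also simplifies the claim by using a single functional module group and by handling data blocks one after another, with only the sub-blocks inside a block processed in parallel.

```python
import math
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def num_division_modules(flow_per_sec, capacity_per_module):
    """Scale the number of division modules with the raw stream's flow volume."""
    return max(1, math.ceil(flow_per_sec / capacity_per_module))


def to_sub_blocks(records, window):
    """Group by time window (first dimension), then by (user, kind) (second)."""
    blocks = defaultdict(lambda: defaultdict(list))
    for r in records:
        blocks[r["ts"] // window][(r["user"], r["kind"])].append(r)
    return blocks


def plan_modules(n_sub_blocks, available_workers):
    """Pick a module count bounded by the work and by the available resources."""
    return max(1, min(n_sub_blocks, available_workers))


def route_by_user(sub_blocks, n_modules):
    """Send every sub-block of a given user to the same functional module."""
    users = sorted({user for user, _kind in sub_blocks})
    slot = {user: i % n_modules for i, user in enumerate(users)}
    queues = [[] for _ in range(n_modules)]
    for (user, _kind), rows in sub_blocks.items():
        queues[slot[user]].append(rows)
    return queues


def run_module(queue):
    """Hypothetical functional module: count events per user."""
    return Counter(r["user"] for rows in queue for r in rows)


def process_realtime(records, window, available_workers):
    """Divide, route, process the sub-blocks in parallel, and aggregate."""
    totals = Counter()
    for _window_id, sub_blocks in sorted(to_sub_blocks(records, window).items()):
        n_modules = plan_modules(len(sub_blocks), available_workers)
        queues = route_by_user(sub_blocks, n_modules)
        with ThreadPoolExecutor(max_workers=n_modules) as pool:
            for partial in pool.map(run_module, queues):
                totals += partial  # aggregate the per-module results
    return totals
```

Routing by user keeps all sub-blocks for one user on one module, so per-user state never has to be shared between modules; that is the property the two "transmitting" steps of the claim describe.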
Address |
KY |