Title: Distributed data stream processing method and system
Abstract: Embodiments of the present application relate to a distributed data stream processing method, a distributed data stream processing device, a computer program product for processing a raw data stream, and a distributed data stream processing system. A distributed data stream processing method is provided. The method includes dividing a raw data stream into a real-time data stream and historical data streams, processing the real-time data stream and the historical data streams in parallel, separately generating respective results of the processing of the real-time data stream and the historical data streams, and integrating the generated processing results.
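The divide, process in parallel, and integrate flow described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the record fields `ts` and `user`, the timestamp `cutoff` used to separate real-time from historical records, the per-user counting job, and all function names are assumptions made for the example, and the sketch uses a single historical stream where the abstract allows several.

```python
from concurrent.futures import ThreadPoolExecutor


def split_stream(records, cutoff):
    """Divide records into a real-time part and a historical part.

    Assumption: a record is "real-time" when its timestamp is at or
    after `cutoff`. The source does not state the division criterion.
    """
    real_time = [r for r in records if r["ts"] >= cutoff]
    historical = [r for r in records if r["ts"] < cutoff]
    return real_time, historical


def count_by_user(records):
    """Hypothetical processing step: count records per user."""
    counts = {}
    for r in records:
        counts[r["user"]] = counts.get(r["user"], 0) + 1
    return counts


def integrate(real_time_result, historical_result):
    """Merge the two separately generated results into one."""
    merged = dict(real_time_result)
    for user, n in historical_result.items():
        merged[user] = merged.get(user, 0) + n
    return merged


def process(records, cutoff):
    """Split the raw stream, process both parts in parallel, integrate."""
    real_time, historical = split_stream(records, cutoff)
    with ThreadPoolExecutor(max_workers=2) as pool:
        rt_future = pool.submit(count_by_user, real_time)
        hist_future = pool.submit(count_by_user, historical)
        return integrate(rt_future.result(), hist_future.result())
```

Calling `process(records, cutoff)` on a list of dictionaries returns one per-user count that covers both parts of the stream, while each part is counted by its own worker.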
Publication number: US9250963(B2) Publication date: 2016.02.02
Application number: US201213681271 Filing date: 2012.11.19
Applicant: Alibaba Group Holding Limited Inventors: Zhang Xu; Yang Zhixiong; Xu Jia; Deng Zhonghua
Classification: G06F9/46; G06F9/50 Main classification: G06F9/46
Agency: Van Pelt, Yi & James LLP Agent: Van Pelt, Yi & James LLP
Main claim: 1. A distributed data stream processing method, the method comprising: determining a number of a plurality of division modules based on flow volume of a raw data stream; dividing the raw data stream into a real-time data stream and historical data streams based on the plurality of division modules; processing the real-time data stream and the historical data streams in parallel, wherein one data block of the real-time data stream is processed in parallel with another data block of the real-time data stream, and wherein the processing of the real-time data stream comprises: dividing the real-time data stream into a plurality of data blocks based on a first dimension; dividing each data block into a plurality of data sub-blocks based on a second dimension; determining a number of a plurality of functional modules within a functional module group and a number of a plurality of functional module groups based on a number of the plurality of data sub-blocks and resources available to be used to process the plurality of data sub-blocks; processing the plurality of data sub-blocks in parallel, wherein one data sub-block is sent to a first functional module of the plurality of functional modules to be processed and another data sub-block is sent to a second functional module of the plurality of functional modules to be processed, wherein the processing the plurality of data sub-blocks in parallel comprises: transmitting a first data sub-block and a second data sub-block to the first functional module of a first functional group, the first and second data sub-blocks relating to a first user; and transmitting a third data sub-block and a fourth data sub-block to the second functional module of the first functional group, the third and fourth data sub-blocks relating to a second user; and aggregating the results of the processing of the plurality of data sub-blocks; separately generating respective results of the processing of the real-time data stream and the historical data streams; and integrating the respective generated processing results.
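The real-time branch of the claim (two-level division, sizing the functional modules, per-user routing, aggregation) can be sketched as below. This is an illustration under stated assumptions rather than the claimed implementation: the claim does not name the two dimensions, so `user` and `page` are hypothetical choices; the sizing rule in `module_count`, the summing job in `run_module`, and the `amount` field are likewise invented for the example; and only a single functional module group is modeled.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def divide(records, key):
    """Group records by the value of one dimension."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    return dict(groups)


def sub_blocks(records, first_dim, second_dim):
    """Divide into blocks by first_dim, then each block by second_dim."""
    out = {}
    for block_key, block in divide(records, first_dim).items():
        for sub_key, sub in divide(block, second_dim).items():
            out[(block_key, sub_key)] = sub
    return out


def module_count(n_sub_blocks, available_workers):
    """Assumed sizing rule: one module per sub-block, capped by resources."""
    return max(1, min(n_sub_blocks, available_workers))


def route(subs, n_modules):
    """Assign sub-blocks so all of one user's sub-blocks share a module."""
    users = sorted({user for user, _ in subs})
    slot = {user: i % n_modules for i, user in enumerate(users)}
    queues = [[] for _ in range(n_modules)]
    for (user, _), sub in subs.items():
        queues[slot[user]].append(sub)
    return queues


def run_module(queue):
    """Hypothetical functional module: sum `amount` over its sub-blocks."""
    return sum(r["amount"] for sub in queue for r in sub)


def process_real_time(records, available_workers):
    """Divide, size, route, process in parallel, then aggregate."""
    subs = sub_blocks(records, "user", "page")
    n = module_count(len(subs), available_workers)
    queues = route(subs, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = list(pool.map(run_module, queues))
    return sum(partials)
```

With two users and two pages each, `route` places the first user's two sub-blocks on one module and the second user's two sub-blocks on another, mirroring the first/second and third/fourth sub-block assignment recited in the claim.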
Address: KY