Independent claim |
1. A system for processing data sets in real-time by using a distributed network to generate and process partitioned streams, the system comprising:
a message allocator that:
receives a plurality of data sets from one or more producer devices;
for each of the plurality of data sets:
identifies a tag or characteristic of the data set;
identifies an initial partition stream from amongst a plurality of initial partition streams that corresponds to the tag or characteristic; and
appends the data set to the initial partition stream, such that the data set is associated with a rank that is higher than other ranks associated with other data sets in the initial partition stream;
a partition controller that, for an initial partition stream of the plurality of initial partition streams, manages a set of task processors such that:
each task processor in the set of task processors is designated to perform a task in a workflow so as to process data sets in the initial partition stream or processed versions of the data sets in a processed version of the initial partition stream in an order that corresponds to the ranks, wherein the set of task processors includes:
a first task processor designated to perform a first task;
a second task processor designated to perform a second task; and
a third task processor designated to perform a third task;
the first task processor in the set of task processors is controlled so as to:
generate, via performance of the first task, processed data sets corresponding to data sets in the initial partition stream;
facilitate storing the processed versions of the data sets at a first data store;
generate a processed partition stream that includes the processed versions of data sets in the initial partition stream; and
facilitate routing the processed partition stream for further processing of the processed data sets in accordance with one or more other tasks;
the second task processor in the set of task processors is controlled so as to:
generate, via performance of the second task, a score corresponding to each data set in the initial partition stream; and
facilitate storing the scores at a second data store; and
the third task processor in the set of task processors is controlled so as to repeatedly:
retrieve a plurality of scores from the second data store, each score in the plurality of scores;
generate, via performance of the third task, a real-time analytic variable based on the plurality of scores; and
facilitate availing the real-time analytic variable to a client device,
wherein repeated retrieval of the plurality of scores and repeated generation of the real-time analytic variable enables the real-time analytic variable to be updated in response to appending and task-performance processing of new data appended to the initial partition stream. |
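
The data flow recited in the claim can be illustrated with a minimal in-memory Python sketch. It is not the claimed implementation, only one possible reading under stated assumptions: every class, method, and field name below (`MessageAllocator`, `PartitionController`, `"tag"`, `"value"`) is hypothetical, plain dictionaries stand in for the first and second data stores, the score is simply the numeric value of a data set, and the real-time analytic variable is taken to be the mean of the stored scores.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Record:
    rank: int      # position of the data set within its partition stream
    payload: dict  # the data set itself (or its processed version)


class MessageAllocator:
    """Routes each data set to the partition stream matching its tag (hypothetical)."""

    def __init__(self):
        self.streams = defaultdict(list)  # tag -> list of Record, in rank order

    def append(self, data_set: dict) -> Record:
        stream = self.streams[data_set["tag"]]  # "tag" is an assumed field name
        # The new rank is higher than every rank already in the stream.
        rank = stream[-1].rank + 1 if stream else 0
        record = Record(rank, data_set)
        stream.append(record)
        return record


class PartitionController:
    """Runs three task processors over one initial partition stream (hypothetical)."""

    def __init__(self, stream: list):
        self.stream = stream
        self.first_store = {}        # rank -> processed data set
        self.second_store = {}       # rank -> score
        self.processed_stream = []   # processed partition stream

    def run_first_task(self):
        # Process unseen data sets in rank order and store the processed versions.
        for record in sorted(self.stream, key=lambda r: r.rank):
            if record.rank not in self.first_store:
                processed = dict(record.payload, processed=True)
                self.first_store[record.rank] = processed
                self.processed_stream.append(Record(record.rank, processed))

    def run_second_task(self):
        # Assumed scoring rule: the score is the numeric "value" field.
        for record in sorted(self.stream, key=lambda r: r.rank):
            if record.rank not in self.second_store:
                self.second_store[record.rank] = float(record.payload["value"])

    def run_third_task(self):
        # Assumed analytic variable: mean of all scores stored so far.
        scores = list(self.second_store.values())
        return mean(scores) if scores else None
```

Calling `run_third_task` again after new data sets have been appended and scored yields an updated value, which mirrors the repeated retrieval and regeneration described in the final clause.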