Title: Tracking tuples to reduce redundancy in a graph
Abstract: Tuples in a stream can be assigned identifiers so that only the nonduplicative tuples are stored. In a stream processing environment, a stream application actor such as an operator can receive a series of tuples, process them, and output another series of tuples. Each of the tuples can be assigned a tuple identifier. The tuple identifier can tag the tuple as associated with the operator. Another operator can receive the tuples, identify the duplicative tuples, and store only the nonduplicative tuples.
Publication number: US9619518(B2) Publication date: 2017.04.11
Application number: US201615086580 Filing date: 2016.03.31
Applicant: International Business Machines Corporation Inventors: Branson Michael J.; Santosuosso John M.
Classification: G06F17/30; G06F7/00 Main classification: G06F17/30
Agency: Agent: Gisler Laura E.
Main claim: 1. A computer implemented method for processing a stream of tuples, wherein the stream of tuples are to be processed by a plurality of processing elements operating on one or more computer processors, each processing element having one or more stream operators, wherein one or more of the stream operators include code configured to output tuples to one or more other stream operators, and wherein the plurality of processing elements are arranged in a linear execution path in an operator graph, the method comprising:
receiving a first series of tuples at a first processing element, the first processing element configured to perform one or more operations on the first series of tuples and to output a second series of tuples, wherein the one or more operations include analysis logic;
assigning, to each tuple in the second series of tuples, a tuple identifier, the tuple identifier associating the first processing element to each tuple in the second series of tuples and wherein the tuple identifier comprises identification information from the first processing element and identification information relating each tuple in the second series of tuples to one or more of its sibling tuples;
receiving, by a second processing element, the second series of tuples, the second processing element comprising a windowing operator, the windowing operator having a window memory, and wherein the first and second processing elements run on the same memory and are located on the same compute node;
identifying, based on the assigned tuple identifiers and a set of parameters, a set of duplicative tuples in the second series of tuples; the set of duplicative tuples comprising one or more tuples that are duplicative in view of tuples outside the set of duplicative tuples and in the second series of tuples, and wherein the set of duplicative tuples are identified using the set of parameters, wherein the set of parameters define sibling tuples as duplicative;
processing, in response to the identifying and by the second processing element, the second series of tuples, wherein the processing comprises adding metadata to each tuple in the second series of tuples, wherein the metadata associates each tuple with the processing and its sibling tuples; and
storing, in the window memory of the second processing element, a set of nonduplicative tuples for later processing, the window memory of the second processing element configured to store tuples, the set of nonduplicative tuples comprising tuples remaining in the second series of tuples after a removal of the set of duplicative tuples therefrom.
Address: Armonk NY US
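
Illustrative sketch (not part of the patent record): the steps recited in claim 1 can be pictured roughly as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation. All names (TupleId, StreamTuple, FirstElement, WindowingElement, siblings_are_duplicates) are hypothetical, the word-splitting "analysis logic" is invented for illustration, and treating the first tuple of each sibling group as the nonduplicative one is an assumption the claim does not specify.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List


@dataclass(frozen=True)
class TupleId:
    """Hypothetical tuple identifier: emitting element plus sibling information."""
    element_id: str     # identifies the processing element that output the tuple
    sibling_group: int  # shared by all tuples derived from the same input tuple
    sequence: int       # position of the tuple within its sibling group


@dataclass
class StreamTuple:
    data: Dict[str, str]
    tuple_id: TupleId
    metadata: Dict[str, object] = field(default_factory=dict)


class FirstElement:
    """First processing element: turns each input record into several sibling tuples."""

    def __init__(self, element_id: str) -> None:
        self.element_id = element_id
        self._groups = count()  # one new sibling group per input record

    def process(self, records: List[str]) -> List[StreamTuple]:
        out = []
        for record in records:
            group = next(self._groups)
            # Assumed analysis logic: one output tuple per whitespace-separated word.
            for seq, word in enumerate(record.split()):
                tid = TupleId(self.element_id, group, seq)
                out.append(StreamTuple({"word": word}, tid))
        return out


class WindowingElement:
    """Second processing element: keeps only nonduplicative tuples in its window."""

    def __init__(self, element_id: str, siblings_are_duplicates: bool = True) -> None:
        self.element_id = element_id
        # Assumed parameter: when True, later siblings of a stored tuple are duplicates.
        self.siblings_are_duplicates = siblings_are_duplicates
        self.window: List[StreamTuple] = []  # stands in for the window memory

    def receive(self, tuples: List[StreamTuple]) -> List[StreamTuple]:
        """Store nonduplicative tuples in the window; return the duplicates."""
        seen = set()
        duplicates = []
        for t in tuples:
            key = (t.tuple_id.element_id, t.tuple_id.sibling_group)
            is_duplicate = self.siblings_are_duplicates and key in seen
            seen.add(key)
            # Metadata ties every tuple to this processing step and its siblings.
            t.metadata["processed_by"] = self.element_id
            t.metadata["sibling_group"] = t.tuple_id.sibling_group
            if is_duplicate:
                duplicates.append(t)
            else:
                self.window.append(t)
        return duplicates
```

For example, feeding the records "a b c" and "d e" through FirstElement("PE1") yields five tuples in two sibling groups; WindowingElement("PE2").receive(...) then keeps one tuple per group in its window and returns the other three as duplicates. Setting siblings_are_duplicates=False keeps every tuple, which illustrates the role the claim gives to the set of parameters.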