Title of invention: Determining and validating provenance data in data stream processing system
Abstract: Techniques are disclosed for determining and validating provenance data in data stream processing systems. For example, a method for processing data associated with a data stream received by a data stream processing system, wherein the system comprises a plurality of processing elements, comprises the following steps. Input data elements and output data elements associated with at least one processing element of the plurality of processing elements are obtained. One or more intervals are computed for the processing element using data representing observations of associations between input elements and output elements of the processing element, wherein, for a given one of the intervals, one or more particular input elements contained within the given interval are determined to have contributed to a particular output element. In another method, intervals are specified and then validated by comparing the specified intervals against intervals computed based on observations.
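The validation method described in the abstract, comparing a specified interval against one computed from observations, might be sketched as follows. This is a minimal illustration, not the patented procedure: the record does not say how the comparison is made, so the coverage test, the `tolerance` parameter, the representation of an interval as a pair of time offsets relative to the output element, and the names `Bounds` and `validate` are all assumptions.

```python
from typing import Tuple

# Assumption: an interval is a (lo, hi) pair of time offsets, measured
# relative to the timestamp of the output element.
Bounds = Tuple[float, float]


def validate(specified: Bounds, computed: Bounds, tolerance: float = 0.0) -> bool:
    """Return True if the specified interval covers the computed one.

    Assumed criterion: every input observed to contribute to an output
    (summarized by `computed`) must fall inside the interval the developer
    specified, allowing `tolerance` time units of slack at each end.
    """
    spec_lo, spec_hi = specified
    comp_lo, comp_hi = computed
    return spec_lo - tolerance <= comp_lo and comp_hi <= spec_hi + tolerance


if __name__ == "__main__":
    observed = (-4.0, -1.0)                  # interval computed from observations
    print(validate((-5.0, 0.0), observed))   # True: specification covers it
    print(validate((-3.0, 0.0), observed))   # False: inputs seen outside it
```

A coverage test is only one plausible reading of "comparing"; an overlap ratio or an exact match within tolerance would fit the abstract equally well.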
Publication No.: US8775344(B2) Publication date: 2014.07.08
Application No.: US200812125219 Filing date: 2008.05.22
Applicant: International Business Machines Corporation Inventors: Blount Marion Lee;Davis, II John Sidney;Ebling Maria Rene;Misra Archan;Sow Daby Mousse;Wang Min
Classification: G06F17/00;G06N3/00 Main classification: G06F17/00
Agency: Ryan, Mason & Lewis, LLP Agent: Young Preston J.;Ryan, Mason & Lewis, LLP
Principal claim: 1. A method for processing data associated with a data stream received by a data stream processing system, wherein the system comprises a plurality of processing elements, the method comprising the steps of: obtaining a data stream of input elements and a data stream of output elements associated with at least one processing element of the plurality of processing elements, wherein the data stream of input elements are obtained from at least one streaming data source, and wherein the data stream of output elements are generated by the at least one processing element in response to the data stream of input elements; computing one or more intervals for the at least one processing element, wherein the one or more intervals are computed using data representing observations of associations between the input elements and the output elements of the at least one processing element, wherein, for a given one of the computed intervals, one or more particular input elements contained within the given computed interval are determined to have contributed to a particular output element; and using the computed one or more intervals to determine a dependency function that enables a provenance of the particular output element to be determined in terms of the one or more particular input elements.
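The two computational steps of the claim, computing an interval from observed input/output associations and deriving a dependency function from it, can be illustrated with a small sketch. It is hypothetical and rests on assumptions the record does not state: elements are identified by timestamps, an interval is a pair of time offsets relative to the output element, and the interval is taken as the tightest range spanning all observed associations. The names `Interval`, `compute_interval`, and `dependency_function` are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumption: one observation pairs an output timestamp with the timestamps
# of the input elements that were observed to contribute to that output.
Observation = Tuple[float, Sequence[float]]


@dataclass(frozen=True)
class Interval:
    lo: float  # earliest contributing offset, relative to the output time
    hi: float  # latest contributing offset, relative to the output time


def compute_interval(observations: Sequence[Observation]) -> Interval:
    """Tightest offset range that spans every observed association."""
    offsets = [t_in - t_out for t_out, inputs in observations for t_in in inputs]
    if not offsets:
        raise ValueError("at least one observed association is required")
    return Interval(min(offsets), max(offsets))


def dependency_function(
    interval: Interval,
) -> Callable[[float, Sequence[float]], List[float]]:
    """Build a function mapping an output time to its contributing inputs."""

    def provenance(t_out: float, input_times: Sequence[float]) -> List[float]:
        return [
            t for t in input_times
            if t_out + interval.lo <= t <= t_out + interval.hi
        ]

    return provenance


if __name__ == "__main__":
    observations = [(10.0, [7.0, 8.0, 9.0]), (20.0, [16.0, 18.0])]
    interval = compute_interval(observations)
    print(interval)  # Interval(lo=-4.0, hi=-1.0)
    provenance = dependency_function(interval)
    print(provenance(30.0, [24.0, 26.0, 27.5, 29.0, 30.0]))  # [26.0, 27.5, 29.0]
```

Taking the minimum and maximum offsets is the simplest possible estimator; a real system might prefer a statistical bound that discards outlying observations.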
Address: Armonk NY US