Title of invention: Flow analysis instrumentation
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for flow analysis. In one aspect, a method includes modifying a dataflow graph, the dataflow graph including a plurality of paths connecting at least one entry point and at least one exit point, including adding components to the dataflow graph that add flow units to data records and remove flow units from data records, each flow unit identifying a segment of a path traversed by the data record. The method also includes identifying execution paths based on flow units obtained by processing a plurality of data records using the modified dataflow graph. The method also includes determining a subset of the plurality of data records, wherein a selected set of execution paths are represented by the subset.
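The last step of the abstract, choosing a subset of records that represents a selected set of execution paths, can be illustrated with a minimal Python sketch. It assumes each record's execution path has already been identified and is stored as a tuple of segment names; the function name `select_representative_records`, the record ids, and the segment names are hypothetical and are not taken from the patent or from any Ab Initio product.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

# Assumption: an execution path is modelled as an ordered tuple of segment names.
Path = Tuple[str, ...]


def select_representative_records(
    record_paths: Dict[Hashable, Path],
    selected_paths: Iterable[Path],
) -> List[Hashable]:
    """Return one record id per selected execution path (first record seen wins)."""
    wanted: Set[Path] = set(selected_paths)
    chosen: Dict[Path, Hashable] = {}
    for record_id, path in record_paths.items():
        # Keep only the first record observed on each wanted path.
        if path in wanted and path not in chosen:
            chosen[path] = record_id
    return list(chosen.values())


if __name__ == "__main__":
    # Hypothetical observations: record id -> path it traversed.
    observed = {
        "r1": ("in", "filter", "out_a"),
        "r2": ("in", "filter", "out_a"),
        "r3": ("in", "join", "out_b"),
        "r4": ("in", "reject"),
    }
    wanted_paths = [("in", "filter", "out_a"), ("in", "join", "out_b")]
    print(select_representative_records(observed, wanted_paths))
```

Keeping one record per path is only one possible reading of "represented by the subset"; a test-data reduction tool might instead keep several records per path or weight paths differently.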
Publication number: US9563411(B2) Publication date: 2017.02.07
Application number: US201213344155 Filing date: 2012.01.05
Applicant: Ab Initio Technology LLC Inventor: Roberts Andrew F.
Classification: G06F9/45;G06F11/36 Main classification: G06F9/45
Agency: Fish & Richardson P.C. Agent: Fish & Richardson P.C.
Principal claim: 1. A computer-implemented method including: modifying a dataflow graph, the dataflow graph including a plurality of paths connecting at least one entry point and at least one exit point, including: adding components to the dataflow graph that add flow units to data records and remove flow units from data records, each flow unit tagging a specified data record with information identifying (i) a segment of a path through the dataflow graph traversed by the specified data record, and (ii) one or more other data records upon which the specified data record depends, when the specified data record is dependent on one or more other data records; for a data record processed using the modified dataflow graph, generating, based on one or more flow units tagging the data record, a record lineage that specifies (i) which one of the plurality of paths of the dataflow graph is traversed by the data record, and (ii) one or more other data records upon which the processed data record depends, when the data record is dependent on one or more other data records; based on record lineages generated, identifying execution paths of the data records through the modified dataflow graph including the plurality of paths connecting the at least one entry point and the at least one exit point, wherein a first one of the execution paths through the modified dataflow graph traversed by a first one of the data records is distinct from a second one of the execution paths through the modified dataflow graph traversed by a second one of the data records; and based on a selected set of the execution paths through the modified dataflow graph including the plurality of paths connecting the at least one entry point and the at least one exit point, determining a subset of the plurality of data records having traversed that selected set of the execution paths.
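The steps of the claim (tagging records with flow units, deriving a record lineage, identifying execution paths, and selecting the records that traversed chosen paths) can be sketched in Python as follows. This is a toy model under stated assumptions, not the patented implementation: the two-branch graph, the `amount` field, the threshold of 100, and every name (`FlowUnit`, `Record`, `add_flow_unit`, `remove_flow_units`, `lineage`, `run_graph`, `records_for_paths`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

Path = Tuple[str, ...]


@dataclass(frozen=True)
class FlowUnit:
    segment: str                      # path segment the record just traversed
    depends_on: Tuple[str, ...] = ()  # ids of records this record depends on


@dataclass
class Record:
    record_id: str
    payload: dict
    flow_units: List[FlowUnit] = field(default_factory=list)


def add_flow_unit(record: Record, segment: str, depends_on: Iterable[str] = ()) -> None:
    """Instrumentation component that tags a record with one flow unit."""
    record.flow_units.append(FlowUnit(segment, tuple(depends_on)))


def remove_flow_units(record: Record) -> List[FlowUnit]:
    """Instrumentation component that strips the flow units off a record."""
    units, record.flow_units = record.flow_units, []
    return units


def lineage(units: List[FlowUnit]) -> Tuple[Path, Tuple[str, ...]]:
    """Build a record lineage: (path traversed, ids of records depended on)."""
    path = tuple(u.segment for u in units)
    deps = tuple(sorted({d for u in units for d in u.depends_on}))
    return path, deps


def run_graph(records: List[Record], lookup: Record) -> Dict[str, Tuple[Path, Tuple[str, ...]]]:
    """Toy instrumented graph: entry -> (big | small) -> exit.

    Assumption: the 'small' branch joins each record against `lookup`,
    so records on that branch depend on the lookup record.
    """
    lineages = {}
    for rec in records:
        add_flow_unit(rec, "entry")
        if rec.payload["amount"] >= 100:
            add_flow_unit(rec, "big")
        else:
            add_flow_unit(rec, "small", depends_on=[lookup.record_id])
        add_flow_unit(rec, "exit")
        # At the exit point the flow units are removed and turned into a lineage.
        lineages[rec.record_id] = lineage(remove_flow_units(rec))
    return lineages


def records_for_paths(lineages, selected_paths: Iterable[Path]) -> List[str]:
    """Ids of the records whose execution path is in the selected set."""
    selected = set(selected_paths)
    return [rid for rid, (path, _deps) in lineages.items() if path in selected]
```

For example, calling `run_graph` on records with amounts 150, 20, and 500 yields two distinct execution paths, and `records_for_paths(lineages, [("entry", "big", "exit")])` returns the ids of the two records that took the "big" branch. The records leave the graph with their flow units removed, so the instrumentation does not alter the output data.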
Address: Lexington MA US