Title: History preserving data pipeline system and method
Abstract: A history preserving data pipeline computer system and method. In one aspect, the history preserving data pipeline system provides immutable and versioned datasets. Because datasets are immutable and versioned, the system makes it possible to determine the data in a dataset at a point in time in the past, even if that data is no longer in the current version of the dataset.
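The point-in-time property described in the abstract can be illustrated with a minimal sketch. It is not taken from the patent: the class `VersionedDataset`, its methods `commit` and `as_of`, and the choice of commit timestamps as version keys are all hypothetical, chosen only to show how immutable, versioned snapshots keep past data recoverable.

```python
import bisect
from datetime import datetime


class VersionedDataset:
    """Hypothetical append-only store: every commit adds a new immutable snapshot."""

    def __init__(self):
        self._timestamps = []  # commit times, kept in ascending order
        self._snapshots = []   # one immutable tuple of rows per commit

    def commit(self, rows, at):
        # Assumption: commits arrive in strictly increasing time order.
        if self._timestamps and at <= self._timestamps[-1]:
            raise ValueError("commits must be in increasing time order")
        self._timestamps.append(at)
        self._snapshots.append(tuple(rows))  # tuple copy, so later edits cannot alter it

    def as_of(self, at):
        # Find the latest snapshot committed at or before the requested time.
        i = bisect.bisect_right(self._timestamps, at)
        if i == 0:
            raise LookupError("no version existed at that time")
        return self._snapshots[i - 1]


ds = VersionedDataset()
ds.commit(["a", "b"], datetime(2014, 1, 1))
ds.commit(["b"], datetime(2014, 6, 1))        # row "a" is gone from the current version

past = ds.as_of(datetime(2014, 3, 1))         # still contains "a"
current = ds.as_of(datetime(2014, 12, 1))
```

Because earlier snapshots are never overwritten, the row removed in the second commit stays retrievable by asking for the dataset as of an earlier date.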
Publication number: US9229952(B1) Publication date: 2016.01.05
Application number: US201414533433 Filing date: 2014.11.05
Applicant: Palantir Technologies, Inc. Inventors: Meacham Jacob; Harris Michael; Brodman Gustav; Cuthriell Lynn; Korus Hannah; Toth Brian; Hsiao Jonathan; Elliot Mark; Schimpf Brian; Garland Michael; Nguyen Evelyn
Classification: G06F17/30 Main classification: G06F17/30
Agency: Hickman Palermo Becker Bingham LLP Agent: Hickman Palermo Becker Bingham LLP; Stone Adam C.
Main claim: 1. A method for preserving history of a derived dataset, the method comprising: at one or more computing devices comprising one or more processors and storage media storing one or more computer programs executed by the one or more processors to perform the method, perform operations of: storing a first version of a derived dataset; wherein the first version of the derived dataset is derived from at least a first version of another dataset by executing a first version of a derivation program associated with the derived dataset; storing a first build catalog entry, the first build catalog entry associated with the derived dataset and comprising an identifier of the first version of the other dataset and comprising an identifier of the first version of the derivation program; wherein the first build catalog entry comprises a name of the derived dataset and an identifier of the first version of the derived dataset; updating the other dataset to produce a second version of the other dataset; storing a second version of the derived dataset; wherein the second version of the derived dataset is derived from at least the second version of the other dataset by executing the first version of the derivation program associated with the derived dataset; storing a second build catalog entry, the second build catalog entry associated with the derived dataset and comprising an identifier of the second version of the other dataset and comprising an identifier of the first version of the derivation program; and wherein the second build catalog entry comprises the name of the derived dataset and an identifier of the second version of the derived dataset.
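The sequence of operations in the claim can be sketched as follows. This is an illustrative reading under stated assumptions, not the patented implementation: the names `BuildCatalogEntry`, `Pipeline`, `commit`, `read`, and `build`, the use of integer version numbers, and the in-memory storage are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class BuildCatalogEntry:
    """Hypothetical record of one build, holding the fields the claim lists."""
    dataset_name: str                             # name of the derived dataset
    dataset_version: int                          # version of the derived dataset produced
    input_versions: Tuple[Tuple[str, int], ...]   # (input dataset name, input version)
    program_version: str                          # version of the derivation program


class Pipeline:
    def __init__(self):
        self._versions: Dict[Tuple[str, int], tuple] = {}  # (name, version) -> rows
        self._latest: Dict[str, int] = {}                  # name -> newest version
        self.catalog: List[BuildCatalogEntry] = []

    def commit(self, name: str, rows: Sequence) -> int:
        # Store a new immutable version; earlier versions are never overwritten.
        version = self._latest.get(name, 0) + 1
        self._versions[(name, version)] = tuple(rows)
        self._latest[name] = version
        return version

    def read(self, name: str, version: int) -> tuple:
        return self._versions[(name, version)]

    def build(self, derived: str, source: str,
              program: Callable[[tuple], Sequence], program_version: str) -> BuildCatalogEntry:
        # Derive from the newest source version, store the result as a new
        # version of the derived dataset, and record what went into the build.
        source_version = self._latest[source]
        rows = program(self.read(source, source_version))
        derived_version = self.commit(derived, rows)
        entry = BuildCatalogEntry(derived, derived_version,
                                  ((source, source_version),), program_version)
        self.catalog.append(entry)
        return entry


def double(rows):
    # Stand-in derivation program, labeled "v1" below.
    return [r * 2 for r in rows]


p = Pipeline()
p.commit("source", [1, 2, 3])                      # first version of the other dataset
first = p.build("doubled", "source", double, "v1")
p.commit("source", [1, 2, 3, 4])                   # update produces the second version
second = p.build("doubled", "source", double, "v1")
```

In this sketch the two catalog entries share the derived dataset name and the program version but differ in the input version and the derived version they record, so either build can later be traced back to the exact inputs that produced it.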
Address: Palo Alto, CA, US