Title: BEHAVIORALLY CONSISTENT CLUSTER-WIDE DATA WRANGLING BASED ON LOCALLY PROCESSED SAMPLED DATA
Abstract: Example embodiments involve a system, a computer-readable storage medium storing at least one program, and a computer-implemented method for behaviorally consistent data wrangling. A local client device selects a set of raw sample data from a remote datastore. A local execution engine then applies one or more data wrangling operations to the raw sample data. If the results of these local data wrangling operations are satisfactory, the operations may then be transferred to a remote data wrangling cluster. A remote execution engine running on the remote data wrangling cluster then applies the same data wrangling operations to the larger set of raw data from which the raw sample data was obtained. Because the remote execution engine and the local execution engine are of the same type, the data wrangling behavior exhibited by the local execution engine is reflected in the data wrangling behavior of the remote execution engine.
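The abstract's central idea can be sketched in a few lines: one engine type runs the same operations on a local sample and on the full remote dataset. This is a minimal illustration under stated assumptions, not the patented implementation. `WranglingEngine`, the JSON recipe format, the four operations (`split`, `strip`, `require_fields`, `name`) and the two helper functions are all hypothetical. The transfer to the cluster is reduced to passing a JSON string between two functions.

```python
from __future__ import annotations

import json
from typing import Any


class WranglingEngine:
    """Hypothetical execution engine. The local client and the remote cluster
    both instantiate this same class, so a recipe behaves identically on each."""

    def apply(self, raw_lines: list[str], recipe: list[dict[str, Any]]) -> list[dict[str, str]]:
        # Start with one field per row: the unparsed raw line.
        rows: list[list[str]] = [[line] for line in raw_lines]
        columns: list[str] | None = None
        for step in recipe:
            op = step["op"]
            if op == "split":
                # Replace field `col` with the pieces produced by splitting on `sep`.
                col, sep = step["col"], step["sep"]
                rows = [r[:col] + r[col].split(sep) + r[col + 1:] for r in rows]
            elif op == "strip":
                # Remove surrounding whitespace from every field.
                rows = [[field.strip() for field in r] for r in rows]
            elif op == "require_fields":
                # Drop rows that do not have exactly `n` fields.
                rows = [r for r in rows if len(r) == step["n"]]
            elif op == "name":
                # Record the column names used to build the structured output.
                columns = step["columns"]
            else:
                raise ValueError(f"unknown operation: {op}")
        if columns is None:
            # Fall back to positional names c0, c1, ... if no "name" step was given.
            width = max((len(r) for r in rows), default=0)
            columns = [f"c{i}" for i in range(width)]
        return [dict(zip(columns, r)) for r in rows]


def wrangle_sample_locally(sample: list[str], recipe: list[dict[str, Any]]):
    """Local side (hypothetical): preview the recipe on the sample, then serialize it."""
    preview = WranglingEngine().apply(sample, recipe)
    return preview, json.dumps(recipe)


def wrangle_full_data_remotely(full_data: list[str], recipe_json: str) -> list[dict[str, str]]:
    """Remote side (hypothetical): decode the received recipe, run the same engine type."""
    return WranglingEngine().apply(full_data, json.loads(recipe_json))


# Illustrative data: the sample is the first two lines of the full dataset.
full_data = ["a, 1, x", "bad line", "b, 2, y", " c ,3,z"]
sample = full_data[:2]
recipe = [
    {"op": "split", "col": 0, "sep": ","},
    {"op": "strip"},
    {"op": "require_fields", "n": 3},
    {"op": "name", "columns": ["key", "count", "tag"]},
]

preview, payload = wrangle_sample_locally(sample, recipe)
result = wrangle_full_data_remotely(full_data, payload)
print(preview)  # [{'key': 'a', 'count': '1', 'tag': 'x'}]
```

Both sides construct the same engine class, and the recipe travels as data rather than as code. As a result, the rows that survive on the full dataset carry the same column names the client saw in its preview, and the malformed line `bad line` is dropped on both sides.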
Publication No.: US2016188692(A1)
Publication date: 2016.06.30
Application No.: US201414588022
Filing date: 2014.12.31
Applicants: Tsumura Michael; Ivanov Ivailo; Kumar Viren Suresh
Inventors: Tsumura Michael; Ivanov Ivailo; Kumar Viren Suresh
Classification: G06F17/30; H04L29/08
Main classification: G06F17/30
Main claim: 1. A method comprising:
  selecting, at a local client device, a first plurality of raw data from a second plurality of raw data, the second plurality of raw data being stored remote from the local client device;
  receiving the first plurality of raw data at the local client device;
  selecting, at the local client device, a plurality of data wrangling operations to perform on the first plurality of raw data;
  applying, at the local client device, the plurality of data wrangling operations to the first plurality of raw data to obtain a first plurality of structured data; and
  sending the selection of the plurality of data wrangling operations to a remote device, the remote device being configured to apply the selected plurality of data wrangling operations to the second plurality of raw data to obtain a second plurality of structured data, the second plurality of structured data having an expected organization based on the first plurality of structured data.
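The steps of claim 1 can be traced in a short sketch, which also makes the phrase "expected organization" concrete. Several assumptions apply:
- The first plurality is drawn by uniform random sampling (`random.Random.sample`). The claim does not say how the sample is chosen.
- The operations `parse_csv`, `keep_complete` and `to_typed` are hypothetical, as are the `REGISTRY` lookup and the `organization` summary (the set of field-name and value-type pairs).
- "Sending the selection" is modeled as passing a list of operation names.

```python
from __future__ import annotations

import random
from typing import Any, Callable


# Hypothetical wrangling operations. Both sides look them up by name, so only
# the selection (a list of names) has to be sent to the remote device.
def parse_csv(records: list[str]) -> list[list[str]]:
    # Split each raw record on commas and trim whitespace from every field.
    return [[field.strip() for field in record.split(",")] for record in records]


def keep_complete(rows: list[list[str]]) -> list[list[str]]:
    # Keep only rows with exactly two non-empty fields.
    return [row for row in rows if len(row) == 2 and all(row)]


def to_typed(rows: list[list[str]]) -> list[dict[str, Any]]:
    # Turn each row into a record with a string name and an integer age.
    return [{"name": row[0], "age": int(row[1])} for row in rows]


REGISTRY: dict[str, Callable[[Any], Any]] = {
    "parse_csv": parse_csv,
    "keep_complete": keep_complete,
    "to_typed": to_typed,
}


def apply_selection(selection: list[str], raw: list[str]) -> list[dict[str, Any]]:
    # Run the selected operations in order; identical on the local and remote side.
    data: Any = raw
    for name in selection:
        data = REGISTRY[name](data)
    return data


def organization(structured: list[dict[str, Any]]) -> set[tuple[str, str]]:
    # Hypothetical summary of "organization": the (field, value type) pairs present.
    return {(key, type(value).__name__) for row in structured for key, value in row.items()}


# Second plurality of raw data, stored remotely (illustrative values).
remote_raw = ["ann, 34", "bob,", "cy, 27", "dee, 41", "ed, 19"]

# Select and receive a first plurality: a uniform random sample of three records
# (an assumption; the claim does not say how the sample is chosen).
local_raw = random.Random(7).sample(remote_raw, 3)

# Select the operations and apply them locally to obtain the first structured data.
selection = ["parse_csv", "keep_complete", "to_typed"]
local_structured = apply_selection(selection, local_raw)

# "Send" the selection; the remote side applies it to the full raw data.
remote_structured = apply_selection(selection, remote_raw)

# The full-data result shares the organization observed on the sample.
assert organization(remote_structured) == organization(local_structured)
```

Only the list of operation names crosses to the remote side. Both sides resolve those names through the same registry, so the full-data result has the same field names and value types as the locally wrangled sample. Here the incomplete record `bob,` is filtered out on whichever side encounters it.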
Address: Richmond, CA