Title of invention: Accelerating data profiling process
Abstract: A data profiling request is handled by utilizing data in a distributed file system. Tabular data is extracted from a data source and stored in the distributed file system. Each table in the tabular data is split by columns, and each column is stored in a separate file on a set of physical nodes of the distributed file system. In response to a data profiling request, a master node determines, based on the request, which groups of files need to be on the same physical node in order to perform the profiling analysis. The master node creates jobs using the physical nodes that contain the files required for each job.
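The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`split_by_columns`, `place_round_robin`, `plan_jobs`), the round-robin placement, the tie-breaking rule, and the `to_copy` field are all assumptions introduced here for illustration. The sketch splits a table into one in-memory "file" per column, assigns those files to nodes, and then, for each group of columns a profiling request needs together, picks the node that already holds most of that group.

```python
from collections import Counter


def split_by_columns(table_name, header, rows):
    """Return one 'file' per column, keyed as '<table>.<column>' (hypothetical naming)."""
    return {
        f"{table_name}.{column}": [row[i] for row in rows]
        for i, column in enumerate(header)
    }


def place_round_robin(file_names, nodes):
    """Assign column files to nodes in round-robin order (an assumed placement policy)."""
    return {name: nodes[i % len(nodes)] for i, name in enumerate(sorted(file_names))}


def plan_jobs(file_groups, placement):
    """For each group of files that must be co-located, choose the node holding most of them."""
    jobs = []
    for group in file_groups:
        counts = Counter(placement[name] for name in group)
        # Highest count wins; ties are broken by node name so the plan is deterministic.
        node = min(counts, key=lambda n: (-counts[n], n))
        # Files of the group that are not yet on the chosen node (assumed to need a copy).
        missing = sorted(name for name in group if placement[name] != node)
        jobs.append({"node": node, "files": list(group), "to_copy": missing})
    return jobs


header = ["id", "zip", "city"]
rows = [(1, "10115", "Berlin"), (2, "70173", "Stuttgart")]
column_files = split_by_columns("customer", header, rows)
placement = place_round_robin(column_files, ["node-a", "node-b"])

# A single-column analysis on 'id' and a cross-column analysis on 'zip' and 'city'.
jobs = plan_jobs([["customer.id"], ["customer.zip", "customer.city"]], placement)
```

In this toy run the two columns needed by the cross-column analysis already sit on the same node, so the planner can schedule that job there without moving any file.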
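Publication number: US8719271(B2)  Publication date: 2014.05.06
Application number: US201213645730  Filing date: 2012.10.05
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION  Inventors: NELKE SEBASTIAN;OBERHOFER MARTIN;SAILLET YANNICK;SEIFERT JENS
Classification: G06F17/30  Main classification: G06F17/30
Agency:  Agent:
Independent claim:
Address: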