Title of Invention: BACKGROUND FORMAT OPTIMIZATION FOR ENHANCED SQL-LIKE QUERIES IN HADOOP
Abstract: A format conversion engine for Apache Hadoop that converts data from its original format to a database-like format at certain time points for use by a low latency (LL) query engine. The format conversion engine comprises a daemon that is installed on each data node in a Hadoop cluster. The daemon comprises a scheduler and a converter. The scheduler determines when to perform the format conversion and notifies the converter when the time comes. The converter converts data on the data node from its original format to a database-like format for use by the low latency (LL) query engine.
Publication No.: US2015095308(A1) Publication Date: 2015.04.02
Application No.: US201314043753 Filing Date: 2013.10.01
Applicant: Cloudera, Inc. Inventors: Kornacker Marcel; Erickson Justin; Li Nong; Kuff Lenni; Robinson Henry Noel; Choi Alan; Behm Alex
Classification: G06F17/30 Main Classification: G06F17/30
Agency: Agent:
Main Claim: 1. A system for performing queries on stored data in a distributed computing cluster of a plurality of data nodes, comprising: a query engine for each data node, having: a query planner that parses a query from a client to create query fragments based on a schema specifying one or more formats in which data is stored on the data nodes, wherein, when data in a target format is stored, the query fragments are created for the target format, and when data in the target format is not stored, the query fragments are created for another format; a query coordinator that distributes the query fragments among the plurality of data nodes; and a query execution engine comprising: a transformation module that transforms the data in the format for which the query fragments are created based on the schema; and an execution module that executes the query fragments on the transformed data to obtain intermediate results that are aggregated and returned to the client.
Address: Palo Alto, CA, US
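
The format-selection rule recited in the main claim (create query fragments for the target format when data in that format is stored, otherwise for another format) can be illustrated with a minimal sketch. It is not the patented implementation: the names `QueryFragment` and `plan_fragments`, the per-node schema layout, the format labels, and the alphabetical choice of fallback format are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryFragment:
    """Hypothetical unit of work: one query, one node, one data format."""
    node: str
    data_format: str
    query: str


def plan_fragments(query, schema, target_format="columnar"):
    """Create one fragment per data node (illustrative only).

    `schema` is assumed to map a node name to the set of formats in
    which that node stores the data. The target format is preferred;
    if it is not stored, the fragment is created for another stored
    format (here simply the alphabetically first one).
    """
    fragments = []
    for node, formats in sorted(schema.items()):
        if not formats:
            raise ValueError(f"no stored format listed for node {node!r}")
        if target_format in formats:
            chosen = target_format
        else:
            chosen = sorted(formats)[0]  # assumed fallback rule
        fragments.append(QueryFragment(node, chosen, query))
    return fragments


if __name__ == "__main__":
    schema = {"node-a": {"text", "columnar"}, "node-b": {"text"}}
    for fragment in plan_fragments("SELECT COUNT(*) FROM t", schema):
        print(fragment.node, fragment.data_format)
```

In this sketch a node whose data has already been converted in the background receives a fragment for the target format, while a node that still holds only the original format receives a fragment for that format, so the query can run before conversion has finished.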