Title of invention: Low latency query engine for Apache Hadoop
Abstract: A low latency query engine for APACHE HADOOP™ that provides real-time or near real-time, ad hoc query capability, while completing batch-processing of MapReduce. In one embodiment, the low latency query engine comprises a daemon that is installed on data nodes in a HADOOP™ cluster for handling query requests and all internal requests related to query execution. In a further embodiment, the low latency query engine comprises a daemon for providing name service and metadata distribution. The low latency query engine receives a query request via client, turns the request into collections of plan fragments and coordinates parallel and optimized execution of the plan fragments on remote daemons to generate results at a much faster speed than existing batch-oriented processing frameworks.
Publication number: US9342557(B2) Publication date: 2016.05.17
Application number: US201313800280 Filing date: 2013.03.13
Applicant: Cloudera, Inc. Inventors: Kornacker Marcel; Erickson Justin; Li Nong; Kuff Lenni; Robinson Henry Noel; Choi Alan; Behm Alex
Classification: G06F17/30 Main classification: G06F17/30
Agency: Perkins Coie LLP Agent: Perkins Coie LLP
Main claim: 1. A system for performing queries on stored data in a HADOOP™ distributed computing cluster having a plurality of data nodes, each data node being a computing device having processing circuitry and memory circuitry, the system comprising: a state store that tracks a status of each data node, wherein the state store is separate from the data nodes and is further coupled to a name node that tracks where file data are stored across the cluster; and a plurality of data nodes forming a peer-to-peer network for the queries, each data node functioning as a peer in the peer-to-peer network and being capable of interacting with components of the HADOOP™ cluster, each peer having an instance of a query engine running in memory, each instance of the query engine having: a query planner configured to: receive queries from clients; obtain, from the state store and the name node, (1) membership information regarding all query engine instances that are running in the cluster, and (2) location information regarding where data blocks relevant to the queries are distributed among the plurality of data nodes; parse queries from clients to create query fragments based on data obtained from the state store and the name node; and construct a query plan based on the data obtained from the state store; a query coordinator configured to distribute the query fragments among the plurality of data nodes according to the query plan; and a query execution engine configured to execute the query fragments, to obtain intermediate results from other data nodes that receive the query fragments, and to aggregate the intermediate results for the clients.
Address: Palo Alto CA US
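
Illustrative sketch (not part of the patent record): the main claim describes a planner that combines membership information from the state store with block locations from the name node, a coordinator that distributes fragments, and an execution engine that aggregates intermediate results. The Python below is a minimal, single-process model of that flow under stated assumptions. Every name in it (`Fragment`, `plan_fragments`, `execute_fragment`, `coordinate`, the node and block identifiers) is hypothetical, the least-loaded replica choice is an assumed placement policy, and the count/sum aggregate stands in for whatever query the client actually submits. Nothing here is taken from the patent's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical plan fragment: one data node plus the blocks it scans."""
    node: str
    blocks: tuple


def plan_fragments(live_nodes, block_locations):
    """Assign every block to a live node that holds a replica of it.

    live_nodes      -- membership info (stand-in for the state store)
    block_locations -- block id -> replica nodes (stand-in for the name node)
    """
    assignment = {}
    for block_id, replicas in sorted(block_locations.items()):
        candidates = [n for n in replicas if n in live_nodes]
        if not candidates:
            raise RuntimeError(f"no live query engine holds block {block_id}")
        # Assumed policy: least-loaded replica, ties broken by node name.
        target = min(candidates, key=lambda n: (len(assignment.get(n, [])), n))
        assignment.setdefault(target, []).append(block_id)
    return [Fragment(node, tuple(blocks))
            for node, blocks in sorted(assignment.items())]


def execute_fragment(fragment, block_data):
    """Produce an intermediate result (count, sum) for one fragment."""
    values = [v for block in fragment.blocks for v in block_data[block]]
    return len(values), sum(values)


def coordinate(fragments, block_data):
    """Run every fragment and merge the intermediate results."""
    partials = [execute_fragment(f, block_data) for f in fragments]
    return {
        "count": sum(count for count, _ in partials),
        "sum": sum(total for _, total in partials),
    }


# Hypothetical cluster: dn3 holds replicas but runs no query engine instance.
live_nodes = {"dn1", "dn2"}
block_locations = {
    "b1": ["dn1", "dn3"],
    "b2": ["dn1", "dn2"],
    "b3": ["dn2", "dn3"],
}
block_data = {"b1": [1, 2], "b2": [3], "b3": [4, 5, 6]}

fragments = plan_fragments(live_nodes, block_locations)
result = coordinate(fragments, block_data)
```

In a real cluster the fragments would be sent to remote daemons and executed in parallel; the sequential loop in `coordinate` only models the merge step.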