Main Claim
1. A system for performing queries on stored data in a distributed computing cluster of a plurality of data nodes, comprising:
a query engine for each data node, having:
a query planner that parses a query from a client to create query fragments based on a schema specifying one or more formats in which data is stored on the data nodes,
wherein, when data in a target format is stored, the query fragments are created for the target format, and when data in the target format is not stored, the query fragments are created for another format;
a query coordinator that distributes the query fragments among the plurality of data nodes; and
a query execution engine comprising:
a transformation module that transforms, based on the schema, the data in the format for which the query fragments are created; and
an execution module that executes the query fragments on the transformed data to obtain intermediate results that are aggregated and returned to the client.
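The claimed flow (planning against a target format with fallback, distributing fragments, transforming, executing, and aggregating) can be illustrated with a minimal, single-process Python sketch. It is not the patented implementation: the names `Schema`, `Fragment`, `plan`, `transform`, `execute`, and `run_query`, the choice of `"columnar"` as the target format, the per-node format selection, and the SUM-over-one-column query are all assumptions made for illustration, and distribution across data nodes is simulated by a loop.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Schema:
    # Hypothetical schema: the storage formats present on each data node
    # and the column order used by row-oriented storage.
    formats: Dict[str, Set[str]]
    columns: List[str]


@dataclass
class Fragment:
    node_id: str  # data node that will execute this fragment
    fmt: str      # storage format the fragment was planned against
    column: str   # column summed by the (hypothetical) SUM query


def plan(column: str, schema: Schema, target_fmt: str = "columnar") -> List[Fragment]:
    """Query planner: one fragment per node, created for the target format when
    that node stores it, otherwise for another format the node does store."""
    fragments = []
    for node_id, fmts in schema.formats.items():
        fmt = target_fmt if target_fmt in fmts else sorted(fmts)[0]
        fragments.append(Fragment(node_id, fmt, column))
    return fragments


def transform(data, fmt: str, schema: Schema) -> Dict[str, list]:
    """Transformation module: convert stored data into columnar lists."""
    if fmt == "columnar":
        return data  # already a mapping of column -> values
    if fmt == "row":
        return {c: [row[i] for row in data] for i, c in enumerate(schema.columns)}
    raise ValueError(f"unsupported format: {fmt}")


def execute(fragment: Fragment, columns: Dict[str, list]) -> int:
    """Execution module: produce an intermediate (partial) result."""
    return sum(columns[fragment.column])


def run_query(column: str, schema: Schema, node_data: Dict[str, Dict[str, object]]) -> int:
    """Coordinator: dispatch each fragment to its node, then aggregate partials."""
    partials = []
    for frag in plan(column, schema):
        stored = node_data[frag.node_id][frag.fmt]
        partials.append(execute(frag, transform(stored, frag.fmt, schema)))
    return sum(partials)  # final result returned to the client


if __name__ == "__main__":
    schema = Schema(formats={"n1": {"columnar", "row"}, "n2": {"row"}},
                    columns=["id", "amount"])
    node_data = {
        "n1": {"columnar": {"id": [1, 2], "amount": [10, 20]},
               "row": [(1, 10), (2, 20)]},
        "n2": {"row": [(3, 5), (4, 7)]},
    }
    print(run_query("amount", schema, node_data))  # 42
```

In this example node `n1` stores the target columnar format, so its fragment is planned against it, while `n2` stores only row data, so its fragment falls back to the row format and is transformed into columns before execution. Choosing the format per node, rather than once for the whole query, is only one possible reading of the claim.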