Title: Self-described query execution in a massively parallel SQL execution engine
Abstract: A query is executed in a massively parallel processing (MPP) data storage system, in which a master node communicates with a cluster of multiple segments that access data in distributed storage, by producing a self-described query plan at the master node. The self-described query plan incorporates changeable metadata and the information needed to execute it on the segments. It also incorporates references that the segments use to obtain static metadata and information for the plan's functions and operators from their own metadata stores. The distributed storage may be the Hadoop Distributed File System (HDFS), and the query plan may be a full-function SQL query plan.
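The abstract's split between the two kinds of metadata can be illustrated with a small sketch. It assumes a JSON wire format and hypothetical field names (`operators`, `file_splits`, `function_oids`) that do not come from the patent. Changeable metadata, such as the locations of a table's data files, is embedded in the message, while static catalog information for functions is carried only as identifiers that each segment resolves in its own metadata store. A segment that receives the message can then check that every reference resolves locally.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PlanMessage:
    """Hypothetical wire format for a self-described plan (field names are assumptions)."""
    operators: list      # plan tree flattened to operator names, e.g. ["Scan", "Project"]
    file_splits: dict    # changeable metadata, embedded: table name -> data file paths
    function_oids: list  # static metadata, by reference: IDs looked up on each segment

    def to_wire(self) -> str:
        # Serialize the whole plan into one message that can be broadcast.
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_wire(text: str) -> "PlanMessage":
        # Rebuild the plan on the receiving segment.
        return PlanMessage(**json.loads(text))


def unresolved_refs(msg: PlanMessage, segment_catalog: dict) -> list:
    """Referenced IDs that this segment cannot resolve from its local metadata store."""
    return [oid for oid in msg.function_oids if oid not in segment_catalog]


# Illustrative values only: the paths and the ID 1001 are hypothetical.
msg = PlanMessage(["Scan", "Project"],
                  {"customers": ["/data/customers/part-0", "/data/customers/part-1"]},
                  [1001])
received = PlanMessage.from_wire(msg.to_wire())  # what a segment would see
print(received == msg)                                          # True
print(unresolved_refs(received, {1001: "upper(text) entry"}))   # []
```

In this sketch, sending identifiers instead of full catalog entries keeps the broadcast message small. The trade-off is that every segment must hold a consistent copy of the static catalog.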
Publication number: US9626411 (B1)    Publication date: 2017.04.18
Application number: US201313853060    Filing date: 2013.03.29
Applicant: EMC IP Holding Company LLC    Inventors: Chang Lei; Wang Zhanwei; Ma Tao; Lonergan Luke; Jian Lirong; Ma Lili
Classification: G06F7/02; G06F17/30    Main classification: G06F7/02
Agency: Van Pelt, Yi & James LLP    Agent: Van Pelt, Yi & James LLP
Main claim: 1. A method of query execution in a massively parallel processing (MPP) data storage system comprising a master node and a cluster of multiple distributed segments that access data in distributed storage, comprising: producing a self-described query plan at the master node that is responsive to a query for accessing data in the distributed storage to satisfy the query, said producing comprising incorporating, into a query plan at the master node, metadata and other information needed by the segments to execute the query plan to create said self-described query plan, wherein said metadata and other information comprise information as to locations of said data in said distributed storage that are accessed by said self-described query plan, and catalog information for functions and operators used in the self-described query plan for processing the data, and wherein said metadata and other information are stored in a store at said master node, wherein in the event that a part of such metadata or a part of such other information needed by the segments to execute the query plan is stored at the cluster of multiple distributed segments, the master node includes an identifier associated with the part of such metadata or the part of such other information that is stored at the cluster of multiple distributed segments and excludes the part of such metadata or the part of such other information that is stored at the cluster of multiple distributed segments from the query plan; broadcasting said self-described query plan to said segments for execution; and executing the self-described query plan to process said data.
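The claimed steps (produce the self-described plan, broadcast it, execute it on the segments) can be sketched as a toy, single-process illustration. This is not the patented implementation, and it rests on several assumptions. Metadata entries are keyed by hypothetical strings such as `loc:customers` (data locations) and `func:upper` (a catalog entry for a function). An in-memory dict stands in for files in distributed storage. Broadcasting is a loop over segment indices rather than network messages. The master embeds the entries the segments lack and, for entries already stored at the segments, sends only the identifier. Each segment resolves those identifiers against its local store before scanning its share of the data.

```python
from typing import Callable


def produce_plan(scan_key: str, project_key: str,
                 master_store: dict, segment_resident: set) -> dict:
    """Master node: build a self-described plan (hypothetical dict layout)."""
    embedded, refs = {}, []
    for key in (scan_key, project_key):
        if key in segment_resident:
            refs.append(key)                   # identifier included, content excluded
        else:
            embedded[key] = master_store[key]  # content copied into the plan
    return {"scan": scan_key, "project": project_key,
            "embedded": embedded, "refs": refs}


def execute_on_segment(plan: dict, segment_id: int,
                       local_store: dict, storage: dict) -> list:
    """Segment: resolve referenced metadata locally, then scan and project its rows."""
    meta = dict(plan["embedded"])
    for key in plan["refs"]:
        meta[key] = local_store[key]           # look up by identifier in the segment's store
    project: Callable[[str], str] = meta[plan["project"]]
    paths = meta[plan["scan"]][segment_id]     # this segment's share of the data files
    return [project(row) for path in paths for row in storage[path]]


def broadcast_and_execute(plan: dict, n_segments: int,
                          local_store: dict, storage: dict) -> list:
    """Send the same plan to every segment and gather results (sequential stand-in)."""
    results = []
    for seg in range(n_segments):
        results.extend(execute_on_segment(plan, seg, local_store, storage))
    return results


# Toy data: two segments, each assigned one file (all paths are illustrative).
master_store = {
    "loc:customers": [["/data/customers/part-0"], ["/data/customers/part-1"]],
    "func:upper": "master copy of the upper(text) catalog entry",  # not sent: segments hold it
}
storage = {"/data/customers/part-0": ["ann", "bob"],
           "/data/customers/part-1": ["cy"]}
segment_store = {"func:upper": str.upper}  # static catalog already present on every segment

plan = produce_plan("loc:customers", "func:upper", master_store, {"func:upper"})
print(plan["refs"])                                             # ['func:upper']
print(broadcast_and_execute(plan, 2, segment_store, storage))   # ['ANN', 'BOB', 'CY']
```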
Address: Hopkinton, MA, US