Title Data analytics platform over parallel databases and distributed file systems
Abstract Performing data analytics processing in the context of a large scale distributed system that includes a massively parallel processing (MPP) database and a distributed storage layer is disclosed. In various embodiments, a data analytics request is received. A plan is created to generate a response to the request. A corresponding portion of the plan is assigned to each of a plurality of distributed processing segments, including by invoking as indicated in the assignment one or more data analytical functions embedded in the processing segment.
Publication number US9563648(B2) Publication date 2017.02.07
Application number US201313840912 Filing date 2013.03.15
Applicant EMC IP Holding Company LLC Inventors Welton Caleb E.;Yang Shengwen
Classification G06F17/30 Main classification G06F17/30
Agency Van Pelt, Yi & James LLP Agent Van Pelt, Yi & James LLP
Main claim 1. A method, comprising: embedding in each of a plurality of distributed processing segments a library or other shared object comprising one or more data analytical functions; receiving by a master node a data analysis request; creating by the master node a plan to generate a response to the request; assigning to each of the plurality of distributed processing segments a corresponding portion of the plan to be performed by that segment, including by invoking as indicated in the assignment one or more data analytical functions embedded in the processing segment; obtaining, by the master node, metadata associated with one or more portions of the plan to be performed by one or more corresponding segments, wherein the master node obtains the metadata from a central metadata store wherein the metadata identifies a location data corresponding to the one or more portions of the plan and at least a part of one or more data analytic processing to be performed in connection with processing the corresponding one or more portions of the plan; sending, by the master node to each of the plurality of distributed processing segments for which a portion of the plan is assigned, the corresponding portion of the plan to be performed by that segment and the metadata, wherein the metadata is used to locate or access a subset of data on which the segment is to perform an indicated processing; receiving, from each of the plurality of distributed processing segments for which a portion of the plan is assigned, a corresponding result of processing the portion of the plan; and generating, a master response to the data analysis request based at least in part on the corresponding result of processing the portion of the plan received from each of the plurality of distributed processing segments for which a portion of the plan is assigned.
Address Hopkinton MA US
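
Illustrative sketch (not part of the patent record). The flow recited in the main claim, in which a master node plans a request, looks up metadata in a central store, dispatches plan portions to segments, and combines their results, can be sketched roughly as follows. This is a minimal single-process illustration under stated assumptions, not the patented implementation: ANALYTIC_LIBRARY, METADATA_STORE, STORAGE, PlanPortion, run_segment and master_handle are all hypothetical names, threads stand in for distributed segments, and an in-memory dict stands in for the distributed storage layer.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical library of analytical functions "embedded" in every segment.
ANALYTIC_LIBRARY = {
    "sum": lambda rows: sum(rows),
    "count": lambda rows: len(rows),
}

# Hypothetical central metadata store: segment id -> location of its data subset.
METADATA_STORE = {
    0: {"location": "block-0"},
    1: {"location": "block-1"},
    2: {"location": "block-2"},
}

# Stand-in for the distributed storage layer, keyed by location (assumed data).
STORAGE = {
    "block-0": [1, 2, 3],
    "block-1": [4, 5],
    "block-2": [6],
}


@dataclass
class PlanPortion:
    """The part of the plan assigned to one segment (hypothetical shape)."""
    segment_id: int
    function_name: str


def run_segment(portion, metadata):
    """Segment side: locate the data subset via metadata, invoke the embedded function."""
    rows = STORAGE[metadata["location"]]
    return ANALYTIC_LIBRARY[portion.function_name](rows)


def master_handle(function_name):
    """Master side: create the plan, fetch metadata, dispatch portions, combine results."""
    # One plan portion per segment known to the metadata store.
    plan = [PlanPortion(seg, function_name) for seg in sorted(METADATA_STORE)]
    with ThreadPoolExecutor() as pool:
        # Send each portion together with the metadata for that segment.
        futures = [
            pool.submit(run_segment, portion, METADATA_STORE[portion.segment_id])
            for portion in plan
        ]
        partials = [future.result() for future in futures]
    # Partial sums and partial counts both combine by addition.
    return sum(partials)
```

Calling master_handle("sum") or master_handle("count") runs the chosen function on every segment's subset and adds the partial results. Combining by addition is a simplification that only suits decomposable aggregates such as these two; other analytics would need a different merge step.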