Title: SYSTEM AND METHOD FOR CONSISTENT READS BETWEEN TASKS IN A MASSIVELY PARALLEL OR DISTRIBUTED DATABASE ENVIRONMENT
Abstract: A system and method are described for database split generation in a massively parallel or distributed database environment that includes a plurality of databases and a data warehouse layer providing data summarization and querying functionality. A database table accessor of the system obtains, from an associated client application, a query for data in a table of the data warehouse layer, where the query includes a user preference. The system obtains table data representative of properties of the table and determines a splits generator in accordance with the user preference, the properties of the table, or both. Using the selected splits generator, the system generates table splits that divide the user query into a plurality of query splits. It then outputs the query splits to an associated plurality of mappers, which execute each query split against the table.
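The abstract's flow (choose a splits generator from a user preference or from table properties, then divide one query into per-mapper splits) can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation. The names `TableProperties`, `QuerySplit`, `GENERATORS`, `choose_generator`, and `generate_splits`, and the preference strings `"single"`, `"partition"`, and `"row_range"`, are all hypothetical. Splits are modeled as slices (a partition name or a half-open row range) rather than as rewritten SQL.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TableProperties:
    """Hypothetical table metadata obtained from the data warehouse layer."""
    name: str
    row_count: int
    partitions: Tuple[str, ...] = ()


@dataclass(frozen=True)
class QuerySplit:
    """One unit of work for a mapper: the user query restricted to one table slice."""
    query: str
    partition: Optional[str] = None
    row_range: Optional[Tuple[int, int]] = None  # half-open [start, end)


SplitsGenerator = Callable[[str, TableProperties, int], List[QuerySplit]]


def single_split(query: str, props: TableProperties, num_mappers: int) -> List[QuerySplit]:
    # The whole query is handled by a single mapper.
    return [QuerySplit(query)]


def partition_splits(query: str, props: TableProperties, num_mappers: int) -> List[QuerySplit]:
    # One split per table partition, independent of the mapper count.
    return [QuerySplit(query, partition=p) for p in props.partitions]


def row_range_splits(query: str, props: TableProperties, num_mappers: int) -> List[QuerySplit]:
    # Divide the rows into contiguous, near-equal ranges (always at least one split).
    n = max(1, min(num_mappers, props.row_count))
    size, extra = divmod(props.row_count, n)
    splits, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        splits.append(QuerySplit(query, row_range=(start, end)))
        start = end
    return splits


# Hypothetical registry mapping user-preference strings to splits generators.
GENERATORS: Dict[str, SplitsGenerator] = {
    "single": single_split,
    "partition": partition_splits,
    "row_range": row_range_splits,
}


def choose_generator(preference: Optional[str], props: TableProperties) -> SplitsGenerator:
    # An explicit, recognized user preference wins; otherwise decide from table properties.
    if preference in GENERATORS:
        return GENERATORS[preference]
    return partition_splits if props.partitions else row_range_splits


def generate_splits(query: str, preference: Optional[str],
                    props: TableProperties, num_mappers: int) -> List[QuerySplit]:
    generator = choose_generator(preference, props)
    return generator(query, props, num_mappers)
```

For example, `generate_splits("SELECT * FROM sales", None, TableProperties("sales", 10), 3)` yields three row-range splits covering `(0, 4)`, `(4, 7)`, and `(7, 10)`. A partitioned table with no stated preference gets one split per partition instead.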
Application publication number: US2016092548(A1)  Publication date: 2016.03.31
Application number: US201514864792  Filing date: 2015.09.24
Applicant: ORACLE INTERNATIONAL CORPORATION  Inventors: SHIVARUDRAIAH ASHOK; SWART GARRET; DE LAVARENE JEAN
Classification: G06F17/30  Main classification: G06F17/30
Main claim: 1. A method for providing consistent reads between query tasks in an associated massively parallel or distributed database environment including a plurality of databases and a data warehouse layer providing data summarization and querying of the plurality of databases, the method comprising:
obtaining, from an associated client application, a query for data in a table of the data warehouse layer, the query comprising query data representative of a user query and user preference data representative of a user preference;
obtaining, from the data warehouse layer, table data representative of one or more properties of the table;
determining a splits generator in accordance with one or more of the user preference or the one or more properties of the table;
obtaining current system change number (SCN) data from the data warehouse layer and recording it, the SCN data being representative of a logical internal timestamp used by the plurality of databases of the associated massively parallel or distributed database environment;
generating, by the selected splits generator, table splits dividing the user query into a plurality of query splits;
associating the SCN data with each of the plurality of query splits; and
outputting the plurality of query splits together with the SCN data to a plurality of associated mappers for execution by the plurality of mappers against the table as query tasks, the query tasks using the SCN data to provide consistent reads between one another.
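The claim's key step is recording one SCN before any task runs and attaching that same SCN to every query split, so each mapper reads the table as of the same logical point in time. The toy model below sketches that idea under stated assumptions. It uses an in-memory multi-version table in which every write is stamped with the next SCN, and all names (`VersionedTable`, `read_as_of`, `make_splits`, `run_mapper`) are hypothetical. In an actual Oracle database, one way to realize this would be to issue each split's query as a flashback query (`SELECT ... AS OF SCN n`). The claim itself does not specify the mechanism.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


class VersionedTable:
    """Toy multi-version table: each committed write is stamped with the next SCN (hypothetical model)."""

    def __init__(self) -> None:
        self.current_scn = 0
        self.versions: Dict[int, List[Tuple[int, str]]] = {}  # row_id -> [(scn, value)] in SCN order

    def write(self, row_id: int, value: str) -> int:
        self.current_scn += 1
        self.versions.setdefault(row_id, []).append((self.current_scn, value))
        return self.current_scn

    def read_as_of(self, row_ids: Tuple[int, ...], scn: int) -> Dict[int, str]:
        # Return, for each row, the latest version committed at or before `scn`.
        result: Dict[int, str] = {}
        for rid in row_ids:
            visible = [value for s, value in self.versions.get(rid, []) if s <= scn]
            if visible:
                result[rid] = visible[-1]
        return result


@dataclass(frozen=True)
class QuerySplit:
    row_ids: Tuple[int, ...]
    scn: int  # the same recorded SCN is attached to every split


def make_splits(table: VersionedTable, row_ids: List[int], num_splits: int) -> List[QuerySplit]:
    scn = table.current_scn  # record the SCN once, before any query task runs
    chunks = [tuple(row_ids[i::num_splits]) for i in range(num_splits)]
    return [QuerySplit(chunk, scn) for chunk in chunks]


def run_mapper(table: VersionedTable, split: QuerySplit) -> Dict[int, str]:
    # Each query task reads as of the split's SCN, not as of "now".
    return table.read_as_of(split.row_ids, split.scn)


table = VersionedTable()
for rid in range(4):
    table.write(rid, "v1")                       # SCNs 1..4
splits = make_splits(table, [0, 1, 2, 3], 2)     # both splits carry SCN 4
first = run_mapper(table, splits[0])             # {0: 'v1', 2: 'v1'}
table.write(1, "v2")                             # concurrent update committed at SCN 5
second = run_mapper(table, splits[1])            # still {1: 'v1', 3: 'v1'}
```

Because the second mapper reads as of SCN 4, it does not see the update committed at SCN 5 after the splits were generated. Both tasks therefore observe one consistent snapshot. Reading the current state instead would return `'v2'` for row 1.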
Address: Redwood Shores CA US