Title: Processing of data using a database system in communication with a data processing framework
Abstract: A system, method, and computer program product for processing data are disclosed. The system includes a data processing framework configured to receive a data processing task for processing, a plurality of database systems coupled to the data processing framework, and a storage component in communication with the data processing framework and the plurality of database systems. The database systems perform a data processing task. The data processing task is partitioned into a plurality of partitions, and each database system processes the partition of the data processing task assigned to it. Each database system processes its assigned partition in parallel with another database system processing another partition of the data processing task assigned to that other database system. The data processing framework itself performs at least one partition of the data processing task.
Publication number: US9495427(B2) Publication date: 2016.11.15
Application number: US201113032516 Filing date: 2011.02.22
Applicant: Yale University Inventors: Abadi Daniel; Bajda-Pawlikowski Kamil; Abouzied Azza; Silberschatz Avi
Classification: G06F17/30 Main classification: G06F17/30
Agency: Mintz Levin Cohn Ferris Glovsky and Popeo, P.C. Agent: Mintz Levin Cohn Ferris Glovsky and Popeo, P.C.
Principal claim: 1. A data processing system, comprising: a data processing framework having at least one processor and configured to receive a data processing task for processing; a plurality of database systems having at least another processor, the plurality of database systems being distinct from and coupled to the data processing framework, wherein the database systems are configured to perform a data processing task, the at least one processor of the data processing framework is distinct from the at least another processor of the plurality of database systems; wherein the data processing task is configured to be partitioned into a plurality of partitions; a distributed file system in communication with the data processing framework and the plurality of database systems, and being distinct from each of the data processing framework and the plurality of database systems, the distributed file system optionally stores at least one output data associated with at least one partition of the data processing task being processed by the data processing framework; each database system in the plurality of database systems is configured to process a partition of the data processing task assigned for processing to that database system by the data processing framework based on at least a processing capacity of that database system, wherein the processing capacity is determined based on at least one of the following: whether that database system is still processing a previously assigned partition of the data processing task, and having each database system determine and provide an indication of its processing capacity to the data processing framework while the data processing task is being processed; each database system in the plurality of database systems is configured to perform processing of its assigned partition of the data processing task in parallel with another database system in the plurality of database systems processing another partition of the data processing task assigned to the another database system; wherein the data processing framework is configured to process the at least one partition of the data processing task; a storage component in communication with the data processing framework and the plurality of database systems, configured to store information about each partition of the data processing task being processed by each database system in the plurality of database systems and the data processing framework, wherein the storage component stores at least one connection parameter specific to each database system and information about at least one data partition property of data stored in at least one database system, wherein, using the at least one connection parameter and the at least one data partition property, processing of each partition of the data processing task is optimized in accordance with at least one requirement of each database system; and a database connector component configured to provide a communication interface between the plurality of database systems and the data processing framework.
Address: New Haven CT US
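Illustrative note: the capacity-based assignment described in the principal claim can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The names `Worker`, `run_task`, and `catalog`, the host names, and the use of a summation as the "work" are all hypothetical. The sketch models only one of the claimed capacity signals: a worker receives a new partition only once it has finished its previous one. Threads stand in for separate database systems, and the framework appears as one more worker because the claim has it process at least one partition itself.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical stand-in for one database system (or the framework itself)."""
    name: str
    connection: dict  # assumed shape of a catalog entry with connection parameters
    results: dict = field(default_factory=dict)

    def process(self, partition_id, rows):
        # Placeholder for pushing the partition's work into the database;
        # here the "work" is simply summing the rows.
        self.results[partition_id] = sum(rows)


def run_task(partitions, workers):
    """Hand each partition to whichever worker is idle; workers run in parallel."""
    pending = queue.Queue()
    for item in partitions.items():
        pending.put(item)

    def serve(worker):
        # A worker takes a new partition only after finishing the previous one,
        # which is how it signals spare capacity in this sketch.
        while True:
            try:
                partition_id, rows = pending.get_nowait()
            except queue.Empty:
                return
            worker.process(partition_id, rows)

    threads = [threading.Thread(target=serve, args=(w,)) for w in workers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Merge per-worker outputs into one result for the whole task.
    merged = {}
    for w in workers:
        merged.update(w.results)
    return merged


# Hypothetical catalog: connection parameters per database system.
catalog = {
    "db1": {"host": "node1.example", "port": 5432},
    "db2": {"host": "node2.example", "port": 5432},
}
workers = [Worker(name, params) for name, params in catalog.items()]
workers.append(Worker("framework", {}))  # the framework may process partitions too

partitions = {0: [1, 2, 3], 1: [4, 5], 2: [6], 3: [7, 8, 9]}
totals = run_task(partitions, workers)
```

Which worker ends up with which partition depends on thread timing, but every partition is processed exactly once and the merged totals are the same on every run.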