Title Data warehouse compatibility
Abstract A compatibility processing module executes one or more processes to format and manipulate data so that previously incompatible data warehouses can communicate. In particular, a first data warehouse is disclosed that is configured with a compatibility processing module for receiving a large number of data points and for executing one or more processes on a stored portion of the received data points, such that the resulting processed data points are compatible with the formatting conventions of a second data warehouse.
Publication number US9460188(B2) Publication date 2016.10.04
Application number US201313908749 Filing date 2013.06.03
Applicant Bank of America Corporation Inventors Mundlapudi Bharath; Banala Karthik; Koneru Rajesh
Classification G06F17/30 Main classification G06F17/30
Agency Banner & Witcoff, Ltd. Agent Banner & Witcoff, Ltd.; Springs Michael A.
Principal claim 1. A system for establishing compatibility between an open-source data warehouse and a proprietary data warehouse, the system comprising:
    a distribution processing module which receives a set of unprocessed, raw data points;
    a distributed file system which stores the set of data points according to a first data format;
    a first data warehouse which executes an extract, transform, and load (ETL) operation on the set of data points, the ETL operation comprising an extract process which selects from the set of data points a subset of data points for transformation and loading into a second data warehouse that uses a second data format; and
    a workflow scheduler which schedules execution of the ETL operation according to a workflow implemented as a directed acyclic graph comprising a plurality of action nodes that each specify an action to perform for the ETL operation, wherein the plurality of action nodes comprise at least one action node that specifies a map and reduce process to perform on the subset of data points;
    wherein the ETL operation further comprises one or more transformation processes, each transformation process comprising transforming individual data points in the subset of data points to obtain a set of transformed data points having the second data format;
    wherein the one or more transformation processes comprise a group rank process, the group rank process comprising ranking a data subset that comprises a plurality of data points from the set of data points, the ranking based on a plurality of metrics, and the group rank process further comprising storing the data subset in a ranked data table comprising a first column corresponding to a first metric of the plurality of metrics, a second column corresponding to a second metric of the plurality of metrics, and a third column corresponding to a rank; and
    wherein the ETL operation further comprises a load process comprising loading the set of transformed data points into the second data warehouse.
Address Charlotte NC US
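
The group rank process recited in the principal claim produces a ranked data table with three columns: a first metric, a second metric, and a rank. The claim does not say how the two metrics are combined, so the sketch below is only one plausible reading, not the patented implementation. It assumes rows are partitioned by the first metric and ranked within each partition by the second metric in descending order, with tied values sharing a rank. The function name `group_rank`, the row layout, and the sample values are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Row = Tuple[str, float]             # (first_metric, second_metric)
RankedRow = Tuple[str, float, int]  # (first_metric, second_metric, rank)


def group_rank(rows: Iterable[Row]) -> List[RankedRow]:
    """Build a three-column ranked table from two-metric rows.

    Assumed semantics (not stated in the claim): partition by the first
    metric, rank by the second metric in descending order, and give tied
    values the same rank (competition ranking: 1, 2, 2, 4, ...).
    """
    # Partition the second-metric values by their first-metric key.
    groups = defaultdict(list)
    for first, second in rows:
        groups[first].append(second)

    ranked: List[RankedRow] = []
    for first in sorted(groups):
        values = sorted(groups[first], reverse=True)
        rank = 0
        previous = None
        for position, value in enumerate(values, start=1):
            # A new value takes its position as its rank; a tie keeps
            # the rank of the value before it.
            if value != previous:
                rank = position
                previous = value
            ranked.append((first, value, rank))
    return ranked


# Hypothetical sample data: (region, sales amount).
sample = [
    ("east", 120.0),
    ("west", 75.0),
    ("east", 310.0),
    ("west", 75.0),
    ("east", 120.0),
]
ranked_table = group_rank(sample)
for row in ranked_table:
    print(row)
```

In a deployment like the one claimed, a step of this kind would presumably run as a map and reduce action node in the scheduled workflow, with the partitioning done in the map phase and the per-group ranking in the reduce phase. That mapping is an inference from the claim language rather than something the record states.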