Title of invention: Meta-data driven data ingestion using MapReduce framework
Abstract: A generic approach for automatically ingesting data into an HDFS (Hadoop Distributed File System) based data warehouse includes a datahub server, a generic pipelined data loading framework, and a meta-data model that, together, address data loading efficiency, data source heterogeneity, and data warehouse schema evolution. Loading efficiency is achieved via the MapReduce scale-out solution. The meta-data model comprises configuration files and a catalog. One configuration file is set up per ingestion task. The catalog manages the data warehouse schema. When a scheduled data loading task is executed, the configuration files and the catalog collaboratively drive the datahub server to load the heterogeneous data into their destination schemas automatically.
Publication number: US8949175(B2)
Publication date: 2015.02.03
Application number: US201213466981
Filing date: 2012.05.08
Applicant: Turn Inc.
Inventors: Wu Mingxi; Chen Songting
Classification: G06F17/30
Main classification: G06F17/30
Agency: Kwan & Olynick LLP
Agent: Kwan & Olynick LLP
Independent claim: 1. A method for automatically ingesting data into a data warehouse, comprising: providing a datahub server for executing data loading tasks; providing a generic pipelined data loading framework that leverages a MapReduce environment for ingestion of a plurality of heterogeneous data sources; and providing a processor implemented meta-data model comprised of a plurality of configuration files and a catalog; wherein a configuration file is set up per ingestion task; wherein said catalog manages data warehouse schema; and wherein, when a scheduled data loading task is executed by said datahub server, said configuration files and said catalog collaboratively drive the datahub server to load the heterogeneous data to their destination schemas automatically and independently of data source heterogeneities and data warehouse schema evolvement.
Address: Redwood City, CA, US
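
Illustrative sketch (not part of the patent record): the abstract describes per-task configuration files and a catalog that together drive loading of heterogeneous sources into their destination schemas. The Python below is a minimal local sketch of that idea, not the patented implementation. Every name in it (CATALOG, TASK_CONFIGS, map_record, run_task, the "impressions" table and its columns) is a hypothetical chosen for illustration, and a plain loop stands in for the MapReduce job.

```python
import csv
import io
import json

# Hypothetical catalog: destination table -> ordered (column, default) pairs.
# In this sketch, schema evolution means appending a column with a default.
CATALOG = {
    "impressions": [("user_id", None), ("ts", None), ("country", "unknown")],
}

# Hypothetical per-task configurations, one entry per ingestion task.
TASK_CONFIGS = {
    "csv_feed": {
        "format": "csv",
        "destination": "impressions",
        "fields": ["uid", "time"],  # column order in the source file
        "mapping": {"uid": "user_id", "time": "ts"},
    },
    "json_feed": {
        "format": "json",
        "destination": "impressions",
        "mapping": {"user": "user_id", "timestamp": "ts", "geo": "country"},
    },
}


def parse_record(line, config):
    """Parse one raw source line into a dict keyed by source field name."""
    if config["format"] == "csv":
        values = next(csv.reader(io.StringIO(line)))
        return dict(zip(config["fields"], values))
    if config["format"] == "json":
        return json.loads(line)
    raise ValueError(f"unsupported format: {config['format']}")


def map_record(line, config, catalog):
    """Map step: return (destination table, row laid out per the catalog)."""
    source = parse_record(line, config)
    # Rename source fields to destination columns as the task config says.
    renamed = {
        dest: source[src]
        for src, dest in config["mapping"].items()
        if src in source
    }
    # The catalog decides column order and fills columns the source lacks.
    schema = catalog[config["destination"]]
    row = tuple(renamed.get(column, default) for column, default in schema)
    return config["destination"], row


def run_task(task_name, lines, configs=TASK_CONFIGS, catalog=CATALOG):
    """Local stand-in for a MapReduce job: map each line, group by table."""
    config = configs[task_name]
    tables = {}
    for line in lines:
        table, row = map_record(line, config, catalog)
        tables.setdefault(table, []).append(row)
    return tables
```

For example, run_task("csv_feed", ["u1,100"]) and run_task("json_feed", ['{"user": "u2", "timestamp": 200, "geo": "US"}']) both produce rows for the same "impressions" table even though the source formats differ. Passing a catalog with an extra column changes the row layout without touching either task configuration, which is the separation of concerns the abstract attributes to the meta-data model.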