Title of invention: Partition-based index management in Hadoop-like data stores
Abstract: A method for processing a dataset in a partitioned distributed storage system, having data stored in a base table and an index stored in an index table, may include receiving base and index table metadata from the partitioned distributed storage system, where the base and index table metadata includes respective table partition information. The method may further include partitioning the dataset into a set of base-delta files according to the base table metadata, and generating a set of index-delta files corresponding with the base-delta files according to the index table metadata. The method may additionally include updating the partitioned distributed storage system with the set of base-delta files and the set of index-delta files, where a first update of the base table is synchronous with a second update of the index table.
Publication number: US9460147(B1) Publication date: 2016.10.04
Application number: US201614993166 Filing date: 2016.01.12
Applicant: International Business Machines Corporation Inventors: Chang Yuan-Chi; Fong Liana L.; Tan Wei
Classification: G06F17/30 Main classification: G06F17/30
Agency: Agent: Edwards Mark G.; Garg Nidhi
Main claim: 1. A method for maintaining an index into a dataset after a batch update of the dataset of a partitioned distributed storage system, the dataset stored in an HBase database having data stored in a base table and an index stored in an index table, the method comprising: locking the base and index tables to prevent region split, merge and movement operations; receiving base and index table metadata from the partitioned distributed storage system, wherein the base and index table metadata includes respective table partition information; partitioning the dataset into a set of base-delta files according to the base table metadata and a first criteria; updating the partitioned distributed storage system a first time with the set of base-delta files; generating a set of index-delta files corresponding with the base-delta files by: determining a second criteria for generating keys for indexing the partitioned dataset, generating, based on the second criteria and partition information about the partitioned dataset, a set of index-delta files having keys for indexing the partitioned dataset; updating the partitioned distributed storage system a second time with the set of index-delta files, wherein a first update of the base table is synchronous with a second update of the index table; and unlocking, subsequent to the second update, the base and index tables; wherein updating includes deleting the base-delta and index-delta files from a respective one or more computing systems having the base and index tables when the batch update includes a delete operation, and copying base-delta and index-delta files from the respective one or more computing systems having the base and index tables when the batch update includes a load operation.
Address: Armonk NY US
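The partitioning steps recited in the main claim can be illustrated with a minimal in-memory sketch. It is not the patented implementation and uses no HBase API. Every name in it (`region_for`, `partition`, `index_rows`, `build_deltas`) is hypothetical, the table metadata is assumed to be a sorted list of region start keys, and the "second criteria" is assumed to be an index key formed from the indexed column value followed by the base row key. Locking the tables and loading the delta files are omitted.

```python
from bisect import bisect_right
from collections import defaultdict

SEP = "\x00"  # assumed separator between indexed value and base row key


def region_for(key, split_points):
    """Return the number of the region whose key range contains `key`.

    `split_points` is a sorted list of region start keys, excluding the
    first region's empty start key. A start key is inclusive.
    """
    return bisect_right(split_points, key)


def partition(rows, split_points):
    """Group (key, value) pairs into one sorted delta per region."""
    deltas = defaultdict(list)
    for key, value in rows:
        deltas[region_for(key, split_points)].append((key, value))
    return {r: sorted(kvs, key=lambda kv: kv[0]) for r, kvs in deltas.items()}


def index_rows(base_rows, indexed_column):
    """Derive index entries: index key = indexed value + SEP + base row key."""
    return [
        (f"{cols[indexed_column]}{SEP}{row_key}", row_key)
        for row_key, cols in base_rows
    ]


def build_deltas(rows, base_splits, index_splits, indexed_column):
    """Build base deltas and the corresponding index deltas for one batch."""
    base_deltas = partition(rows, base_splits)
    index_deltas = partition(index_rows(rows, indexed_column), index_splits)
    return base_deltas, index_deltas
```

Because both sets of deltas are built from the same batch, against partition information read while the tables are locked, each delta maps to exactly one region of its table, which is what allows the base and index updates to be applied together.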