发明名称 Horizontal interval-based data partitioning and indexing for large clusters
摘要 Current data records having a start and end time are transformed into a 2D space having a first dimension for each data record's start time and a second dimension for each data record's end time. Historical queries specifying data ranges are obtained. A response was previously sent for each historical query and specifying a sub-portion of data records that overlap with such historical query's specified data range. Partitioning schemes for the current data records in the 2D space are generated. An optimum partitioning scheme having a lowest cost is selected based on costs of executing the historical queries with respect to each of the partitioning schemes. The optimum partitioning scheme is applied on the current data records, including newly received data records, in the 2D space so that any subsequently received queries are applied against the current data records as partitioned by the optimum partitioning scheme in the 2D space.
申请公布号 US8903803(B1) 申请公布日期 2014.12.02
申请号 US201414308517 申请日期 2014.06.18
申请人 Turn Inc. 发明人 Aly Ahmed Moustafa Hussein;Elmeleegy Hazem;Qi Yan
分类号 G06F17/30 主分类号 G06F17/30
代理机构 Kwan & Olynick LLP 代理人 Kwan & Olynick LLP
主权项 1. A method for partitioning a set of data records having a plurality of ranges to which a plurality of queries for specific ranges can be applied, the method comprising: at least one or more processors, transforming a plurality of current data records, which each has a start time and an end time, into a two dimensional (2D) space having a first dimension upon which each data record's start time is mapped and a second dimension upon which each data record's end time is mapped; at the at least one or more processors, obtaining a plurality of historical queries that each specifies a data range, a response being previously sent for each historical query and specifying a sub-portion of the current data records that overlap with such historical query's specified data range; at the at least one or more processors, generating a plurality of partitioning schemes for the current data records in the 2D space; at the at least one or more processors, selecting an optimum one of the partitioning schemes based on a cost of executing the historical queries with respect to each of the partitioning schemes, wherein the optimum partitioning scheme has a lowest cost of the partitioning schemes; and at the at least one or more processors, applying the optimum partitioning scheme on the current data records, including newly received data records, in the 2D space so that any subsequently received queries are applied against the current data records as partitioned by the optimum partitioning scheme in the 2D space.
地址 Redwood City CA US