Title of invention |
Processing spatial joins using a MapReduce framework |
Abstract |
Techniques, systems, and articles of manufacture for processing spatial joins using a MapReduce framework. A method includes partitioning a spatial data domain based on a distribution of spatial data objects across multiple nodes of a cluster of machines, defining at least one operation to be performed on the partitioned spatial data domain based on one or more predicates of a query, and executing the at least one defined operation on the partitioned spatial data domain to determine a response to the query. |
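The abstract does not say how the distribution of spatial data objects is turned into partition boundaries. The following is a minimal sketch that assumes an equi-depth (quantile) grid computed over object start points. The helpers `quantile_cuts`, `build_grid`, `cell_of`, and `cell_counts` are hypothetical illustrations and are not taken from the patent.

```python
from bisect import bisect_right
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def quantile_cuts(values: Sequence[float], k: int) -> List[float]:
    """Return k - 1 cut values that split the values into k roughly equal-count bins."""
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")
    s = sorted(values)
    n = len(s)
    return [s[(i * n) // k] for i in range(1, k)]


def build_grid(points: Sequence[Point], kx: int, ky: int) -> Tuple[List[float], List[float]]:
    """Equi-depth grid (assumed scheme): independent quantile cuts along x and along y."""
    return (quantile_cuts([p[0] for p in points], kx),
            quantile_cuts([p[1] for p in points], ky))


def cell_of(p: Point, xcuts: List[float], ycuts: List[float]) -> Tuple[int, int]:
    """Cell index of a point; a coordinate equal to a cut falls into the upper cell."""
    return bisect_right(xcuts, p[0]), bisect_right(ycuts, p[1])


def cell_counts(points: Sequence[Point], xcuts: List[float], ycuts: List[float]) -> Counter:
    """Objects per cell, i.e. the load each partition (and its task) would receive."""
    return Counter(cell_of(p, xcuts, ycuts) for p in points)
```

The x and y cuts are computed independently, so this scheme only balances the marginal distributions. A strongly correlated or clustered joint distribution can still produce uneven cells, and a recursive scheme such as a k-d split would be one alternative.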
Publication number |
US9311380(B2) |
Publication date |
2016.04.12 |
Application number |
US201313853451 |
Filing date |
2013.03.29 |
Applicant |
International Business Machines Corporation |
Inventors |
Chawda Bhupesh S.; Gupta Himanshu; Faruquie Tanveer A; Subramaniam L. Venkata |
Classification |
G06F17/30 |
Main classification |
G06F17/30 |
Agency |
Ryan, Mason & Lewis, LLP |
Agent |
Ryan, Mason & Lewis, LLP |
Main claim |
1. A method comprising:
partitioning a spatial data domain into multiple portions of partitioned spatial data via a MapReduce framework based on a distribution of spatial data objects across multiple nodes of a cluster of machines;
defining at least one operation to be performed on each of the multiple portions of the partitioned spatial data domain based on one or more spatial predicates of a query, wherein:
said at least one operation is selected from a group consisting of (i) a project operation that determines a partition in which the start point of a given spatial data object resides, (ii) a split operation that determines all partitions that share at least one point of a given spatial data object, and (iii) a replication operation that determines all partitions that satisfy a given condition; and
said one or more spatial predicates are selected from a group consisting of (i) an overlap parameter that indicates that two or more portions of the spatial data each possess at least one identical value, (ii) a range parameter that indicates that any point in a first portion of the spatial data is within a given distance of any point in a second portion of the spatial data, and (iii) a nearest neighbor parameter that indicates that a first portion of the spatial data is nearer to a second portion of the spatial data than any other portion of the spatial data; and
executing the at least one defined operation on each of the multiple portions of the partitioned spatial data domain to determine a response to the query, wherein each of the multiple portions of the partitioned spatial data is processed exclusively by a distinct map task within the MapReduce framework;
wherein said partitioning, said defining, and said executing are carried out by a computer device. |
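The claim names three per-partition operations (project, split, and replication) and a range predicate, among others. The following is a minimal, non-authoritative sketch built on several assumptions. Objects are closed axis-aligned rectangles whose start point is the lower-left corner, a uniform grid stands in for the distribution-based partitioning, and distance is Euclidean. The range join runs in a single process. `Grid`, `project`, `split`, `replicate`, `rect_distance`, and `range_join` are hypothetical names that do not come from the patent.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), closed
Cell = Tuple[int, int]
Point = Tuple[float, float]


class Grid:
    """Hypothetical uniform nx-by-ny grid over [0, width] x [0, height]."""

    def __init__(self, width: float, height: float, nx: int, ny: int):
        self.cw, self.ch, self.nx, self.ny = width / nx, height / ny, nx, ny

    def cell_at(self, x: float, y: float) -> Cell:
        # Clamp so coordinates on the outer edge still map to a valid cell.
        i = min(max(int(x // self.cw), 0), self.nx - 1)
        j = min(max(int(y // self.ch), 0), self.ny - 1)
        return i, j

    def cell_rect(self, c: Cell) -> Rect:
        i, j = c
        return (i * self.cw, j * self.ch, (i + 1) * self.cw, (j + 1) * self.ch)

    def cells(self) -> List[Cell]:
        return [(i, j) for i in range(self.nx) for j in range(self.ny)]


def project(g: Grid, r: Rect) -> Cell:
    """Project: the one cell holding the object's start point (taken as lower-left)."""
    return g.cell_at(r[0], r[1])


def split(g: Grid, r: Rect) -> List[Cell]:
    """Split: every cell that shares at least one point with the object's rectangle."""
    (i0, j0), (i1, j1) = g.cell_at(r[0], r[1]), g.cell_at(r[2], r[3])
    return [(i, j) for i in range(i0, i1 + 1) for j in range(j0, j1 + 1)]


def replicate(g: Grid, r: Rect, cond: Callable[[Rect, Rect], bool]) -> List[Cell]:
    """Replicate: every cell whose rectangle satisfies cond(cell_rect, object)."""
    return [c for c in g.cells() if cond(g.cell_rect(c), r)]


def rect_distance(a: Rect, b: Rect) -> float:
    """Minimum Euclidean distance between two closed rectangles (0 if they touch)."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)
    return math.hypot(dx, dy)


def range_join(g: Grid, R: List[Point], S: List[Point], d: float) -> List[Tuple[int, int]]:
    """Index pairs (a, b) with dist(R[a], S[b]) <= d; points must lie inside the grid."""
    buckets: Dict[Cell, Tuple[List[int], List[int]]] = defaultdict(lambda: ([], []))

    def near(cell: Rect, obj: Rect) -> bool:
        return rect_distance(cell, obj) <= d

    # Assignment step: each R point is projected to exactly one cell, while each
    # S point is replicated to every cell within distance d of it.
    for a, (x, y) in enumerate(R):
        buckets[project(g, (x, y, x, y))][0].append(a)
    for b, (x, y) in enumerate(S):
        for c in replicate(g, (x, y, x, y), near):
            buckets[c][1].append(b)
    # Per-cell step: each cell evaluates the range predicate on its local pairs only,
    # standing in for one task per partition.
    out = []
    for r_ids, s_ids in buckets.values():
        out.extend((a, b) for a in r_ids for b in s_ids if math.dist(R[a], S[b]) <= d)
    return sorted(out)
```

Pairing project on one input with replication on the other places every R point in exactly one cell, so each qualifying pair is reported once without a deduplication pass. If split were applied to both inputs of a rectangle overlap join, a pair would be reported once per shared cell unless a deduplication rule is added.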
Address |
Armonk NY US |