Title: Processing spatial joins using a mapreduce framework
Abstract: Techniques, systems, and articles of manufacture for processing spatial joins using a MapReduce framework. A method includes partitioning a spatial data domain based on a distribution of spatial data objects across multiple nodes of a cluster of machines, defining at least one operation to be performed on the partitioned spatial data domain based on one or more predicates of a query, and executing the at least one defined operation on the partitioned spatial data domain to determine a response to the query.
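As an illustration of partitioning "based on a distribution of spatial data objects", the following is a minimal sketch only. It assumes a one-dimensional equi-depth scheme over the x-coordinates of the objects, which the record itself does not specify; the function names `equi_depth_boundaries` and `strip_of` are hypothetical.

```python
import bisect


def equi_depth_boundaries(xs, num_partitions):
    """Return interior cut points so that each vertical strip holds
    roughly the same number of objects (assumed scheme, not from the patent)."""
    ordered = sorted(xs)
    n = len(ordered)
    return [ordered[(i * n) // num_partitions] for i in range(1, num_partitions)]


def strip_of(x, boundaries):
    """Index of the strip containing x; a cut point belongs to the strip on its right."""
    return bisect.bisect_right(boundaries, x)


# Skewed sample: dense near the origin, sparse further out.
sample_xs = [1, 2, 3, 4, 10, 20, 30, 40]
boundaries = equi_depth_boundaries(sample_xs, 4)
counts = [0, 0, 0, 0]
for x in sample_xs:
    counts[strip_of(x, boundaries)] += 1
```

With a skewed sample like this one, the strips are narrow where objects are dense and wide where they are sparse, so each map task would receive a comparable share of the data.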
Publication number: US9311380(B2) Publication date: 2016.04.12
Application number: US201313853451 Application date: 2013.03.29
Applicant: International Business Machines Corporation Inventors: Chawda Bhupesh S.;Gupta Himanshu;Faruquie Tanveer A;Subramaniam L. Venkata
Classification: G06F17/30 Main classification: G06F17/30
Agency: Ryan, Mason & Lewis, LLP Agent: Ryan, Mason & Lewis, LLP
Main claim: 1. A method comprising: partitioning a spatial data domain into multiple portions of partitioned spatial data via a MapReduce framework based on a distribution of spatial data objects across multiple nodes of a cluster of machines; defining at least one operation to be performed on each of the multiple portions of the partitioned spatial data domain based on one or more spatial predicates of a query, wherein: said at least one operation is selected from a group consisting of (i) a project operation that determines a partition in which the start point of a given spatial data object resides, (ii) a split operation that determines all partitions that share at least one point of a given spatial data object, and (iii) a replication operation that determines all partitions that satisfy a given condition; and said one or more spatial predicates are selected from a group consisting of (i) an overlap parameter that indicates that two or more portions of the spatial data each possess at least one identical value, (ii) a range parameter that indicates that any point in a first portion of the spatial data is within a given distance of any point in a second portion of the spatial data, and (iii) a nearest neighbor parameter that indicates that a first portion of the spatial data is nearer to a second portion of the spatial data than any other portion of the spatial data; and executing the at least one defined operation on each of the multiple portions of the partitioned spatial data domain to determine a response to the query, wherein each of the multiple portions of the partitioned spatial data is processed exclusively by a distinct map task within the MapReduce framework; wherein said partitioning, said defining, and said executing are carried out by a computer device.
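The project, split, and replication operations named in the claim can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: it assumes axis-aligned rectangles `(xmin, ymin, xmax, ymax)`, a uniform grid of closed square cells, and that the "start point" of an object is its lower-left corner. The constants `CELL` and `GRID` and the helper `within_distance` are hypothetical.

```python
from itertools import product

CELL = 10.0  # hypothetical cell side length
GRID = 4     # hypothetical: GRID x GRID cells covering [0, 40] x [0, 40]


def cell_index(v):
    """Map a coordinate to a cell index along one axis, clamped to the grid."""
    return min(GRID - 1, max(0, int(v // CELL)))


def project(rect):
    """Project: the single partition holding the object's start point
    (assumed here to be the lower-left corner)."""
    xmin, ymin, _, _ = rect
    return (cell_index(xmin), cell_index(ymin))


def split(rect):
    """Split: every partition that shares at least one point with the object."""
    xmin, ymin, xmax, ymax = rect
    xs = range(cell_index(xmin), cell_index(xmax) + 1)
    ys = range(cell_index(ymin), cell_index(ymax) + 1)
    return sorted(product(xs, ys))


def replicate(rect, condition):
    """Replicate: every partition for which condition(rect, cell) holds."""
    return [c for c in product(range(GRID), range(GRID)) if condition(rect, c)]


def within_distance(d):
    """Example condition for a range predicate: the cell lies within
    distance d of the rectangle (hypothetical helper)."""
    def condition(rect, cell):
        xmin, ymin, xmax, ymax = rect
        cx0, cy0 = cell[0] * CELL, cell[1] * CELL
        cx1, cy1 = cx0 + CELL, cy0 + CELL
        # Per-axis gap between the rectangle and the cell (0 if they overlap).
        dx = max(cx0 - xmax, xmin - cx1, 0.0)
        dy = max(cy0 - ymax, ymin - cy1, 0.0)
        return (dx * dx + dy * dy) ** 0.5 <= d
    return condition


rect = (8.0, 8.0, 12.0, 9.0)
projected = project(rect)
split_cells = split(rect)
replicated = replicate(rect, within_distance(2.0))
```

In this sketch, project emits one key per object, split emits one key per touched cell, and replicate emits a key for every cell that could hold a join partner under the chosen condition, so that each partition can be processed by its own map task without consulting other partitions.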
Address: Armonk NY US