Spatial Join Algorithms


Provided by: mok94


Spatial Join Algorithms
  • Mohamed F. Mokbel

Why do we need a Spatial Join?
  • Spatial data are commonly found in applications
    like cartography, CAD, and GIS
  • In spatial join, two spatial relations are
    combined together based on some spatial criteria.
  • Examples of spatial join
  • Only spatial join is required
  • Find all forests which are in a city
  • Find all cities that are crossed by a river
  • Find all cities that are affected by the fire
  • Find all buildings that overlap with a park.
  • Spatial join with a selection criterion
  • Find all government-owned buildings that overlap
    with a park.
  • Find all forests in the USA that receive more than
    20 inches of average rainfall per year.
  • Find all day cares in Lafayette that are within 3
    miles of houses with rent less than 600.

Why not use Relational Join Algorithms?
  • Nested Loop Join
  • Every object of one relation has to be checked
    against all objects of the other relation. Since
    we consider very large relations of spatial
    objects, the performance of the nested loop is
    not acceptable.
  • Hash-Based Join
  • Hash-based joins are suitable for equi-joins but
    not for spatial joins.
  • Sort-Merge Join
  • There is no total ordering of spatial objects.
  • However, adaptation of these algorithms to deal
    with the spatial properties can be applied.

Is the Spatial Join Problem Already Solved in
Another Domain?
  • Geometric domain
  • The abstraction of the spatial join is finding
    the intersection between two sets of geometric
    objects.
  • Many solutions are provided in the context of
  • Geometric solutions consider only CPU cost.
  • Geometric solutions are accepted if the data set
    can fit in memory.
  • VLSI domain
  • Spatial-Access methods (e.g., R-tree) are also
    defined in the VLSI context.
  • Divide-and-conquer algorithm (Gutting et al, IS
    93) is designed for rectangle intersection
    problem with large sizes.
  • However, the I/O time is still not well
    addressed. I/O is essential for spatial join
    algorithms due to the massive amount of spatial
    data.

What is Special about Spatial ?
  • There exists no total ordering among spatial
    objects that preserves spatial proximity
  • Space-filling curves can be used, but not with an
    accurate ordering.
  • Many spatial operators are not closed
  • The intersection of two polygons may return any
    number of single points, dangling edges, or
    disjoint polygons.
  • Spatial operators are more expensive than standard
    relational operators
  • Examples of Spatial operators are overlap,
    contained, include
  • Spatial databases tend to be large
  • The cities, rivers, restaurants, gas stations,
    forests, highways in USA.
  • Spatial data have a complex structure
  • Imagine representing the boundaries of Lafayette
    in a database.

Filter Step and Refinement Step
  • Filter Step
  • An approximation of each spatial object (e.g.,
    the minimum bounding rectangle) is used to
    eliminate tuples that cannot be part of the
    result. This step produces candidates that are a
    superset of the actual result.
  • Refinement Step
  • Each candidate is examined to check if it is a
    part of the result. I/O cost due to fetching the
    exact object from the disk and a CPU-intensive
    computational geometry algorithm.
  • An intermediate step (geometric filter) is used
    in some algorithms.
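
The two steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the spatial objects are taken to be discs given as (cx, cy, r) tuples so that the exact test stays short, the filter uses a plain nested loop, and all function names are hypothetical rather than taken from the presentation.

```python
from math import hypot

def mbr(disc):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a disc (cx, cy, r)."""
    cx, cy, r = disc
    return (cx - r, cy - r, cx + r, cy + r)

def mbrs_intersect(a, b):
    """Cheap test on the approximations (closed rectangles)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def discs_intersect(d1, d2):
    """Exact geometry test; stands in for the expensive refinement test."""
    return hypot(d1[0] - d2[0], d1[1] - d2[1]) <= d1[2] + d2[2]

def filter_step(R, S):
    """Return candidate index pairs: a superset of the actual result."""
    return [(i, j)
            for i, r in enumerate(R)
            for j, s in enumerate(S)
            if mbrs_intersect(mbr(r), mbr(s))]

def refinement_step(R, S, candidates):
    """Examine each candidate with the exact test and drop the false hits."""
    return [(i, j) for i, j in candidates if discs_intersect(R[i], S[j])]
```

With R = [(0, 0, 1)] and S = [(1.8, 1.8, 1), (1.5, 0, 1), (5, 5, 1)], the filter keeps the first two discs of S as candidates; the first one is a false hit (the bounding rectangles overlap, the discs do not) and is removed by the refinement step.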

The Filter Step
  • Transformation approaches
  • There is no index in any of the relations
  • The two relations are indexed.
  • There exists only one index for only one relation
  • Unified approaches regardless of the index

Transformation Approaches for Spatial Join
  • Map to the one-dimensional space (Orenstein et
    al, TSE 88)
  • Rectangles are sorted according to the Z-order.
  • Two one-dimensional spatial join algorithms are
    proposed (spatial-merge, and spatial-filter).
  • Both algorithms are later enhanced in (Aref et
    al, SDH 94) with the linear-scan and
    estimate-based spatial join algorithms.
  • Map to higher dimensions (Becker et al, ICDE 93)
  • D-dimensional rectangles are transformed into
    points in the 2D-dimensional space (corner
    transformation).
  • A multi-dimensional join algorithm is used with
    the support of grid files.
  • Transformation-Based Spatial Join (Song et al
    CIKM99, TKDE 99)
  • The corner transformation is used.
  • A special algorithm for spatial join that does
    not rely on indexing is proposed.
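
The two mappings mentioned above can be sketched in Python. This shows only the transformations themselves, not the cited join algorithms, which are more elaborate. It assumes non-negative integer grid coordinates for the Z-order and rectangles given as a list of (lo, hi) intervals, one per dimension; the function names are hypothetical.

```python
def z_value(x, y, bits=16):
    """Z-order (Morton) value of a grid cell: interleave the bits of x and y."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # bit i of x goes to even position
        z |= ((y >> i) & 1) << (2 * i + 1)    # bit i of y goes to odd position
    return z

def corner_transform(rect):
    """Map a D-dimensional rectangle [(lo, hi), ...] to a point in 2D dimensions."""
    lows = tuple(lo for lo, _ in rect)
    highs = tuple(hi for _, hi in rect)
    return lows + highs

def points_intersect(p, q):
    """Rectangle intersection expressed on the transformed 2D-dimensional points."""
    d = len(p) // 2
    return all(p[k] <= q[d + k] and q[k] <= p[d + k] for k in range(d))
```

Sorting the four cells of a 2 x 2 grid by z_value visits them in the order (0,0), (1,0), (0,1), (1,1). After the corner transformation, the intersection predicate becomes a range condition on point coordinates, which is what lets a multi-dimensional point structure such as a grid file support the join.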

Spatial Join algorithms without Indexing Support
PBSM (Partition-Based Spatial-Merge Join) (Patel
et al, SIGMOD 96)
  • The spatial universe is divided into P disjoint
    partitions.
  • The MBRs from the two relations are mapped to
    their partitions. One MBR can be clipped to
    several partitions.
  • An in-memory spatial join algorithm is used for
    each partition using the plane-sweep algorithm.
  • The number of partitions is chosen to allow each
    partition to fit in memory.
  • Additional techniques are provided to handle data
  • If the data inside one partition still cannot fit
    in memory, recursive partitioning may be used.
  • The output data may contain duplicates.
  • A sorting step needs to be done in the Refinement
    step to remove the duplicates.
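
A minimal in-memory sketch of the PBSM idea in Python, under several simplifying assumptions that are not from the presentation: the universe is split into an n x n grid of tiles, each MBR is replicated whole into every tile it overlaps (rather than clipped), partitions are kept in a dictionary instead of on disk, and duplicates are removed with a set instead of a sorting step. All names are hypothetical.

```python
from collections import defaultdict

def plane_sweep(rs, ss):
    """Join two lists of (id, (xmin, ymin, xmax, ymax)) by sweeping along x."""
    rs = sorted(rs, key=lambda e: e[1][0])
    ss = sorted(ss, key=lambda e: e[1][0])
    out, i, j = [], 0, 0
    while i < len(rs) and j < len(ss):
        if rs[i][1][0] <= ss[j][1][0]:
            rid, r = rs[i]
            k = j
            while k < len(ss) and ss[k][1][0] <= r[2]:   # x-overlap with r
                s = ss[k][1]
                if r[1] <= s[3] and s[1] <= r[3]:        # y-overlap
                    out.append((rid, ss[k][0]))
                k += 1
            i += 1
        else:
            sid, s = ss[j]
            k = i
            while k < len(rs) and rs[k][1][0] <= s[2]:   # x-overlap with s
                r = rs[k][1]
                if r[1] <= s[3] and s[1] <= r[3]:
                    out.append((rs[k][0], sid))
                k += 1
            j += 1
    return out

def tiles_for(rect, universe, n):
    """All (column, row) tiles of the n x n grid that the rectangle overlaps."""
    ux0, uy0, ux1, uy1 = universe
    w, h = (ux1 - ux0) / n, (uy1 - uy0) / n
    def cell(v, origin, size):
        return min(n - 1, max(0, int((v - origin) / size)))
    cx0, cx1 = cell(rect[0], ux0, w), cell(rect[2], ux0, w)
    cy0, cy1 = cell(rect[1], uy0, h), cell(rect[3], uy0, h)
    return [(cx, cy) for cx in range(cx0, cx1 + 1) for cy in range(cy0, cy1 + 1)]

def pbsm_join(R, S, universe, n):
    """Partition both inputs, join each partition, then drop duplicate pairs."""
    parts = defaultdict(lambda: ([], []))
    for i, r in enumerate(R):
        for t in tiles_for(r, universe, n):
            parts[t][0].append((i, r))
    for j, s in enumerate(S):
        for t in tiles_for(s, universe, n):
            parts[t][1].append((j, s))
    result = set()                       # a pair may be found in several tiles
    for rs, ss in parts.values():
        result.update(plane_sweep(rs, ss))
    return sorted(result)
```

A pair of rectangles that both straddle a tile boundary is reported once per shared tile, which is where the duplicates mentioned above come from.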

Spatial Hash-Join (Lo et al, SIGMOD 96)
  • A general framework for extending relational
    hash join algorithms.
  • Can produce duplicate results

PBSM as an instance of the Spatial Hash-Join
Spatial Hash Join Designer's Choice
Other Spatial join algorithms for non-indexed relations
  • Seeded tree approach (Lo et al, SSD 95)
  • Two seeded trees are built for both relations
    using spatial sampling techniques.
  • A depth-first traversal algorithm is used for
    joining the two seeded trees.
  • Size Separation Spatial Join, S3J (Koudas et al,
    SIGMOD 97)
  • For each entry in both data sets A, and B
  • Compute the Hilbert value H of the centroid.
  • Determine the level at which this entry belongs
    to (Similar to the Filter Tree) and place the
    entry in this level file
  • For each level file, sort entries by the Hilbert
    value.
  • Perform a synchronized scan over the pages of
    each level.
  • Scalable Sweeping-Based Spatial Join, SSSJ (Arge
    et al, VLDB 98)
  • Similar to PBSM, it is a partition-based and
    plane-sweep based approach.
  • The main contribution is that it utilizes the
    foundations of computational geometry algorithms
    to improve the in-memory plane-sweep algorithm.
  • This requires changing the partitioning function
    to partition over only one dimension.
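
The S3J step "compute the Hilbert value H of the centroid" can be sketched in Python as follows. The conversion below is the standard iterative cell-to-index mapping for an n x n grid with n a power of two; it is an assumption that this matches the curve orientation used in the cited paper, and the level-file assignment is not shown. The function names are hypothetical.

```python
def hilbert_value(n, x, y):
    """Hilbert index of integer cell (x, y) on an n x n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate/flip the quadrant
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

def sort_by_centroid(rects, n):
    """Sort rectangles (xmin, ymin, xmax, ymax) by the Hilbert value of their
    centroid, assuming all coordinates lie in [0, n)."""
    def key(r):
        cx = int((r[0] + r[2]) / 2)
        cy = int((r[1] + r[3]) / 2)
        return hilbert_value(n, cx, cy)
    return sorted(rects, key=key)
```

Consecutive Hilbert values always belong to neighboring cells, which is why a synchronized scan over files sorted this way tends to meet spatially close entries together.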

Spatial Join algorithms For Both Indexed Relations
Spatial Join using Depth-Traversal R-Tree
(Brinkhoff et al, SIGMOD 93)
  • The sketch of the proposed algorithm for the case
    of equal heights and intersection operator is
    as follows:
  • Procedure SpatialJoin (R, S: R-Tree Node)
  •   For all entries ER in R and all entries ES in S
      where ER.rect intersects ES.rect
  •     If (R, S are leaf pages)
  •       Output (ER, ES)
  •     Else
  •       SpatialJoin (ER.ref, ES.ref) // ER.ref,
          ES.ref are the nodes referenced by ER, ES
  •   End
  • The main idea is to synchronously traverse the
    two R-trees in a depth-first traversal.
  • Enhancements are proposed to tune
  • The CPU time
  • The I/O time
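
The procedure above can be written as a small runnable Python sketch. It assumes both trees have equal height and live entirely in memory, so none of the CPU or I/O tuning is modeled. The Entry and Node classes and the leaf/inner helpers are hypothetical stand-ins for a real R-tree implementation.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple

Rect = Tuple[float, float, float, float]   # (xmin, ymin, xmax, ymax)

@dataclass
class Entry:
    rect: Rect
    ref: Any            # child Node in inner nodes, object id in leaves

@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry]

def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def spatial_join(r: Node, s: Node, out: list) -> None:
    """Synchronized depth-first traversal of two equal-height trees."""
    for er in r.entries:
        for es in s.entries:
            if intersects(er.rect, es.rect):
                if r.is_leaf and s.is_leaf:
                    out.append((er.ref, es.ref))
                else:
                    spatial_join(er.ref, es.ref, out)

def mbr_of(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def leaf(objects):
    """Build a leaf node from (object id, rect) pairs."""
    return Node(True, [Entry(rect, oid) for oid, rect in objects])

def inner(children):
    """Build an inner node whose entries carry the MBR of each child."""
    return Node(False, [Entry(mbr_of([e.rect for e in c.entries]), c)
                        for c in children])
```

Subtrees are visited only when the rectangles of their parent entries intersect, so whole branches of both trees are skipped without looking at their leaves.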

Spatial Join using Depth-Traversal R-Tree (Cont.)
  • Tuning CPU-Time
  • Restricting the search space. Among all the entries
    in R and S, we only check the entries that intersect
    with R ∩ S (the intersection of the two node
    rectangles).
  • Spatial sorting and plane-sweep. Use the
    plane-sweep algorithm for the set of candidates
    from nodes R, S.
  • I/O time tuning
  • Local plane-sweep order. Use the plane-sweep
    order for fetching pages from disk.
  • Local plane-sweep order with pinning. In addition
    to the previous approach, we pin the rectangles
    that have maximal degree. The degree of a
    rectangle is the number of its intersected
    rectangles.
  • Local Z-order with pinning. Instead of doing
    plane-sweep order for reading the disk pages, we
    use the Z-order of the centroids of the
    rectangles.
Other Spatial Join algorithms for both indexed relations
  • BFRJ Breadth-First R-tree Join (Huang et al,
    VLDB 97)
  • Synchronous traversal of two R-trees in a
    breadth-first traversal.
  • Unlike the depth-first traversal, where a local
    optimization is achieved for a node by node join,
    in BFRJ, a global optimization is achieved for
    each level.
  • The main idea is that, based on the global
    optimization, we can take decisions as which
    nodes need to be joined to each other.
  • Notice that this cannot be done using
    depth-first traversal, because of the limitation
    of the current scope.
  • Both relations are indexed using PMR quadtree
    (Hoel et al, VLDB 95)
  • Performs a synchronized tree traversal at the
    leaf level.
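
The breadth-first traversal can be sketched in Python as follows. The list of candidate node pairs for a whole level is materialized before descending, which is what makes a level-wide decision possible. The sort used here is only a placeholder for that global optimization and is an assumption, not the ordering of the cited paper. The trees are assumed to have equal height, and the Node tuple and helpers are hypothetical.

```python
from collections import namedtuple

# children is a list of Nodes, or None for a data object (which has an oid)
Node = namedtuple("Node", ["rect", "children", "oid"])

def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def obj(oid, rect):
    return Node(rect, None, oid)

def group(children):
    """Node whose rectangle is the MBR of its children."""
    rect = (min(c.rect[0] for c in children), min(c.rect[1] for c in children),
            max(c.rect[2] for c in children), max(c.rect[3] for c in children))
    return Node(rect, children, None)

def bfrj(root_r, root_s):
    """Level-by-level join of two equal-height trees."""
    level = [(root_r, root_s)] if intersects(root_r.rect, root_s.rect) else []
    while level and level[0][0].children is not None:
        next_level = []
        for r, s in level:
            for cr in r.children:
                for cs in s.children:
                    if intersects(cr.rect, cs.rect):
                        next_level.append((cr, cs))
        # Placeholder for the global, per-level optimization: all pairs of
        # this level are known here, so they can be reordered as a whole.
        next_level.sort(key=lambda p: (p[0].rect, p[1].rect))
        level = next_level
    return [(r.oid, s.oid) for r, s in level]
```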

Spatial Join algorithms for one indexed relation
  • Spatial join using seeded trees (Lo et al, SIGMOD
    94, TKDE 98)
  • A seeded tree is built for the non-indexed
    relation.
  • The steps to build the seeded tree are guided by
    the existing R-Tree
  • The R-tree and the seeded tree are joined using
    the depth-first traversal approach.
  • Sort and Match (Papadopoulos et al, SSD 99)
  • The STR bulk loading algorithm is applied for the
    non-indexed relation.
  • Instead of building the packed tree, it directly
    matches in-memory created leaf nodes with the
    existing R-tree index.
  • Slot Index Spatial join, SISJ (Mamoulis et al,
    TKDE 03)
  • SISJ combines the ideas of the seeded tree join
    with the spatial hash join.
  • The key idea is to define the spatial partitions
    of the spatial hash join using the existing
    R-tree.

A Unified Approach for Indexed and Non-Indexed
Spatial Joins (Arge et al, EDBT 00)
  • An extension of SSSJ to deal with indexed data.
  • Similar to SSSJ, non-indexed data are sorted
    according to their MBRs, and fed into the
    plane-sweep algorithm.
  • For indexed data, an additional pre-processing
    step is required to exploit the index structure
    and directly extract the data in a sorting order
    according to the plane-sweep algorithm.
  • A main conclusion of this paper is that using an
    index-based approach for spatial join whenever
    indexes are available does not always lead to the
    best execution time.
  • A cost model is proposed to decide whether to
    follow an index-based approach or the unified
    approach.

The Refinement Step
Multi-Step Processing (Brinkhoff et al, SIGMOD 94)
  • The refinement step is divided into two steps.
  • Identifying more false and true hits.
  • In this step, more accurate approximations than
    the MBR are investigated to identify false
    and true hits.
  • Exact geometry intersection.
  • Eventually all the remaining pairs of candidates
    are examined at this stage. This is the most
    time-consuming step, since it requires CPU time to
    compute the exact intersection test, and I/O time
    to read the spatial object from disk.
  • It is important to notice that improvements in
    the exact geometry intersection step have the
    lowest impact, since their effect can be canceled
    by the improvements in the previous two steps.

Multi-Step Processing (Cont.)
  • Removing more false hits
  • Identifying true hits

Exact Geometry Processing in Multi-Step Processing
Other Work for the Refinement Step
  • Approximations other than MBR (Veenhof et al,
    BNCOD 95)
  • Approximations of spatial objects are constructed
    by rotating two parallel lines around the object.
  • Symbolic Intersection Detection (SSD 97,
    GeoInformatica 98)
  • Concerned with the exact geometry computation.
  • Enumerates all the possible situations that two
    clipped polygon segments can have inside an MBR.
  • Raster approximation (Zimbrao et al, VLDB 98).

Other Problems Related to the Spatial Join
  • Non-blocking spatial join (Luo et al, ICDE 02)
  • Multiway Spatial join
  • (Mamoulis et al, GIS 98), (Mamoulis et al, SIGMOD
    99), (Papadias et al, PODS 99), (Papadias et al,
    EDBT 02).
  • Selectivity Estimation
  • (Faloutsos et al, SIGMOD 00), (An et al, ICDE
    01), (Mamoulis et al, SSTD 01), (Sun et al, EDBT
  • Cost models
  • For R-tree based indexed relation (Gunther, ICDE
  • Parallel spatial Join
  • PMR-Quadtree (Hoel et al, VLDB 94). R-tree
    (Brinkhoff et al, ICDE 96). Non-indexed relation
    (Patel et al, GIS 00)
  • Cascaded Spatial Join (Aref et al, GIS 96)
  • Caching strategies (Abel et al, GeoInformatica
  • Duplicate Detection (Dittrich et al, ICDE 00)
  • High-dimension spatial Join (Koudas et al, ICDE

  • Spatial Join algorithms are performed in two
    steps: the Filter Step and the Refinement Step.
  • In Filter Step, five approaches are used based
  • Transformation approaches
  • No index is available
  • Only one index is available
  • Two indices are available
  • A unified Approach.
  • The Refinement Step can be further divided into
    a geometric filter step and an exact geometry
    processing step.