Spatial Join Algorithms - PowerPoint PPT Presentation

Title: Spatial Join Algorithms


1
Spatial Join Algorithms
  • Mohamed F. Mokbel

2
Why Do We Need a Spatial Join?
  • Spatial data are commonly found in applications
    like cartography, CAD, and GIS
  • In a spatial join, two spatial relations are
    combined based on a spatial predicate.
  • Examples of spatial join
  • Only spatial join is required
  • Find all forests that are in a city
  • Find all cities that are crossed by a river
  • Find all cities that are affected by the fire
    region.
  • Find all buildings that overlap with a park.
  • Spatial join with a selection criterion
  • Find all government-owned buildings that overlap
    with a park.
  • Find all forests in the USA that receive more than
    20 inches of average annual rainfall.
  • Find all day cares in Lafayette that are within 3
    miles of houses with rent less than 600.

3
Why Not Use Relational Join Algorithms?
  • Nested Loop Join
  • Every object of one relation has to be checked
    against all objects of the other relation. Since
    spatial relations are typically very large, the
    |R| × |S| comparisons of the nested loop join are
    not acceptable.
  • Hash-Based Join
  • Hash-based joins are suitable for equi-joins but
    not for spatial joins: objects that overlap do not,
    in general, hash to the same bucket.
  • Sort-Merge Join
  • There is no total ordering of spatial objects
    that preserves spatial proximity.
  • However, these algorithms can be adapted to
    handle spatial properties.
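To make the cost of the naive approach concrete, here is a minimal nested loop join over MBRs. It assumes each object has already been reduced to its minimum bounding rectangle, stored as a hypothetical `(xl, yl, xh, yh)` tuple; the forest and city rectangles are made-up sample data.

```python
from typing import List, Tuple

# An MBR as (xl, yl, xh, yh); intervals are closed, so touching rectangles intersect.
Rect = Tuple[float, float, float, float]


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles overlap in both x and y."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def nested_loop_join(r: List[Rect], s: List[Rect]) -> List[Tuple[int, int]]:
    """Check every object of r against every object of s: |r| * |s| tests."""
    result = []
    for i, a in enumerate(r):
        for j, b in enumerate(s):
            if intersects(a, b):
                result.append((i, j))
    return result


forests = [(0, 0, 2, 2), (5, 5, 6, 6)]
cities = [(1, 1, 3, 3), (10, 10, 11, 11)]
print(nested_loop_join(forests, cities))  # [(0, 0)]
```

The inner loop runs once per pair regardless of how few pairs actually overlap, which is why this approach does not scale to large spatial relations.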

4
Is the Spatial Join Problem Already Solved in
Another Domain?
  • Geometric domain
  • The abstraction of the spatial join is finding
    the intersection between two sets of geometric
    shapes.
  • Many solutions are provided in the context of
    computational geometry.
  • Geometric solutions consider only CPU cost.
  • Geometric solutions are applicable only if the
    data set fits in memory.
  • VLSI domain
  • Spatial-Access methods (e.g., R-tree) are also
    defined in the VLSI context.
  • A divide-and-conquer algorithm (Gutting et al,
    IS 93) is designed for the rectangle intersection
    problem on large data sets.
  • However, I/O cost is still not well addressed,
    and I/O is essential for spatial join algorithms
    because of the massive amount of spatial data.

5
What Is Special about Spatial?
  • There exists no total ordering among spatial
    objects that preserves spatial proximity
  • Space-filling curves can be used, but the order
    they produce only approximately preserves
    proximity.
  • Many spatial operators are not closed
  • The intersection of two polygons may return any
    number of single points, dangling edges, or
    disjoint polygons.
  • Spatial operators are more expensive than standard
    relational operators
  • Examples of spatial operators are overlap,
    containment, and inclusion
  • Spatial databases tend to be large
  • For example, the cities, rivers, restaurants, gas
    stations, forests, and highways in the USA.
  • Spatial data have a complex structure
  • Imagine representing the boundaries of Lafayette
    in a database.
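The proximity problem with space-filling curves is visible even on a small grid. The sketch below computes Z-order (Morton) codes by bit interleaving, one common space-filling curve; the 8 × 8 grid, the cell coordinates, and the function names are illustrative assumptions.

```python
from typing import List, Tuple


def morton(x: int, y: int, bits: int = 16) -> int:
    """Z-order (Morton) code: bits of x go to even positions, bits of y to odd ones."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def z_sort(cells: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Order grid cells along the Z-order curve."""
    return sorted(cells, key=lambda c: morton(c[0], c[1]))


grid = [(x, y) for x in range(8) for y in range(8)]
order = z_sort(grid)
print(order[:4])                    # [(0, 0), (1, 0), (0, 1), (1, 1)]
print(morton(3, 0), morton(4, 0))   # 5 16
```

Cells (3, 0) and (4, 0) are horizontal neighbours, yet they receive codes 5 and 16, so ten other cells fall between them in the one-dimensional order.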

6
Filter Step and Refinement Step
  • Filter Step
  • An approximation of each spatial object (e.g.,
    the minimum bounding rectangle) is used to
    eliminate tuples that cannot be part of the
    result. This step produces candidates that are a
    superset of the actual result.
  • Refinement Step
  • Each candidate is examined to check if it is part
    of the result. This incurs I/O cost to fetch the
    exact object from disk and CPU cost for a
    computational geometry algorithm.
  • An intermediate step (geometric filter) is used
    in some algorithms.
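The two steps can be illustrated with line segments standing in for spatial objects (for instance, river and road pieces). In this sketch the filter compares MBRs only, and the refinement runs an exact segment intersection test based on orientation signs; the segment representation, names, and data are assumptions made for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]
Rect = Tuple[float, float, float, float]  # (xl, yl, xh, yh)


def mbr(seg: Segment) -> Rect:
    (x1, y1), (x2, y2) = seg
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def rects_intersect(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def orient(a: Point, b: Point, c: Point) -> int:
    """Sign of the cross product (b - a) x (c - a): +1 left, -1 right, 0 collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def on_segment(a: Point, b: Point, c: Point) -> bool:
    """For a point c collinear with ab, check that c lies within ab's extent."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(s: Segment, t: Segment) -> bool:
    """Exact geometry test used in the refinement step."""
    p1, p2 = s
    p3, p4 = t
    d1, d2 = orient(p3, p4, p1), orient(p3, p4, p2)
    d3, d4 = orient(p1, p2, p3), orient(p1, p2, p4)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # proper crossing
    return ((d1 == 0 and on_segment(p3, p4, p1)) or (d2 == 0 and on_segment(p3, p4, p2))
            or (d3 == 0 and on_segment(p1, p2, p3)) or (d4 == 0 and on_segment(p1, p2, p4)))


def filter_and_refine(r: List[Segment], s: List[Segment]):
    # Filter step: MBRs only; the candidates are a superset of the answer.
    candidates = [(i, j) for i, a in enumerate(r) for j, b in enumerate(s)
                  if rects_intersect(mbr(a), mbr(b))]
    # Refinement step: the exact (expensive) test runs on candidates only.
    result = [(i, j) for i, j in candidates if segments_intersect(r[i], s[j])]
    return candidates, result


rivers = [((0, 0), (4, 4))]
roads = [((3, 0), (4, 1)), ((0, 4), (4, 0)), ((10, 10), (11, 11))]
print(filter_and_refine(rivers, roads))  # ([(0, 0), (0, 1)], [(0, 1)])
```

Pair (0, 0) is a false hit: the MBRs overlap, but the two segments are parallel and never meet, so only the refinement step can discard it.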

7
The Filter Step
  • Transformation approaches
  • Neither relation is indexed
  • Both relations are indexed
  • Only one relation is indexed
  • Unified approaches that work regardless of
    whether indexes exist

8
Transformation Approaches for Spatial Join
  • Map to the one-dimensional space (Orenstein et
    al, TSE 88)
  • Rectangles are sorted according to the Z-order.
  • Two one-dimensional spatial join algorithms are
    proposed (spatial-merge, and spatial-filter).
  • Both algorithms are later enhanced in (Aref et
    al, SDH 94) with the linear-scan and
    estimate-based spatial join algorithms.
  • Map to higher dimensions (Becker et al, ICDE 93)
  • D-dimensional rectangles are transformed into
    points in (2·D)-dimensional space (corner
    transformation).
  • A multi-dimensional join algorithm is used with
    the support of grid files.
  • Transformation-Based Spatial Join (Song et al
    CIKM99, TKDE 99)
  • The corner transformation is used
  • A special algorithm for spatial join that does
    not rely on indexing is proposed.
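The corner transformation can be sketched for D = 2: a rectangle becomes the 4-dimensional point `(xl, xh, yl, yh)`, and "intersects the query rectangle q" becomes "lies inside a 4-dimensional box". This only shows the mapping and the resulting query region; the grid-file-based join of Becker et al is not reproduced, and the names and sample data are hypothetical.

```python
import math
from typing import Tuple

Rect = Tuple[float, float, float, float]    # (xl, yl, xh, yh)
Point4 = Tuple[float, float, float, float]  # (xl, xh, yl, yh)
Box4 = Tuple[Point4, Point4]                # (low corner, high corner)


def corner_transform(r: Rect) -> Point4:
    """Map a 2-D rectangle to a point in 4-D space."""
    xl, yl, xh, yh = r
    return (xl, xh, yl, yh)


def intersection_query_box(q: Rect) -> Box4:
    """4-D region holding exactly the points of rectangles that intersect q:
    xl <= q.xh, xh >= q.xl, yl <= q.yh, yh >= q.yl."""
    qxl, qyl, qxh, qyh = q
    low = (-math.inf, qxl, -math.inf, qyl)
    high = (qxh, math.inf, qyh, math.inf)
    return low, high


def point_in_box(p: Point4, box: Box4) -> bool:
    low, high = box
    return all(lo <= v <= hi for v, lo, hi in zip(p, low, high))


def intersects(a: Rect, b: Rect) -> bool:
    """Direct 2-D test, for comparison with the transformed test."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


rects = [(0, 0, 1, 1), (2, 2, 3, 3), (0.5, 4, 1, 5)]
q = (0.5, 0.5, 2.5, 2.5)
box = intersection_query_box(q)
print([r for r in rects if point_in_box(corner_transform(r), box)])
# [(0, 0, 1, 1), (2, 2, 3, 3)]
```

The benefit is that a rectangle-intersection problem turns into a point-in-region problem, which point access methods such as grid files handle directly.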

9
Spatial Join algorithms without Indexing Support
10
PBSM (Partition-Based Spatial-Merge Join) (Patel
et al, SIGMOD 96)
  • The spatial universe is divided into P disjoint
    partitions.
  • The MBRs from the two relations are mapped to
    their partitions. An MBR that overlaps several
    partitions is replicated into each of them.
  • Each partition is joined in memory using the
    plane-sweep algorithm.
  • The number of partitions is chosen so that each
    partition fits in memory.
  • Additional techniques are provided to handle data
    skew.
  • If the data inside one partition still cannot fit
    in memory, recursive partitioning may be used.
  • The output may contain duplicates.
  • A sorting step is needed in the refinement step
    to remove the duplicates.
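A simplified PBSM-style sketch follows. It assumes a uniform p × p grid in which every cell is its own partition (PBSM itself maps tiles to partitions and adds skew handling), and it joins each partition with nested loops where PBSM uses plane sweep. Duplicates caused by replication are removed by sorting, as described above; all names and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Rect = Tuple[float, float, float, float]  # (xl, yl, xh, yh)
Cell = Tuple[int, int]


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def cells_of(r: Rect, universe: Rect, p: int) -> Iterator[Cell]:
    """Yield every cell of a p x p grid over `universe` that r overlaps."""
    ux, uy, uxh, uyh = universe
    w, h = (uxh - ux) / p, (uyh - uy) / p

    def idx(v: float, lo: float, size: float) -> int:
        return min(p - 1, max(0, int((v - lo) // size)))

    for cx in range(idx(r[0], ux, w), idx(r[2], ux, w) + 1):
        for cy in range(idx(r[1], uy, h), idx(r[3], uy, h) + 1):
            yield (cx, cy)


def pbsm_join(r: List[Rect], s: List[Rect], universe: Rect, p: int) -> List[Tuple[int, int]]:
    parts_r: Dict[Cell, List[int]] = defaultdict(list)
    parts_s: Dict[Cell, List[int]] = defaultdict(list)
    # Partitioning: an MBR is replicated into every cell it overlaps.
    for i, a in enumerate(r):
        for c in cells_of(a, universe, p):
            parts_r[c].append(i)
    for j, b in enumerate(s):
        for c in cells_of(b, universe, p):
            parts_s[c].append(j)
    # Join each partition independently (nested loops stand in for plane sweep).
    candidates = [(i, j) for c, ids in parts_r.items() for i in ids
                  for j in parts_s.get(c, []) if intersects(r[i], s[j])]
    # Replication can report one pair in several cells: sort, then drop adjacent repeats.
    candidates.sort()
    return [pair for k, pair in enumerate(candidates) if k == 0 or pair != candidates[k - 1]]


print(pbsm_join([(0, 0, 6, 6)], [(4, 4, 8, 8), (9, 9, 10, 10)], (0, 0, 10, 10), 2))  # [(0, 0)]
```

In the example the overlapping pair lies in all four cells, so it is found four times before the sort-based duplicate removal.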

11
Spatial Hash-Join (Lo et al, SIGMOD 96)
  • A general framework for extending relational
    hash join algorithms to spatial joins.
  • Can produce duplicate results
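One configuration such a framework permits can be sketched under stated assumptions: the first relation uses single assignment (each object goes to exactly one bucket, whose extent grows to cover it) and the second uses multiple assignment (each object goes to every bucket extent it overlaps). The seed rectangles, which would normally come from sampling, are simply passed in, and the least-enlargement rule for choosing a bucket is an assumption of this sketch, not a detail taken from the slide.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xl, yl, xh, yh)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def area(a: Rect) -> float:
    return (a[2] - a[0]) * (a[3] - a[1])


def spatial_hash_join(r: List[Rect], s: List[Rect], seeds: List[Rect]) -> List[Tuple[int, int]]:
    # Single assignment for r: pick the bucket whose extent grows least,
    # then enlarge that extent to cover the object.
    extents = list(seeds)
    buckets_r: List[List[int]] = [[] for _ in seeds]
    for i, a in enumerate(r):
        k = min(range(len(extents)), key=lambda b: area(union(extents[b], a)) - area(extents[b]))
        buckets_r[k].append(i)
        extents[k] = union(extents[k], a)
    # Multiple assignment for s: every bucket whose final extent it overlaps.
    buckets_s = [[j for j, b in enumerate(s) if intersects(b, e)] for e in extents]
    # Join bucket pairs; each r object lives in one bucket, so no pair repeats.
    result = [(i, j) for k in range(len(extents)) for i in buckets_r[k]
              for j in buckets_s[k] if intersects(r[i], s[j])]
    return sorted(result)


r = [(0, 0, 1, 1), (9, 9, 10, 10)]
s = [(0.5, 0.5, 9.5, 9.5)]
print(spatial_hash_join(r, s, seeds=[(0, 0, 0, 0), (10, 10, 10, 10)]))  # [(0, 0), (1, 0)]
```

Every matching pair is still found, because an object of s that intersects an object of r also intersects the extent of that object's bucket. Assigning both relations to multiple buckets would instead allow duplicate results.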

12
PBSM as an instance of the Spatial Hash-Join
framework
13
Spatial Hash Join Designer's Choice
14
Other Spatial join algorithms for non-indexed
relations
  • Seeded tree approach (Lo et al, SSD 95)
  • A seeded tree is built for each relation using
    spatial sampling techniques.
  • A depth-first traversal algorithm is used for
    joining the two seeded trees.
  • Size Separation Spatial Join, S3J (Koudas et al,
    SIGMOD 97)
  • For each entry in both data sets A and B:
  • Compute the Hilbert value H of the centroid.
  • Determine the level to which the entry belongs
    (similar to the Filter Tree) and place the entry
    in that level's file.
  • For each level file, sort entries by the Hilbert
    value
  • Perform a synchronized scan over the pages of
    each level.
  • Scalable Sweeping-Based Spatial Join, SSSJ (Arge
    et al, VLDB 98)
  • Similar to PBSM, it is a partition-based and
    plane-sweep based approach.
  • The main contribution is that it uses results
    from computational geometry to improve the
    in-memory plane-sweep algorithm.
  • This requires changing the partitioning function
    to partition along only one dimension.
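The first two S3J steps (assigning an entry to a level and keying it by the Hilbert value of its centroid) can be sketched as follows. Assumptions: coordinates are normalized to the unit square, level l is a regular grid with 2^l cells per side, an entry belongs to the deepest level at which a single cell contains it, and the Hilbert value is computed on the finest grid. `MAX_LEVEL` and all function names are hypothetical; the synchronized scan is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (xl, yl, xh, yh), inside the unit square
MAX_LEVEL = 10  # hypothetical: the finest grid has 2**10 cells per side


def level_of(r: Rect) -> int:
    """Deepest grid level whose single cell contains r (level 0 = whole square)."""
    level = 0
    for l in range(1, MAX_LEVEL + 1):
        n = 1 << l
        if int(r[0] * n) == int(r[2] * n) and int(r[1] * n) == int(r[3] * n):
            level = l
        else:
            break  # grid lines of level l also exist at every deeper level
    return level


def hilbert_index(order: int, x: int, y: int) -> int:
    """Position of cell (x, y) on a Hilbert curve over a 2**order x 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate the quadrant so the sub-curve has the right orientation
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def build_level_files(rects: List[Rect]) -> Dict[int, List[Tuple[int, int]]]:
    """Map level -> [(Hilbert value of centroid, object id)], sorted by Hilbert value."""
    n = 1 << MAX_LEVEL
    files: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for i, r in enumerate(rects):
        cx = min(n - 1, int((r[0] + r[2]) / 2 * n))
        cy = min(n - 1, int((r[1] + r[3]) / 2 * n))
        files[level_of(r)].append((hilbert_index(MAX_LEVEL, cx, cy), i))
    for entries in files.values():
        entries.sort()
    return dict(files)


files = build_level_files([(0.1, 0.1, 0.2, 0.2), (0.4, 0.4, 0.6, 0.6)])
print(sorted(files))  # [0, 2]
```

The second rectangle straddles the centre lines of the square, so it cannot fit in any cell below level 0, while the small first rectangle settles at level 2.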

15
Spatial Join algorithms For Both Indexed Relations
16
Spatial Join using Depth-Traversal R-Tree
(Brinkhoff et al, SIGMOD 93)
  • A sketch of the proposed algorithm for two trees
    of equal height and the intersection operator:
  • Procedure SpatialJoin(R, S: R-tree nodes)
      For all entries ER in R and all entries ES in S
          such that ER.rect intersects ES.rect:
        If R and S are leaf pages:
          Output (ER, ES)
        Else:
          SpatialJoin(ER.ref, ES.ref)  // ER.ref, ES.ref are the nodes referenced by ER, ES
      End
  • The main idea is to synchronously traverse the
    two R-trees in a depth-first traversal.
  • Enhancements are proposed to reduce
  • CPU time
  • I/O time
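A runnable version of the depth-first procedure is sketched below. It assumes a hypothetical in-memory node layout (`Node` with `Entry` objects carrying either a child reference or an object id) instead of disk pages, trees of equal height, and the intersection predicate; the tiny two-level trees are hand-built sample data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xl, yl, xh, yh)


@dataclass
class Entry:
    rect: Rect
    child: Optional["Node"] = None  # directory entry: the node it references (ER.ref)
    oid: Optional[int] = None       # leaf entry: id of the spatial object


@dataclass
class Node:
    entries: List[Entry]
    is_leaf: bool


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def spatial_join(r: Node, s: Node, out: List[Tuple[int, int]]) -> None:
    """Synchronized depth-first traversal; assumes both trees have the same height."""
    for er in r.entries:
        for es in s.entries:
            if intersects(er.rect, es.rect):
                if r.is_leaf and s.is_leaf:
                    out.append((er.oid, es.oid))
                else:
                    spatial_join(er.child, es.child, out)


# Hypothetical helpers for building a small two-level tree by hand.
def leaf(objs: List[Tuple[int, Rect]]) -> Node:
    return Node([Entry(rect, oid=oid) for oid, rect in objs], is_leaf=True)


def directory(children: List[Node]) -> Node:
    def mbr(n: Node) -> Rect:
        rs = [e.rect for e in n.entries]
        return (min(x[0] for x in rs), min(x[1] for x in rs),
                max(x[2] for x in rs), max(x[3] for x in rs))
    return Node([Entry(mbr(c), child=c) for c in children], is_leaf=False)


tree_r = directory([leaf([(0, (0, 0, 1, 1)), (1, (2, 2, 3, 3))]), leaf([(2, (8, 8, 9, 9))])])
tree_s = directory([leaf([(10, (0.5, 0.5, 2.5, 2.5))]), leaf([(11, (20, 20, 21, 21))])])
pairs: List[Tuple[int, int]] = []
spatial_join(tree_r, tree_s, pairs)
print(pairs)  # [(0, 10), (1, 10)]
```

Only one of the four root-level entry pairs overlaps, so only one pair of leaves is ever visited; this pruning is what the synchronized traversal gains over joining the leaves directly.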

17
Spatial Join using Depth-Traversal R-Tree (Cont.)
  • Tuning CPU-Time
  • Restricting the search space. Among all the entries
    in nodes R and S, only those that intersect R ∩ S
    (the intersection of the two node rectangles) need
    to be checked.
  • Spatial sorting and plane-sweep. Use the
    plane-sweep algorithm on the candidate entries
    from nodes R and S.
  • I/O time tuning
  • Local plane-sweep order. Use the plane-sweep
    order for fetching pages from disk.
  • Local plane-sweep order with pinning. In addition
    to the previous approach, we pin the page of the
    rectangle with maximal degree. The degree of a
    rectangle is the number of rectangles it
    intersects.
  • Local Z-order with pinning. Instead of doing
    plane-sweep order for reading the disk pages, we
    use the Z-order of the centroids of the
    rectangles.
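The two CPU-time techniques can be combined in a short sketch: restrict both entry lists to the intersection of the node MBRs, then report intersecting pairs with a plane sweep over x. The flat lists of `(xl, yl, xh, yh)` tuples standing in for node entries are an assumption, as is the sample data; nodes are assumed non-empty.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xl, yl, xh, yh)


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def mbr(rects: List[Rect]) -> Rect:
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def plane_sweep(r: List[Rect], s: List[Rect]) -> List[Tuple[int, int]]:
    """Report all intersecting (i, j) pairs by sweeping a vertical line along x."""
    ri = sorted(range(len(r)), key=lambda k: r[k][0])  # r sorted by lower x
    si = sorted(range(len(s)), key=lambda k: s[k][0])  # s sorted by lower x
    out = []
    i = j = 0
    while i < len(ri) and j < len(si):
        if r[ri[i]][0] <= s[si[j]][0]:
            # Next sweep event comes from r: scan s while x-ranges can still overlap.
            t, k = r[ri[i]], j
            while k < len(si) and s[si[k]][0] <= t[2]:
                if t[1] <= s[si[k]][3] and s[si[k]][1] <= t[3]:  # y-ranges overlap
                    out.append((ri[i], si[k]))
                k += 1
            i += 1
        else:
            # Symmetric case: the next sweep event comes from s.
            t, k = s[si[j]], i
            while k < len(ri) and r[ri[k]][0] <= t[2]:
                if t[1] <= r[ri[k]][3] and r[ri[k]][1] <= t[3]:
                    out.append((ri[k], si[j]))
                k += 1
            j += 1
    return out


def join_nodes(r: List[Rect], s: List[Rect]) -> List[Tuple[int, int]]:
    """Restrict both entry sets to the overlap of the node MBRs, then plane-sweep."""
    a, b = mbr(r), mbr(s)
    if not intersects(a, b):
        return []
    window = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    r_ids = [k for k, x in enumerate(r) if intersects(x, window)]
    s_ids = [k for k, x in enumerate(s) if intersects(x, window)]
    pairs = plane_sweep([r[k] for k in r_ids], [s[k] for k in s_ids])
    return sorted((r_ids[p], s_ids[q]) for p, q in pairs)


r_entries = [(0, 0, 2, 2), (3, 0, 5, 1), (1, 3, 4, 5)]
s_entries = [(1, 1, 3, 3), (4, 0.5, 6, 2), (10, 10, 11, 11)]
print(join_nodes(r_entries, s_entries))  # [(0, 0), (1, 0), (1, 1), (2, 0)]
```

Entry (10, 10, 11, 11) lies outside the overlap of the two node rectangles, so it is dropped before the sweep and never compared.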

18
Other Spatial Join algorithms for both indexed
relations
  • BFRJ Breadth-First R-tree Join (Huang et al,
    VLDB 97)
  • Synchronous traversal of two R-trees in a
    breadth-first traversal.
  • Unlike the depth-first traversal, where a local
    optimization is achieved for a node by node join,
    in BFRJ, a global optimization is achieved for
    each level.
  • The main idea is that, based on this global
    optimization, the algorithm can decide which
    node pairs need to be joined.
  • This cannot be done with depth-first traversal,
    whose decisions are limited to the current pair
    of nodes.
  • Both relations are indexed using PMR quadtrees
    (Hoel et al, VLDB 95)
  • Performs a synchronized tree traversal at the
    leaf level.
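A breadth-first counterpart can be sketched by keeping, for each level, the full list of node pairs that still need to be joined. The only whole-level decision shown is reordering those pairs by hypothetical page ids so that pairs sharing a page are processed together; BFRJ's own global optimizations go further than this. The node layout, page ids, and data are assumptions, and both trees are assumed to have equal height.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xl, yl, xh, yh)


@dataclass
class Entry:
    rect: Rect
    child: Optional["Node"] = None  # directory entry: referenced node
    oid: Optional[int] = None       # leaf entry: object id


@dataclass
class Node:
    page: int  # hypothetical disk page id
    entries: List[Entry]
    is_leaf: bool


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def bfrj(r_root: Node, s_root: Node) -> List[Tuple[int, int]]:
    """Level-by-level synchronized traversal of two R-trees of equal height."""
    level = [(r_root, s_root)]  # node pairs to be joined at the current level
    while level and not level[0][0].is_leaf:
        nxt = []
        for rn, sn in level:
            for er in rn.entries:
                for es in sn.entries:
                    if intersects(er.rect, es.rect):
                        nxt.append((er.child, es.child))
        # All pairs of the next level are known before any of them is read,
        # so they can be reordered globally; here simply by page ids.
        nxt.sort(key=lambda p: (p[0].page, p[1].page))
        level = nxt
    return sorted((er.oid, es.oid) for rn, sn in level
                  for er in rn.entries for es in sn.entries if intersects(er.rect, es.rect))


r_leaves = [Node(1, [Entry((0, 0, 1, 1), oid=0), Entry((2, 2, 3, 3), oid=1)], True),
            Node(2, [Entry((8, 8, 9, 9), oid=2)], True)]
r_root = Node(0, [Entry((0, 0, 3, 3), child=r_leaves[0]), Entry((8, 8, 9, 9), child=r_leaves[1])], False)
s_leaves = [Node(4, [Entry((0.5, 0.5, 2.5, 2.5), oid=10)], True),
            Node(5, [Entry((20, 20, 21, 21), oid=11)], True)]
s_root = Node(3, [Entry((0.5, 0.5, 2.5, 2.5), child=s_leaves[0]),
                  Entry((20, 20, 21, 21), child=s_leaves[1])], False)
print(bfrj(r_root, s_root))  # [(0, 10), (1, 10)]
```

The list `level` plays the role of an intermediate result per level; a depth-first traversal never has such a list in hand, which is why it can only optimize locally.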

19
Spatial Join algorithms for one indexed relation
  • Spatial join using seeded trees (Lo et al, SIGMOD
    94, TKDE 98)
  • A seeded tree is built for the non-indexed
    relation.
  • The construction of the seeded tree is guided by
    the existing R-tree.
  • The R-tree and the seeded tree are joined using
    the depth-first traversal approach.
  • Sort and Match (Papadopoulos et al, SSD 99)
  • The STR bulk loading algorithm is applied for the
    non-indexed relation.
  • Instead of building the packed tree, it directly
    matches the leaf nodes created in memory with the
    existing R-tree index.
  • Slot Index Spatial join, SISJ (Mamoulis et al,
    TKDE 03)
  • SISJ combines the ideas of the seeded tree join
    with the spatial hash join.
  • The key idea is to define the spatial partitions
    of the spatial hash join using the existing
    R-tree.
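The leaf-creation phase of STR bulk loading, which Sort and Match applies to the non-indexed relation, can be sketched as follows: with n rectangles and leaf capacity b, there are P = ⌈n/b⌉ leaves, the data are sorted by centroid x and cut into ⌈√P⌉ vertical slices, and each slice is sorted by centroid y and packed into leaves. Matching these leaves against the existing R-tree is not shown; the function names and the 4 × 4 grid of unit squares are hypothetical.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xl, yl, xh, yh)


def str_leaves(rects: List[Rect], capacity: int) -> List[List[int]]:
    """Group rectangle ids into leaves using Sort-Tile-Recursive packing (2-D)."""
    n = len(rects)
    num_leaves = math.ceil(n / capacity)
    num_slices = math.ceil(math.sqrt(num_leaves))
    per_slice = num_slices * capacity  # rectangles per vertical slice
    by_x = sorted(range(n), key=lambda k: (rects[k][0] + rects[k][2]) / 2)
    leaves = []
    for start in range(0, n, per_slice):
        slice_ids = sorted(by_x[start:start + per_slice],
                           key=lambda k: (rects[k][1] + rects[k][3]) / 2)
        for b in range(0, len(slice_ids), capacity):
            leaves.append(slice_ids[b:b + capacity])
    return leaves


def leaf_mbr(rects: List[Rect], ids: List[int]) -> Rect:
    return (min(rects[k][0] for k in ids), min(rects[k][1] for k in ids),
            max(rects[k][2] for k in ids), max(rects[k][3] for k in ids))


grid = [(x, y, x + 1, y + 1) for x in range(4) for y in range(4)]
leaves = str_leaves(grid, capacity=4)
print([leaf_mbr(grid, ids) for ids in leaves])
# [(0, 0, 2, 2), (0, 2, 2, 4), (2, 0, 4, 2), (2, 2, 4, 4)]
```

The resulting leaves are full and their MBRs do not overlap, which is what makes matching them directly against an existing index attractive.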

20
A Unified Approach for Indexed and Non-Indexed
Spatial Joins (Arge et al, EDBT 00)
  • An extension of SSSJ to deal with indexed
    relations.
  • Similar to SSSJ, non-indexed data are sorted
    according to their MBRs, and fed into the
    plane-sweep algorithm.
  • For indexed data, an additional pre-processing
    step is required to exploit the index structure
    and extract the data directly in the sorted order
    required by the plane-sweep algorithm.
  • A main conclusion of this paper is that using an
    index-based approach for spatial join whenever
    indexes are available does not always lead to the
    best execution time.
  • A cost model is proposed to decide whether to
    follow an index-based approach or the unified
    approach.

21
The Refinement Step
22
Multi-Step Processing (Brinkhoff et al, SIGMOD 94)
  • The refinement step is divided into two steps.
  • Identifying more false and true hits.
  • In this step, approximations more accurate than
    the MBR are used to identify false and true hits.
  • Exact geometry intersection.
  • All remaining candidate pairs are eventually
    examined at this stage. This is the most
    time-consuming step: it requires CPU time for the
    exact intersection test and I/O time to read the
    spatial objects from disk.
  • Notice that improvements in the exact geometry
    intersection step have the lowest impact, since
    improvements in the two previous steps reduce the
    number of candidates that ever reach it.

23
Multi-Step Processing (Cont.)
  • Removing more false hits
  • Identifying true hits
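The idea of settling candidates before the exact test can be sketched with circles, whose exact test is cheap enough to check the logic. The MBR serves as the conservative approximation (disjoint MBRs mean a false hit) and the inscribed square as a progressive approximation (overlapping inner squares mean a true hit); only pairs that neither approximation settles reach the exact test. Using circles and an inscribed square is an assumption made for illustration, not the approximations proposed in the paper.

```python
import math
from typing import Tuple

Circle = Tuple[float, float, float]       # (cx, cy, radius)
Rect = Tuple[float, float, float, float]  # (xl, yl, xh, yh)


def rects_intersect(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def mbr(c: Circle) -> Rect:
    """Conservative approximation: contains the whole circle."""
    return (c[0] - c[2], c[1] - c[2], c[0] + c[2], c[1] + c[2])


def inscribed_square(c: Circle) -> Rect:
    """Progressive approximation: lies entirely inside the circle."""
    h = c[2] / math.sqrt(2)
    return (c[0] - h, c[1] - h, c[0] + h, c[1] + h)


def exact_intersect(a: Circle, b: Circle) -> bool:
    return math.hypot(a[0] - b[0], a[1] - b[1]) <= a[2] + b[2]


def multi_step(a: Circle, b: Circle) -> Tuple[bool, str]:
    """Return (answer, step that decided it)."""
    if not rects_intersect(mbr(a), mbr(b)):
        return False, "filter"            # false hit removed using MBRs
    if rects_intersect(inscribed_square(a), inscribed_square(b)):
        return True, "geometric filter"   # true hit found without the exact test
    return exact_intersect(a, b), "exact"


a = (0.0, 0.0, 1.0)
print(multi_step(a, (1.0, 0.0, 1.0)))   # (True, 'geometric filter')
print(multi_step(a, (1.9, 1.9, 1.0)))   # (False, 'exact')
print(multi_step(a, (5.0, 5.0, 1.0)))   # (False, 'filter')
```

The second pair shows why an exact step is still needed: the MBRs overlap and the inner squares do not, so neither approximation can decide it.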

24
Exact Geometry Processing in Multi-Step Processing
25
Other Work for the Refinement Step
  • Approximations other than MBR (Veenhof et al,
    BNCOD 95)
  • Approximations of spatial objects are constructed
    by rotating two parallel lines around the object.
  • Symbolic Intersection Detection (SSD 97,
    GeoInformatica 98)
  • Concerned with the exact geometry computation.
  • Enumerates all the possible situations that two
    clipped polygon segments can have inside an MBR.
  • Raster approximation (Zimbrao et al, VLDB 98).

26
Other Problems Related to the Spatial Join
  • Non-blocking spatial join (Luo et al, ICDE 02)
  • Multiway Spatial join
  • (Mamoulis et al, GIS 98), (Mamoulis et al, SIGMOD
    99), (Papadias et al, PODS 99), (Papadias et al,
    EDBT 02).
  • Selectivity Estimation
  • (Faloutsos et al, SIGMOD 00), (An et al, ICDE
    01), (Mamoulis et al, SSTD 01), (Sun et al, EDBT
    02)
  • Cost models
  • For R-tree-indexed relations (Gunther, ICDE
    93)
  • Parallel spatial join
  • PMR-Quadtree (Hoel et al, VLDB 94). R-tree
    (Brinkhoff et al, ICDE 96). Non-indexed relation
    (Patel et al, GIS 00)
  • Cascaded Spatial Join (Aref et al, GIS 96)
  • Caching strategies (Abel et al, GeoInformatica
    99)
  • Duplicate Detection (Dittrich et al, ICDE 00)
  • High-dimensional spatial join (Koudas et al, ICDE
    98)

27
Summary
  • Spatial join algorithms are performed in two
    steps: the filter step and the refinement step.
  • In the filter step, five classes of approaches
    are used, based on
  • Transformation approaches
  • No index is available
  • Only one index is available
  • Two indices are available
  • A unified Approach.
  • The refinement step can be further divided into
    a geometric filter step and an exact geometry
    processing step.