Title: Spatial Indexing
1Spatial Indexing
2Spatial Access Methods
- PAMs
- Grid File
- kd-tree based (LSD-, hB- trees)
- Z-ordering B-tree
- R-tree
- Variations: R*-tree, Hilbert R-tree
3R-tree
- Multi-way external-memory structure; indexes MBRs (minimum bounding rectangles)
- Dynamic structure
[Figure: example R-tree with rectangles A-J and parent MBRs P1-P4]
4R-tree properties
- Main points
- every parent node completely covers its children
- a child MBR may be covered by more than one parent; it is stored under ONLY ONE of them (i.e., no need for duplicate elimination)
- a point query may follow multiple branches
5R*-tree
- The original R-tree tries to minimize the area of each enclosing rectangle in the index nodes.
- Is there any other property that can be optimized?
- R*-tree: Yes!
6R*-tree
- Optimization Criteria
- (O1) Area covered by an index MBR
- (O2) Overlap between directory MBRs
- (O3) Margin of a directory rectangle
- (O4)Storage utilization
- Sometimes it is impossible to optimize all the
above criteria at the same time!
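The first three criteria can be written down as small functions. This is a minimal sketch, assuming 2-d rectangles stored as `(x_low, y_low, x_high, y_high)` tuples and taking the margin to be the perimeter; the function names are illustrative, not from the slides.

```python
from itertools import combinations

def area(r):
    # (O1) area covered by an MBR
    return (r[2] - r[0]) * (r[3] - r[1])

def overlap(a, b):
    # area of the intersection of two MBRs (0 if they are disjoint)
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def total_overlap(rects):
    # (O2) overlap summed over all pairs of directory MBRs in one node
    return sum(overlap(a, b) for a, b in combinations(rects, 2))

def margin(r):
    # (O3) margin, taken here as the perimeter of the rectangle
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))
```

A long thin rectangle and a square can have the same area but very different margins, which is why (O1) and (O3) may pull in different directions.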
7R*-tree
- ChooseSubtree
- If the next node is a leaf node, choose the node using the following criteria, in order:
- Least overlap enlargement
- Least area enlargement
- Smaller area
- Else
- Least area enlargement
- Smaller area
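The two rules above can be sketched as one function that ranks the candidate children by a tuple of criteria. This is an illustrative sketch, assuming rectangles are `(x_low, y_low, x_high, y_high)` tuples; `choose_subtree` and its parameters are hypothetical names.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def overlap(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def choose_subtree(children, new, children_are_leaves):
    """Return the index of the child MBR that should receive rectangle `new`."""
    def area_enlargement(i):
        return area(union(children[i], new)) - area(children[i])

    def overlap_enlargement(i):
        grown = union(children[i], new)
        others = [c for j, c in enumerate(children) if j != i]
        return sum(overlap(grown, o) - overlap(children[i], o) for o in others)

    if children_are_leaves:
        # least overlap enlargement, then least area enlargement, then smaller area
        key = lambda i: (overlap_enlargement(i), area_enlargement(i), area(children[i]))
    else:
        # least area enlargement, then smaller area
        key = lambda i: (area_enlargement(i), area(children[i]))
    return min(range(len(children)), key=key)
```

With children `(0, 0, 10, 10)` and `(11, 4, 12, 5)` and a new rectangle `(9.5, 4, 10.5, 5)`, the leaf rule picks the large child (no new overlap), while the area-only rule picks the small one.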
8R*-tree
- SplitNode
- Choose the axis to split
- Choose the two groups along the chosen axis
- ChooseSplitAxis
- Along each axis, sort rectangles and break them into two groups (M-2m+2 possible ways, where each group contains at least m rectangles). Compute the sum S of all margin-values of each pair of groups. Choose the axis that minimizes S
- ChooseSplitIndex
- Along the chosen axis, choose the grouping that gives the minimum overlap-value
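A compact sketch of the two steps, assuming 2-d rectangles as `(x_low, y_low, x_high, y_high)` tuples and an overflowing node with M+1 entries. Sorting once by lower and once by upper coordinate, and breaking overlap ties by total area, are assumptions of this sketch rather than something the slide states.

```python
def bbox(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def margin(r):
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))

def overlap(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def distributions(rects, axis, m):
    # For each sort order, yield the M-2m+2 ways to cut the sorted list
    # so that both groups keep at least m rectangles.
    for key in (lambda r: r[axis], lambda r: r[axis + 2]):
        ordered = sorted(rects, key=key)
        for k in range(m, len(ordered) - m + 1):
            yield ordered[:k], ordered[k:]

def rstar_split(rects, m):
    # ChooseSplitAxis: the axis with the smallest sum S of margin-values
    def margin_sum(axis):
        return sum(margin(bbox(g1)) + margin(bbox(g2))
                   for g1, g2 in distributions(rects, axis, m))
    axis = min((0, 1), key=margin_sum)
    # ChooseSplitIndex: minimum overlap-value, ties broken by total area
    return min(distributions(rects, axis, m),
               key=lambda g: (overlap(bbox(g[0]), bbox(g[1])),
                              area(bbox(g[0])) + area(bbox(g[1]))))
```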
9R*-tree
- Forced Reinsert
- defer splits by forced reinsert, i.e., instead of splitting, temporarily delete some entries, shrink the overflowing MBR, and re-insert those entries
- Which ones to re-insert?
- How many? A: 30%
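A sketch of the selection step. The slide leaves "which ones" open; this sketch assumes the rule from the R*-tree paper, namely the entries whose centers lie farthest from the center of the node's MBR. The function name and the rounding of 30% of M are illustrative choices.

```python
def bbox(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def center(r):
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)

def pick_for_reinsert(rects, max_entries, fraction=0.30):
    """Split an overflowing node's entries into (kept, to_reinsert)."""
    cx, cy = center(bbox(rects))

    def dist2(r):
        x, y = center(r)
        return (x - cx) ** 2 + (y - cy) ** 2

    p = max(1, round(fraction * max_entries))        # about 30% of M
    by_distance = sorted(rects, key=dist2, reverse=True)
    return by_distance[p:], by_distance[:p]
```

After the selected entries are removed, the node's MBR is recomputed from the kept entries, and the removed ones go through the normal insertion path again.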
10R-tree variations
- What about static datasets?
- (no ins/del): Hilbert
- What about other bounding shapes?
14R-trees - variations
- what about static datasets (no ins/del/upd)?
- Q: Best way to pack points?
- A1: plane-sweep
- great for queries on x
- terrible for y
- Q: how to improve?
16R-trees - variations
- A: plane-sweep on the HILBERT curve!
- In fact, it can be made dynamic (how?), as well as handle regions (how?)
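A minimal sketch of packing along the Hilbert curve, assuming points with integer coordinates on an n x n grid (n a power of 2). `hilbert_value` is the usual iterative grid-to-curve conversion; `hilbert_pack` and its `capacity` parameter are illustrative names, and only the leaf level is built.

```python
def hilbert_value(n, x, y):
    """Position of grid cell (x, y) along the Hilbert curve of an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                       # rotate/flip the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_pack(points, capacity, n=1024):
    """Sort points by h-value and cut the sorted list into full leaves."""
    ordered = sorted(points, key=lambda p: hilbert_value(n, p[0], p[1]))
    leaves = []
    for i in range(0, len(ordered), capacity):
        group = ordered[i:i + capacity]
        xs = [p[0] for p in group]
        ys = [p[1] for p in group]
        leaves.append(((min(xs), min(ys), max(xs), max(ys)), group))
    return leaves
```

Because consecutive h-values are neighboring cells, each leaf tends to be compact in both x and y, unlike the plane-sweep on x alone.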
17R-trees - variations
- Dynamic (Hilbert R-tree)
- each point has an h-value (Hilbert value)
- insertions like a B-tree on the h-value
- but also store MBR, for searches
18Hilbert R-tree
- Data structure of a node?
- each entry: MBR (x-low, y-low, x-high, y-high), LHV, ptr
- B-tree part: LHV bounds the h-values stored under ptr
- R-tree part: MBRs inside parent MBR
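The entry layout can be sketched as a small record. This assumes LHV means the largest Hilbert value in the subtree, as in the Hilbert R-tree paper; the class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    mbr: tuple                    # (x_low, y_low, x_high, y_high)
    lhv: int                      # largest Hilbert value in the subtree
    ptr: Optional[list] = None    # child node (list of Entry), None for data

def parent_entry(children):
    """Build the directory entry that points to a node holding `children`."""
    mbr = (min(c.mbr[0] for c in children), min(c.mbr[1] for c in children),
           max(c.mbr[2] for c in children), max(c.mbr[3] for c in children))
    return Entry(mbr, max(c.lhv for c in children), children)

def is_consistent(entry):
    """Check both invariants: h-values bounded by LHV, MBRs inside parent MBR."""
    return all(c.lhv <= entry.lhv
               and entry.mbr[0] <= c.mbr[0] and entry.mbr[1] <= c.mbr[1]
               and c.mbr[2] <= entry.mbr[2] and c.mbr[3] <= entry.mbr[3]
               for c in (entry.ptr or []))
```

Insertion uses only `lhv` to pick a path, as a B-tree would; searches use only `mbr`, as an R-tree would.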
22R-trees - variations
- What if we have regions, instead of points?
- I.e., how to impose a linear ordering (h-value) on rectangles?
- A1: h-value of center
- A2: h-value of 4-d point (center, x-radius, y-radius)
- A3: ...
24R-trees - variations
- with h-values, we can have deferred splits: 2-to-3 splits (3-to-4, etc.)
- Instead of splitting a full node, find the siblings (using the h-values) and redistribute the rectangles among the nodes. Split only when all siblings are full.
- experimentally faster than R-trees
- (reference: Kamel and Faloutsos, VLDB 94)
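The redistribution can be sketched as follows. This is a simplified illustration: each node is a list of `(h_value, payload)` pairs, the cooperating siblings are passed in explicitly, and entries are spread evenly on every call; the function name and parameters are assumptions.

```python
def insert_with_deferred_split(siblings, item, capacity):
    """Insert `item` into a group of cooperating siblings.

    Returns the new list of nodes: same count if there was room somewhere,
    one more node (an s-to-(s+1) split) only if all siblings were full.
    """
    items = sorted([e for node in siblings for e in node] + [item],
                   key=lambda e: e[0])            # keep h-value order
    n_nodes = len(siblings)
    if len(items) > n_nodes * capacity:           # all siblings are full
        n_nodes += 1
    base, extra = divmod(len(items), n_nodes)
    nodes, start = [], 0
    for i in range(n_nodes):
        size = base + (1 if i < extra else 0)     # spread entries evenly
        nodes.append(items[start:start + size])
        start += size
    return nodes
```

With two siblings this gives the 2-to-3 split: no new node appears until both are full, which keeps utilization high.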
25R-trees - variations
- what about other bounding shapes? (and why?)
- A1: arbitrary-orientation lines (cell-tree, Guenther)
- A2: P-trees (polygon trees) (MB polygon: 0, 90, 45, 135 degree lines)
26R-trees - variations
- A3: L-shapes; holes (hB-tree)
- A4: TV-trees (Lin, VLDB Journal 1994)
- A5: SR-trees (Katayama, SIGMOD 97) (used in Informedia)
27R-trees - conclusions
- Popular method; like multi-d B-trees:
- guaranteed utilization
- good search times (for low-dim. at least)
- Informix ships DataBlade with R-trees
28Spatial Queries
- Given a collection of geometric objects (points, lines, polygons, ...)
- organize them on disk, to answer
- point queries
- range queries
- k-nn queries
- spatial joins (all pairs queries)
33R-trees - Range search
- pseudocode
- check the root
- for each branch,
- if its MBR intersects the query rectangle
- apply range-search (or print out, if this is a leaf)
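The pseudocode translates almost line for line. In this sketch a node is assumed to be a pair `(is_leaf, entries)`, where each entry is `(mbr, child_node)` in a directory node and `(mbr, object_id)` in a leaf; this layout is an illustration, not a fixed format.

```python
def intersects(a, b):
    # rectangles are (x_low, y_low, x_high, y_high); touching counts as a hit
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def range_search(node, query, out=None):
    """Collect the ids of all objects whose MBR intersects `query`."""
    if out is None:
        out = []
    is_leaf, entries = node
    for mbr, item in entries:
        if intersects(mbr, query):
            if is_leaf:
                out.append(item)                  # "print out"
            else:
                range_search(item, query, out)    # descend into the branch
    return out

leaf1 = (True, [((0, 0, 1, 1), "A"), ((2, 2, 3, 3), "B")])
leaf2 = (True, [((6, 6, 7, 7), "C"), ((8, 1, 9, 2), "D")])
root = (False, [((0, 0, 3, 3), leaf1), ((6, 1, 9, 7), leaf2)])
```

A query such as `(2.5, 1.5, 8.5, 2.5)` intersects both directory MBRs, so both branches are followed.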
35R-trees - NN search
- Q: How? (find near neighbor; refine...)
36R-trees - NN search
- A1: depth-first search; then, range query
[Figure: the example R-tree (rectangles A-J, parent MBRs P1-P4) with query point q]
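One way to read A1: descend greedily to a leaf to get a first candidate, then run a range query whose radius is the candidate's distance and shrinks as better candidates appear. This sketch assumes nodes are `(is_leaf, entries)` pairs with `(mbr, child_or_id)` entries, and measures the distance to an object by the distance to its MBR.

```python
import math

def mindist(q, r):
    # smallest possible distance from point q to rectangle r
    dx = max(r[0] - q[0], 0, q[0] - r[2])
    dy = max(r[1] - q[1], 0, q[1] - r[3])
    return math.hypot(dx, dy)

def first_guess(node, q):
    """Depth-first descent: follow the closest MBR at each level."""
    is_leaf, entries = node
    mbr, item = min(entries, key=lambda e: mindist(q, e[0]))
    return (mindist(q, mbr), item) if is_leaf else first_guess(item, q)

def refine(node, q, best):
    """Range query around q with radius best[0], tightened on the fly."""
    is_leaf, entries = node
    for mbr, item in entries:
        d = mindist(q, mbr)
        if d < best[0]:
            best = (d, item) if is_leaf else refine(item, q, best)
    return best

def nearest(root, q):
    return refine(root, q, first_guess(root, q))

leaf1 = (True, [((0, 0, 1, 1), "A"), ((3, 3, 4, 4), "B")])
leaf2 = (True, [((5, 1, 6, 2), "C"), ((8, 3, 9, 4), "D")])
root = (False, [((0, 0, 4, 4), leaf1), ((5, 0, 9, 4), leaf2)])
```

For `q = (4.4, 1.5)` the greedy descent lands in the first leaf and proposes B; the refinement step then finds C in the other leaf.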
39R-trees - NN search
- A2: Roussopoulos, SIGMOD 95
- priority queue, with promising MBRs, and their best and worst-case distance
- main idea: every face of any MBR contains at least one point of an actual spatial object!
40R-trees - NN search
- consider only P2 and P4, for illustration
[Figure: query point q with MBRs P2 and P4]
41R-trees - NN search
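- best of P4 is worse than worst of P2 => P4 is useless for 1-nn
[Figure: query point q with MBRs P2 and P4 and objects D, E, H, J; segments mark "best of P4" and "worst of P2"]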
43R-trees - NN search
- what is really the worst of, say, P2?
- A: the smallest of the two red segments!
[Figure: query point q and MBR P2, with two red segments from q to P2]
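The best and worst cases correspond to what the Roussopoulos et al. paper calls MINDIST and MINMAXDIST. The sketch below computes both as squared distances for 2-d rectangles `(x_low, y_low, x_high, y_high)`; the function names are illustrative.

```python
def mindist2(q, r):
    """Best case: squared distance from q to the nearest point of r."""
    lo, hi = r[:2], r[2:]
    return sum(max(l - p, 0, p - h) ** 2 for p, l, h in zip(q, lo, hi))

def minmaxdist2(q, r):
    """Worst case: an object is guaranteed within this squared distance.

    For each dimension k, take the nearer face along k and the farthest
    point on that face; the answer is the smallest such value.
    """
    lo, hi = r[:2], r[2:]
    dims = range(len(q))
    mid = [(l + h) / 2 for l, h in zip(lo, hi)]
    near = [lo[i] if q[i] <= mid[i] else hi[i] for i in dims]
    far = [lo[i] if q[i] >= mid[i] else hi[i] for i in dims]
    total_far = sum((q[i] - far[i]) ** 2 for i in dims)
    return min(total_far - (q[k] - far[k]) ** 2 + (q[k] - near[k]) ** 2
               for k in dims)

def useless_for_1nn(q, candidate, other):
    # prune `candidate` if its best case is worse than the other's worst case
    return mindist2(q, candidate) > minmaxdist2(q, other)
```

For `q = (0, 0)` and the rectangle `(1, 1, 2, 2)`, the best case is 2 and the worst case is 5 (squared), so a second rectangle at `(3, 3, 5, 5)`, whose best case is 18, can be discarded.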
44R-trees - NN search
- variations: Hjaltason and Samet, incremental nn
- build a priority queue
- scan enough of the tree, to make sure you have the k nn
- to find the (k+1)-th, check the queue, and scan some more of the tree
- optimal (but, may need too much memory)
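The incremental idea fits naturally into a generator: one priority queue holds both nodes and objects, keyed by their best-case distance, and each request for the next neighbor resumes the scan. This is a sketch under the assumption that nodes are `(is_leaf, entries)` pairs with `(mbr, child_or_id)` entries and that an object's distance is the distance to its MBR.

```python
import heapq
import itertools
import math

def mindist(q, r):
    dx = max(r[0] - q[0], 0, q[0] - r[2])
    dy = max(r[1] - q[1], 0, q[1] - r[3])
    return math.hypot(dx, dy)

def incremental_nn(root, q):
    """Yield (distance, object_id) pairs in increasing order of distance."""
    tie = itertools.count()               # tie-breaker so heap never compares items
    heap = [(0.0, next(tie), False, root)]
    while heap:
        d, _, is_object, item = heapq.heappop(heap)
        if is_object:
            yield d, item                 # nothing left in the queue is closer
            continue
        is_leaf, entries = item
        for mbr, child in entries:
            heapq.heappush(heap, (mindist(q, mbr), next(tie), is_leaf, child))

leaf1 = (True, [((0, 0, 1, 1), "A"), ((3, 3, 4, 4), "B")])
leaf2 = (True, [((5, 1, 6, 2), "C"), ((8, 3, 9, 4), "D")])
root = (False, [((0, 0, 4, 4), leaf1), ((5, 0, 9, 4), leaf2)])
```

Taking the first k items (for example with `itertools.islice`) gives the k nn; asking for one more continues from the saved queue, which is also where the memory cost comes from.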