Spatial Indexing for NN retrieval - PowerPoint PPT Presentation


Slides: 49
Provided by: ValuedSony2
Learn more at: https://www.cs.bu.edu

Transcript and Presenter's Notes


1
Spatial Indexing for NN retrieval
  • R-tree

2
R-trees
  • A multi-way external memory tree
  • Index nodes and data (leaf) nodes
  • All leaf nodes appear on the same level
  • Every node except the root contains between m and M
    entries (m ≤ M/2)
  • The root node has at least 2 entries (children),
    unless it is a leaf
  • Extension of B-tree to multiple dimensions

3
Example
  • e.g., with fanout 4: group nearby rectangles into
    parent MBRs; each group → one disk page

[Figure: data rectangles A to J]
4
Example
  • F = 4 (fanout)

[Figure: rectangles A to J grouped under parent MBRs P1 to P4]
5
Example
  • F = 4 (fanout)

[Figure: rectangles A to J grouped under parent MBRs P1 to P4]
6
R-trees - format of nodes
  • (MBR, obj_ptr) for leaf nodes

Entry layout: | x-low | x-high | y-low | y-high | ... | obj_ptr |, repeated for each entry in the node
7
R-trees - format of nodes
  • (MBR, node_ptr) for non-leaf nodes

Entry layout: | x-low | x-high | y-low | y-high | ... | node_ptr |, repeated for each entry in the node
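To make the two entry formats concrete, here is a minimal Python sketch of 2-D R-tree nodes. The class names `MBR`, `Entry` and `Node`, and the use of a string as a stand-in for an object pointer, are assumptions for illustration, not a disk layout taken from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class MBR:
    """2-D rectangle stored as (x-low, x-high, y-low, y-high), as in the entry layout."""
    xl: float
    xh: float
    yl: float
    yh: float

    def area(self) -> float:
        return (self.xh - self.xl) * (self.yh - self.yl)

    def union(self, other: "MBR") -> "MBR":
        # Smallest rectangle covering both self and other
        return MBR(min(self.xl, other.xl), max(self.xh, other.xh),
                   min(self.yl, other.yl), max(self.yh, other.yh))


@dataclass
class Entry:
    mbr: MBR
    obj_ptr: Optional[str] = None       # leaf entries: hypothetical object id
    node_ptr: Optional["Node"] = None   # non-leaf entries: child node


@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)

    def covering_mbr(self) -> MBR:
        # The MBR a parent entry must store so that it covers every entry here
        mbr = self.entries[0].mbr
        for e in self.entries[1:]:
            mbr = mbr.union(e.mbr)
        return mbr


leaf = Node(True, [Entry(MBR(0, 1, 0, 1), obj_ptr="A"),
                   Entry(MBR(2, 3, 1, 4), obj_ptr="B")])
root = Node(False, [Entry(leaf.covering_mbr(), node_ptr=leaf)])
```

The parent entry stores the union of its child's entries, which is what lets a search skip a whole subtree when that union misses the query.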
8
R-trees: Search
[Figure: rectangles A to J under parent MBRs P1 to P4]
9
R-trees: Search
[Figure: rectangles A to J under parent MBRs P1 to P4]
10
R-trees: Search
  • Main points
  • every parent node completely covers its
    children
  • a child MBR may be covered by more than one
    parent; it is stored under ONLY ONE of them
    (i.e., no need for duplicate elimination)
  • a point query may follow multiple branches.
  • everything works for any(?) dimensionality
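The last two points show up in a small point query: the search descends into every child whose MBR contains the point, so overlapping parents can send it down more than one branch, while each object is still reported once because it is stored under only one parent. This is a minimal Python sketch; the tuple layout `(x-low, x-high, y-low, y-high)` and the `Node` class are assumptions for illustration.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x-low, x-high, y-low, y-high)


class Node:
    def __init__(self, leaf: bool, entries):
        self.leaf = leaf
        self.entries = entries  # (mbr, object id) in leaves, (mbr, child Node) otherwise


def contains(r: Rect, p: Tuple[float, float]) -> bool:
    return r[0] <= p[0] <= r[1] and r[2] <= p[1] <= r[3]


def point_query(node: Node, p, visited: List[Node]) -> List[str]:
    """Return ids of objects whose MBR contains p, recording every node visited."""
    visited.append(node)
    hits = []
    for mbr, child in node.entries:
        if not contains(mbr, p):
            continue  # a parent MBR covers its whole subtree, so this branch cannot match
        if node.leaf:
            hits.append(child)
        else:
            hits.extend(point_query(child, p, visited))  # may recurse into several children
    return hits


leaf1 = Node(True, [((0, 4, 0, 4), "A")])
leaf2 = Node(True, [((3, 6, 3, 6), "B")])
root = Node(False, [((0, 4, 0, 4), leaf1), ((3, 6, 3, 6), leaf2)])
```

A point in the overlap of the two parent MBRs visits both leaves; a point outside the overlap visits only one.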

11
R-trees: Insertion
Insert X
[Figure: new rectangle X placed among rectangles A to J and parent MBRs P1 to P4]
12
R-trees: Insertion
Insert Y
[Figure: new rectangle Y placed among rectangles A to J and parent MBRs P1 to P4]
13
R-trees: Insertion
  • Extend the parent MBR

[Figure: the parent MBR chosen for Y is extended to cover it]
14
R-trees: Insertion
  • How to find the next node to insert the new
    object?
  • Using ChooseLeaf: find the entry that needs the
    least enlargement to include Y. Resolve ties
    by choosing the entry with the smallest area
  • Other methods (later)
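A minimal Python sketch of ChooseLeaf as described above: at each index node, pick the entry whose MBR needs the least area enlargement to include the new rectangle, breaking ties by the smaller area, and descend until a leaf is reached. The dictionary-based node representation and the helper names are assumptions for illustration.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x-low, x-high, y-low, y-high)


def area(r: Rect) -> float:
    return (r[1] - r[0]) * (r[3] - r[2])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), max(a[3], b[3]))


def choose_subtree(mbrs: List[Rect], new: Rect) -> int:
    """Index of the MBR needing the least enlargement to include `new`;
    ties are resolved by the smallest area."""
    def cost(i: int):
        return (area(union(mbrs[i], new)) - area(mbrs[i]), area(mbrs[i]))
    return min(range(len(mbrs)), key=cost)


def choose_leaf(node: dict, new: Rect) -> dict:
    # node = {"leaf": bool, "entries": [(mbr, child_or_object), ...]}
    while not node["leaf"]:
        i = choose_subtree([m for m, _ in node["entries"]], new)
        node = node["entries"][i][1]
    return node
```

The cost is a tuple, so Python compares enlargement first and only falls back to area when two enlargements are equal.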

15
R-trees: Insertion
  • If the node is full, then split it, e.g., insert W

[Figure: W must go into node P1, which already holds A, B, C and K]
16
R-trees: Insertion
  • If the node is full, then split it, e.g., insert W

[Figure: the tree after inserting W: a new node P5 appears next to P1 to P4, and new parent MBRs Q1 and Q2 appear above them]
17
R-trees: Split
  • Split node P1: partition the MBRs into two groups.
  • A1: plane sweep,
    until 50% of rectangles
  • A2: linear split
  • A3: quadratic split
  • A4: exponential split
    (2^(M-1) choices)

[Figure: node P1 with its entries K, C, A, W and B]
18
R-trees: Split
  • pick two rectangles as seeds
  • assign each rectangle R to the closest seed

[Figure: seeds, with seed1 labeled]
19
R-trees: Split
  • pick two rectangles as seeds
  • assign each rectangle R to the closest
    seed
  • closest: the seed whose group MBR needs the
    smallest increase in area

[Figure: seeds, with seed1 labeled]
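The assignment step can be sketched in Python as below, assuming the two seeds are already chosen. It follows the slide literally: each remaining rectangle goes to the seed group whose MBR grows least, with ties going to the first group (an arbitrary choice in this sketch). It omits refinements a full split needs, such as guaranteeing that each group ends up with at least m entries.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x-low, x-high, y-low, y-high)


def area(r: Rect) -> float:
    return (r[1] - r[0]) * (r[3] - r[2])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), max(a[3], b[3]))


def assign_to_seeds(rects: List[Rect], s1: int, s2: int):
    """Split rects into two groups seeded by rects[s1] and rects[s2]."""
    groups = ([rects[s1]], [rects[s2]])
    mbrs = [rects[s1], rects[s2]]  # current MBR of each group
    for i, r in enumerate(rects):
        if i in (s1, s2):
            continue
        growth = [area(union(m, r)) - area(m) for m in mbrs]
        g = 0 if growth[0] <= growth[1] else 1  # closest = smallest increase in area
        groups[g].append(r)
        mbrs[g] = union(mbrs[g], r)
    return groups
```

Each group's MBR is updated as rectangles join, so later decisions see the grown group rather than the bare seed.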
20
R-trees: Split
  • How to pick seeds
  • Linear: in each dimension, find the rectangle with
    the highest low side and the one with the lowest
    high side; normalize their separation by the extent
    of all rectangles in that dimension; choose the
    pair with the greatest normalized separation
  • Quadratic: for each pair E1 and E2, compute the
    rectangle J = MBR(E1, E2) and d = area(J) - area(E1)
    - area(E2). Choose the pair with the largest d
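Both seed-picking rules fit in a short Python sketch for 2-D rectangles. The fallback pair `(0, 1)` and skipping a dimension whose two extreme sides belong to the same rectangle are assumptions of this sketch, added so the linear rule always returns two distinct seeds.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x-low, x-high, y-low, y-high)


def area(r: Rect) -> float:
    return (r[1] - r[0]) * (r[3] - r[2])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), max(a[3], b[3]))


def quadratic_pick_seeds(rects: List[Rect]) -> Tuple[int, int]:
    """Pair maximizing d = area(J) - area(E1) - area(E2), with J = MBR(E1, E2)."""
    best, pair = None, (0, 1)
    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            d = area(union(rects[i], rects[j])) - area(rects[i]) - area(rects[j])
            if best is None or d > best:
                best, pair = d, (i, j)
    return pair


def linear_pick_seeds(rects: List[Rect]) -> Tuple[int, int]:
    """Pair with the greatest normalized separation over the x and y dimensions."""
    best, pair = None, (0, 1)  # fallback if no dimension yields two distinct rectangles
    for lo, hi in ((0, 1), (2, 3)):  # tuple indices of the low/high side in x, then y
        width = max(r[hi] for r in rects) - min(r[lo] for r in rects)
        i = max(range(len(rects)), key=lambda k: rects[k][lo])  # highest low side
        j = min(range(len(rects)), key=lambda k: rects[k][hi])  # lowest high side
        if i == j or width == 0:
            continue
        sep = (rects[i][lo] - rects[j][hi]) / width
        if best is None or sep > best:
            best, pair = sep, (min(i, j), max(i, j))
    return pair
```

The quadratic rule examines all O(M^2) pairs, while the linear rule makes one pass per dimension, which is where the two names come from.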

21
R-trees: Insertion
  • Use ChooseLeaf to find the leaf node in which to
    insert an entry E
  • If the leaf node is full, then split it; otherwise
    insert there
  • Propagate the split upwards, if necessary
  • Adjust the parent MBRs along the path to the root
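Putting these steps together, here is a compact Python sketch of insertion. Assumptions: a maximum fanout M = 4 (from the fanout-4 example), quadratic seeds with the closest-seed assignment from the Split slides, and no minimum-fill (m) enforcement during splits. It is an illustration of the control flow, not a complete R-tree.

```python
from typing import Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x-low, x-high, y-low, y-high)
M = 4  # maximum entries per node (hypothetical; matches the fanout-4 example)


def area(r: Rect) -> float:
    return (r[1] - r[0]) * (r[3] - r[2])


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), max(a[3], b[3]))


def enlargement(r: Rect, new: Rect) -> float:
    return area(union(r, new)) - area(r)


class Node:
    def __init__(self, leaf: bool):
        self.leaf = leaf
        self.entries: list = []  # (mbr, object id) in leaves, (mbr, child Node) otherwise

    def mbr(self) -> Rect:
        r = self.entries[0][0]
        for m, _ in self.entries[1:]:
            r = union(r, m)
        return r


def split(node: Node) -> Node:
    """Quadratic seeds + closest-seed assignment; node keeps group 1, returns group 2."""
    es = node.entries
    pairs = [(a, b) for a in range(len(es)) for b in range(a + 1, len(es))]
    i, j = max(pairs, key=lambda p: area(union(es[p[0]][0], es[p[1]][0]))
               - area(es[p[0]][0]) - area(es[p[1]][0]))
    g1, g2, m1, m2 = [es[i]], [es[j]], es[i][0], es[j][0]
    for k, e in enumerate(es):
        if k in (i, j):
            continue
        if enlargement(m1, e[0]) <= enlargement(m2, e[0]):
            g1.append(e)
            m1 = union(m1, e[0])
        else:
            g2.append(e)
            m2 = union(m2, e[0])
    node.entries = g1
    sibling = Node(node.leaf)
    sibling.entries = g2
    return sibling


class RTree:
    def __init__(self):
        self.root = Node(leaf=True)

    def insert(self, mbr: Rect, obj: str) -> None:
        # ChooseLeaf, remembering the path so splits can be propagated upwards
        path, node = [], self.root
        while not node.leaf:
            path.append(node)
            node = min(node.entries,
                       key=lambda e: (enlargement(e[0], mbr), area(e[0])))[1]
        node.entries.append((mbr, obj))
        new: Optional[Node] = split(node) if len(node.entries) > M else None
        # AdjustTree: refresh the parent MBR on the path; add and maybe split siblings
        while path:
            parent = path.pop()
            parent.entries = [(c.mbr(), c) if c is node else (m, c)
                              for m, c in parent.entries]
            if new is not None:
                parent.entries.append((new.mbr(), new))
                new = split(parent) if len(parent.entries) > M else None
            node = parent
        if new is not None:  # the root itself was split: grow the tree by one level
            self.root = Node(leaf=False)
            self.root.entries = [(node.mbr(), node), (new.mbr(), new)]
```

The tree only grows at the root, which is why all leaves stay on the same level.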

22
R-trees: Deletion
  • Find the leaf node that contains the entry E
  • Remove E from this node
  • If underflow:
      • eliminate the node by removing the node entries
        and the parent entry
      • reinsert the orphaned (other) entries into the
        tree using Insert
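A minimal Python sketch of deletion with underflow handling. Simplifying assumptions: the minimum fill m is a hypothetical `MIN_FILL = 2`; orphans are returned as leaf-level entries for the caller to reinsert with Insert (a fuller version reinserts entries of eliminated index nodes at their original level); and shortening the tree when the root is left with one child is omitted.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x-low, x-high, y-low, y-high)
MIN_FILL = 2  # m, the minimum number of entries per node (hypothetical value)


def union(a: Rect, b: Rect) -> Rect:
    return (min(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), max(a[3], b[3]))


class Node:
    def __init__(self, leaf: bool, entries):
        self.leaf = leaf
        self.entries = entries  # (mbr, object id) in leaves, (mbr, child Node) otherwise

    def mbr(self) -> Rect:
        r = self.entries[0][0]
        for m, _ in self.entries[1:]:
            r = union(r, m)
        return r


def find_leaf(node: Node, mbr: Rect, obj, path: List[Node]):
    """Depth-first search for the leaf holding (mbr, obj); `path` collects its ancestors."""
    if node.leaf:
        return node if (mbr, obj) in node.entries else None
    for m, child in node.entries:
        if union(m, mbr) == m:  # only subtrees whose MBR covers the target
            path.append(node)
            found = find_leaf(child, mbr, obj, path)
            if found is not None:
                return found
            path.pop()
    return None


def leaf_entries(node: Node) -> list:
    if node.leaf:
        return list(node.entries)
    return [e for _, child in node.entries for e in leaf_entries(child)]


def delete(root: Node, mbr: Rect, obj) -> list:
    """Remove (mbr, obj); return orphaned leaf entries that the caller must reinsert."""
    path: List[Node] = []
    leaf = find_leaf(root, mbr, obj, path)
    if leaf is None:
        return []
    leaf.entries.remove((mbr, obj))
    orphans, node = [], leaf
    while path:  # walk back up towards the root
        parent = path.pop()
        idx = next(i for i, (_, c) in enumerate(parent.entries) if c is node)
        if len(node.entries) < MIN_FILL:
            del parent.entries[idx]              # underflow: eliminate the node
            orphans.extend(leaf_entries(node))   # and keep its entries for reinsertion
        else:
            parent.entries[idx] = (node.mbr(), node)  # otherwise tighten the parent MBR
        node = parent
    return orphans
```

Reinserting orphans rather than merging nodes keeps the code short and lets the insertion heuristics place each entry afresh.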

23
R-trees Variations
  • R+-tree: does NOT allow overlapping, so it splits the
    objects (similar to z-values)
  • R*-tree: changes the insertion and deletion
    algorithms (minimizes not only area but also
    perimeter; forced re-insertion)
  • Hilbert R-tree: uses the Hilbert values to insert
    objects into the tree
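The Hilbert R-tree orders rectangles by the Hilbert value of a representative point, commonly the MBR center. Below is a Python sketch of the widely used iterative `xy2d` conversion for an n x n grid (n a power of two); `hilbert_key` and the assumption that coordinates are already scaled into [0, n) are hypothetical choices for illustration.

```python
def rot(n: int, x: int, y: int, rx: int, ry: int):
    """Rotate/flip a quadrant so that the sub-curve has the canonical orientation."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def xy2d(n: int, x: int, y: int) -> int:
    """Hilbert value of cell (x, y) on an n x n grid, n a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)  # which quadrant, in curve order
        x, y = rot(n, x, y, rx, ry)
        s //= 2
    return d


def hilbert_key(rect, n: int) -> int:
    """Hilbert value of the center of rect = (x-low, x-high, y-low, y-high),
    assuming coordinates already lie in [0, n)."""
    cx = int((rect[0] + rect[1]) / 2)
    cy = int((rect[2] + rect[3]) / 2)
    return xy2d(n, cx, cy)
```

Consecutive Hilbert values are always adjacent grid cells, so sorting by this key tends to keep nearby rectangles in the same node.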

24
R-tree

[Figure: example tree over objects numbered 1 to 13]
25
R-trees - Range search
  • pseudocode:
  • check the root
  • for each branch,
      • if its MBR intersects the query rectangle,
        apply range-search (or print out, if this
        is a leaf)
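The range-search pseudocode translates almost line for line into the recursive Python sketch below. The tuple layout and `Node` class are assumptions for illustration; at the leaf level the MBR test is only a filter, and a real system would still check each candidate object's exact geometry.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x-low, x-high, y-low, y-high)


class Node:
    def __init__(self, leaf: bool, entries):
        self.leaf = leaf
        self.entries = entries  # (mbr, object id) in leaves, (mbr, child Node) otherwise


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[1] and b[0] <= a[1] and a[2] <= b[3] and b[2] <= a[3]


def range_search(node: Node, q: Rect, out: List[str]) -> List[str]:
    """Collect ids of objects whose MBR intersects the query rectangle q."""
    for mbr, child in node.entries:
        if intersects(mbr, q):
            if node.leaf:
                out.append(child)  # "print out" the qualifying object
            else:
                range_search(child, q, out)  # apply range-search to this branch
    return out


leaf1 = Node(True, [((0, 1, 0, 1), "A"), ((2, 3, 2, 3), "B")])
leaf2 = Node(True, [((7, 8, 7, 8), "C")])
root = Node(False, [((0, 3, 0, 3), leaf1), ((7, 8, 7, 8), leaf2)])
```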

26
R-trees - NN search
27
R-trees - NN search
  • Q: How? (find a near neighbor, then refine...)

28
R-trees - NN search
  • A1: depth-first search, then range query

[Figure: query point q among rectangles A to J and parent MBRs P1 to P4]
29
R-trees - NN search
  • A1: depth-first search, then range query

[Figure: query point q among rectangles A to J and parent MBRs P1 to P4]
30
R-trees - NN search
  • A1: depth-first search, then range query

[Figure: query point q among rectangles A to J and parent MBRs P1 to P4]
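Approach A1 can be sketched in Python as two phases: descend once, always into the child with the smallest MINDIST, to obtain a near (not necessarily nearest) neighbor at distance r; then run a range query with radius r around q, since the true nearest neighbor cannot be farther than r. Assumptions of this sketch: data objects are 2-D points, and leaves are non-empty.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x-low, x-high, y-low, y-high)


class Node:
    def __init__(self, leaf: bool, entries):
        self.leaf = leaf
        self.entries = entries  # points in leaves, (mbr, child Node) otherwise


def mindist(p: Point, r: Rect) -> float:
    dx = max(r[0] - p[0], 0.0, p[0] - r[1])
    dy = max(r[2] - p[1], 0.0, p[1] - r[3])
    return math.hypot(dx, dy)


def nn_dfs_then_range(root: Node, q: Point) -> Point:
    # Phase 1: depth-first descent into the child with the smallest MINDIST
    node = root
    while not node.leaf:
        node = min(node.entries, key=lambda e: mindist(q, e[0]))[1]
    r = min(math.dist(p, q) for p in node.entries)  # distance to a near neighbor
    # Phase 2: range query with radius r; the true NN must lie within this circle
    candidates, stack = [], [root]
    while stack:
        n = stack.pop()
        if n.leaf:
            candidates.extend(p for p in n.entries if math.dist(p, q) <= r)
        else:
            stack.extend(c for m, c in n.entries if mindist(q, m) <= r)
    return min(candidates, key=lambda p: math.dist(p, q))


leaf_a = Node(True, [(0, 0), (10, 10)])
leaf_b = Node(True, [(6, 4.5)])
root = Node(False, [((0, 10, 0, 10), leaf_a), ((6, 6, 4.5, 4.5), leaf_b)])
```

In this example the descent first lands in the large MBR (MINDIST 0), whose points are far away; the range query then finds the closer point in the other leaf.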
31
R-trees - NN search: Branch and Bound
  • A2: [Roussopoulos, SIGMOD95]
  • At each node, keep a priority queue of promising
    MBRs, with their best-case and worst-case distances
  • Main idea: every face of any MBR contains at
    least one point of an actual spatial object!

32
MBR face property
  • MBR is a d-dimensional rectangle, which is the
    minimal rectangle that fully encloses (bounds) an
    object (or a set of objects)
  • MBR face property (MBR f.p.): every face of the MBR
    contains at least one point of some object in the database

33
Search improvement
  • Visit an MBR (node) only when necessary
  • How to do pruning? Use MINDIST and MINMAXDIST

34
MINDIST
  • MINDIST(P, R) is the minimum distance between a
    point P and a rectangle R
  • If P is inside R, then MINDIST = 0
  • If P is outside R, MINDIST is the distance from
    P to the closest point of R (a point on its
    boundary)

35
MINDIST computation
  • MINDIST(p,R) is the minimum distance between p
    and R with corner points l and u
  • the closest point in R is at least this distance
    away

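R has lower corner l = (l1, l2, ..., ld) and upper corner u = (u1, u2, ..., ud); the point r of R closest to p is

$$
r_i = \begin{cases} l_i & \text{if } p_i < l_i \\ u_i & \text{if } p_i > u_i \\ p_i & \text{otherwise} \end{cases}
\qquad
\mathrm{MINDIST}(p, R) = \sqrt{\sum_{i=1}^{d} (p_i - r_i)^2}
$$

[Figure: points p outside R, and a point p inside R with MINDIST = 0]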
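The MINDIST definition is a per-dimension clamp followed by a Euclidean norm, as in this minimal d-dimensional Python sketch. Representing R by its corner sequences `l` and `u` is an assumption for illustration.

```python
import math
from typing import Sequence


def mindist(p: Sequence[float], l: Sequence[float], u: Sequence[float]) -> float:
    """Euclidean MINDIST between point p and rectangle R with corners l and u."""
    total = 0.0
    for pi, li, ui in zip(p, l, u):
        if pi < li:
            ri = li      # p is below R in this dimension
        elif pi > ui:
            ri = ui      # p is above R in this dimension
        else:
            ri = pi      # p lies within R's extent: no contribution
        total += (pi - ri) ** 2
    return math.sqrt(total)
```

Dimensions in which p already lies inside R's extent contribute zero, which is why MINDIST is 0 for a point inside R.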
36
MINMAXDIST
  • MINMAXDIST(P, R): for each dimension, find the
    closest face, compute the distance to the
    furthest point on this face, and take the minimum
    of all these d distances
  • MINMAXDIST(P, R) is the smallest upper bound on the
    distance from P to its nearest object in R that can
    be derived from R alone
  • MINMAXDIST guarantees that there is at least one
    object in R whose distance to P is smaller than or
    equal to it.
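Following the description above, MINMAXDIST can be sketched in Python as below. For dimension k, `rm[k]` is the coordinate of the face of R closer to p and `rM[i]` that of the farther face; the candidate for k combines the close face in dimension k with the far extent in all other dimensions. This follows the usual formulation from Roussopoulos et al.; the function signature and Euclidean (square-rooted) distance are assumptions of this sketch.

```python
import math
from typing import Sequence


def minmaxdist(p: Sequence[float], l: Sequence[float], u: Sequence[float]) -> float:
    """Euclidean MINMAXDIST between point p and rectangle R with corners l and u."""
    d = len(p)
    # rm[k]: coordinate of the face of R closer to p in dimension k
    rm = [l[k] if p[k] <= (l[k] + u[k]) / 2 else u[k] for k in range(d)]
    # rM[i]: coordinate of the face of R farther from p in dimension i
    rM = [l[i] if p[i] >= (l[i] + u[i]) / 2 else u[i] for i in range(d)]
    far = [(p[i] - rM[i]) ** 2 for i in range(d)]
    total_far = sum(far)
    # For each k: the closest face in dimension k, the farthest extent in all others
    return math.sqrt(min(total_far - far[k] + (p[k] - rm[k]) ** 2 for k in range(d)))
```

For p = (0, 0) and R = [1, 2] x [1, 3], the closest y-face is y = 1 and its farthest point is (2, 1), giving sqrt(5), which beats sqrt(10) from the closest x-face.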

37
MINDIST and MINMAXDIST
  • MINDIST(P, R) ≤ NN(P) ≤ MINMAXDIST(P, R), where NN(P)
    is the distance from P to its nearest object in R

[Figure: query point with MBRs R1 to R4, each annotated with its MINDIST and MINMAXDIST]
38
Pruning in NN search
  • Downward pruning (rule 1): an MBR R is discarded if
    there exists another MBR R' s.t.
    MINDIST(P, R) > MINMAXDIST(P, R')
  • Downward pruning (rule 2): an object O is discarded if
    there exists an MBR R s.t. Actual-Dist(P, O) >
    MINMAXDIST(P, R)
  • Upward pruning (rule 3): an MBR R is discarded if an
    object O is found s.t. MINDIST(P, R) >
    Actual-Dist(P, O)

39
Pruning 1 example
  • Downward pruning: an MBR R is discarded if there
    exists another MBR R' s.t. MINDIST(P, R) > MINMAXDIST(P, R')

[Figure: MBRs R and R', with MINDIST(P, R) and MINMAXDIST(P, R') marked]
40
Pruning 2 example
  • Downward pruning: an object O is discarded if
    there exists an MBR R s.t. Actual-Dist(P, O) >
    MINMAXDIST(P, R)

[Figure: object O and MBR R, with Actual-Dist(P, O) and MINMAXDIST(P, R) marked]
41
Pruning 3 example
  • Upward pruning: an MBR R is discarded if an
    object O is found s.t. MINDIST(P, R) >
    Actual-Dist(P, O)

[Figure: MBR R and object O, with MINDIST(P, R) and Actual-Dist(P, O) marked]
42
Ordering Distance
  • MINDIST is an optimistic distance, whereas
    MINMAXDIST is a pessimistic one.

[Figure: point P with its MINDIST and MINMAXDIST to an MBR]
43
NN-search Algorithm
  1. Initialize the nearest distance as infinite
    distance
  2. Traverse the tree depth-first starting from the
    root. At each index node, sort all MBRs using an
    ordering metric and put them in an Active Branch
    List (ABL).
  3. Apply pruning rules 1 and 2 to ABL
  4. Visit the MBRs from the ABL following the order
    until it is empty
  5. If Leaf node, compute actual distances, compare
    with the best NN so far, update if necessary.
  6. At the return from the recursion, use pruning
    rule 3
  7. When the ABL is empty, the NN search returns.
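The steps above can be sketched as a recursive Python function. Assumptions of this sketch: data objects are 2-D points, MINDIST is the ordering metric, and rules 1 and 3 are applied while rule 2 is omitted (it is not needed for correctness here, since an object only replaces the current best when it is strictly closer).

```python
import math
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x-low, x-high, y-low, y-high)


class Node:
    def __init__(self, leaf: bool, entries):
        self.leaf = leaf
        self.entries = entries  # points in leaves, (mbr, child Node) otherwise


def mindist(p, r: Rect) -> float:
    dx = max(r[0] - p[0], 0.0, p[0] - r[1])
    dy = max(r[2] - p[1], 0.0, p[1] - r[3])
    return math.hypot(dx, dy)


def minmaxdist(p, r: Rect) -> float:
    lo, hi = (r[0], r[2]), (r[1], r[3])
    rm = [lo[i] if p[i] <= (lo[i] + hi[i]) / 2 else hi[i] for i in range(2)]  # closer face
    rM = [lo[i] if p[i] >= (lo[i] + hi[i]) / 2 else hi[i] for i in range(2)]  # farther face
    far = [(p[i] - rM[i]) ** 2 for i in range(2)]
    return math.sqrt(min(sum(far) - far[k] + (p[k] - rm[k]) ** 2 for k in range(2)))


def bb_nn(node: Node, q, best: Tuple[object, float]) -> Tuple[object, float]:
    """Depth-first branch-and-bound NN; best = (point, distance) found so far."""
    if node.leaf:  # step 5: actual distances, keep the best NN so far
        for pt in node.entries:
            d = math.dist(pt, q)
            if d < best[1]:
                best = (pt, d)
        return best
    abl = sorted(node.entries, key=lambda e: mindist(q, e[0]))  # step 2: ABL by MINDIST
    bound = min(minmaxdist(q, m) for m, _ in abl)
    abl = [e for e in abl if mindist(q, e[0]) <= bound]  # rule 1 (downward pruning)
    for m, child in abl:  # step 4: visit in order
        if mindist(q, m) > best[1]:  # rule 3 (upward pruning) after each return
            continue
        best = bb_nn(child, q, best)
    return best


leaf_a = Node(True, [(0, 0), (10, 10)])
leaf_b = Node(True, [(6, 4.5)])
leaf_c = Node(True, [(20, 20)])
root = Node(False, [((0, 10, 0, 10), leaf_a), ((6, 6, 4.5, 4.5), leaf_b),
                    ((20, 21, 20, 21), leaf_c)])
```

A call starts with an infinite nearest distance (step 1), e.g. `bb_nn(root, (5, 5), (None, math.inf))`. In this example rule 1 drops the far-away third MBR without visiting it, because its MINDIST exceeds the MINMAXDIST of the second one.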

44
K-NN search
  • Keep a sorted buffer of at most k current
    nearest neighbors
  • Pruning is done using the k-th distance
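A minimal Python sketch of the k-NN buffer: it keeps at most k (distance, object) pairs sorted by distance, and the pruning distance is the k-th distance once the buffer is full (infinite before that). The class and method names are hypothetical.

```python
import math


class KBuffer:
    """Sorted buffer holding at most k (distance, object) pairs, nearest first."""

    def __init__(self, k: int):
        self.k = k
        self.items = []

    def prune_dist(self) -> float:
        # Distance of the current k-th neighbor; infinite until the buffer is full.
        # An MBR whose MINDIST exceeds this value can be pruned.
        return self.items[-1][0] if len(self.items) == self.k else math.inf

    def offer(self, dist: float, obj) -> None:
        if dist < self.prune_dist():
            self.items.append((dist, obj))
            self.items.sort(key=lambda t: t[0])
            if len(self.items) > self.k:
                self.items.pop()  # drop the old k-th neighbor
```

Plugging `prune_dist()` in wherever the 1-NN search compares against the best distance so far turns it into a k-NN search.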

45
Another NN search: Best-First
  • Global order [HS99]
  • Maintain distances to all entries in a common
    priority queue
  • Use only MINDIST
  • Repeat:
      • inspect the next MBR in the list
      • add its children to the list and reorder
  • Until all remaining MBRs can be pruned

46
Nearest Neighbor Search (NN) with R-Trees
  • Best-first (BF) algorithm

[Figure: points a to i indexed by an R-tree with MBRs E1 to E9, a query point and its search region; a table traces the Action, Heap and Result columns step by step (Visit Root, then follow entries from the heap) until the search reports h and terminates]
47
HS algorithm
  • Initialize PQ (priority queue)
  • InsertQueue(PQ, Root)
  • While not IsEmpty(PQ):
      • R = Dequeue(PQ)
      • If R is an object:
          • Report R and exit (done!)
      • If R is a leaf page node:
          • For each O in R, compute Actual-Dist(P, O)
            and InsertQueue(PQ, O)
      • If R is an index node:
          • For each MBR C in R, compute MINDIST(P, C)
            and insert C into PQ
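The HS pseudocode maps directly onto Python's `heapq` module. This is a sketch, assuming 2-D point objects; the running counter is only a tie-breaker so that the heap never has to compare nodes or points with each other.

```python
import heapq
import itertools
import math
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x-low, x-high, y-low, y-high)


class Node:
    def __init__(self, leaf: bool, entries):
        self.leaf = leaf
        self.entries = entries  # points in leaf pages, (mbr, child Node) in index nodes


def mindist(p, r: Rect) -> float:
    dx = max(r[0] - p[0], 0.0, p[0] - r[1])
    dy = max(r[2] - p[1], 0.0, p[1] - r[3])
    return math.hypot(dx, dy)


def best_first_nn(root: Node, q):
    tie = itertools.count()  # tie-breaker so heapq never compares nodes or points
    pq = [(0.0, next(tie), False, root)]  # (key, tie, is_object, item)
    while pq:
        dist, _, is_object, item = heapq.heappop(pq)
        if is_object:
            return item, dist  # the first object dequeued is the nearest neighbor
        if item.leaf:
            for o in item.entries:  # leaf page: enqueue objects with Actual-Dist
                heapq.heappush(pq, (math.dist(o, q), next(tie), True, o))
        else:
            for c_mbr, child in item.entries:  # index node: enqueue children with MINDIST
                heapq.heappush(pq, (mindist(q, c_mbr), next(tie), False, child))
    return None, math.inf


leaf_a = Node(True, [(0, 0), (10, 10)])
leaf_b = Node(True, [(6, 4.5)])
root = Node(False, [((0, 10, 0, 10), leaf_a), ((6, 6, 4.5, 4.5), leaf_b)])
```

Because MINDIST never overestimates, an object reaches the front of the queue only when nothing left in the queue can be closer. Continuing to dequeue instead of returning would yield the next neighbors in distance order.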

48
Best-First vs Branch and Bound
  • Best-First is optimal in the sense that it
    visits all the necessary nodes and nothing more!
  • But it needs to store a potentially large priority
    queue in main memory. If the PQ becomes large, we
    have thrashing
  • BB uses small lists, one per node. It also uses
    MINMAXDIST to prune some entries