1
Multi-Dimensional Range Searching
Committee: Subhash Suri (Chair), Amr El Abbadi, Teofilo Gonzalez

Amit Bhosle
Department of Computer Science
University of California Santa Barbara

2
What I Will Cover Today

Introduction: problem statement and applications
Some classical solutions
A lower bound by Chazelle for orthogonal range searching
Indexing schemes in the context of range searching
Some indexing structures (R-trees and box trees)

3
What is Range Searching?

Preprocess a set P of objects so that queries can be answered efficiently.
Typically, P is a collection of geometric objects (points, rectangles, polygons) in $\mathbb{R}^d$.
Query range Q: d-rectangles, balls, halfspaces, simplices, etc.
Either count all objects in $P \cap Q$ or report the objects themselves.
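
As a baseline for the structures that follow, the counting and reporting variants can be sketched by brute force. This is a minimal sketch, assuming points are tuples and a query is an axis-parallel box given as one (lo, hi) interval per dimension; the names `in_box`, `count` and `report` are illustrative and do not come from the slides.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Box = Sequence[Tuple[float, float]]  # one closed (lo, hi) interval per dimension


def in_box(p: Point, box: Box) -> bool:
    """True if p lies inside the closed axis-parallel box."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(p, box))


def report(points: Sequence[Point], box: Box) -> List[Point]:
    """Reporting query: return the points of P that fall in Q."""
    return [p for p in points if in_box(p, box)]


def count(points: Sequence[Point], box: Box) -> int:
    """Counting query: return |P intersect Q| without listing the points."""
    return sum(1 for p in points if in_box(p, box))
```

Both functions take O(n) time per query with no preprocessing, which is the cost the preprocessing-based structures try to beat.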

4
Example: Points in $\mathbb{R}^2$
[Figure: a point set in the plane with two query ranges, Q1 and Q2.]
5
Why Study Range Searching?

Applications in several fields
Databases
Spatial databases (G.I.S.)
Computer Graphics
Robotics
Algorithmic tool (example ?? )
And more..

6
Some Classical Approaches

Gridding or bucketing
Simple and independent of query shape.
Good only for uniformly distributed input; can be quite bad for skewed data.
Range trees
Good query time and space in low dimensions.
Poor in higher dimensions: $O(\log^d n)$.
kD-trees
Linear space in all dimensions.
Query time becomes almost linear in high dimensions: $O(n^{1-1/d})$.

7
The Grid Approach
[Figure: a k × k grid over the plane, with both axes divided into cells numbered 1, 2, ..., k.]
8
Grids (Contd.)

Either the queries must be aligned with the grid, or the result is approximate.
Cell sizes need not be uniform and can be adapted to the data distribution.
$O(k^{d-1})$ query time, $O(nd + k^d)$ preprocessing time and $O(n + k^d)$ space (k is the number of divisions of each axis).
Error decreases as k increases, but the space requirement increases.
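
The bucketing idea can be sketched in a few lines. This is a minimal 2-D sketch, assuming points lie in the square [lo, hi)² and cells are uniform; `build_grid` and `query_grid` are illustrative names. Unlike the approximate variant described on the slide, this version filters the points in boundary cells, so the answer is exact at the cost of scanning those cells.

```python
from collections import defaultdict


def build_grid(points, k, lo=0.0, hi=1.0):
    """Bucket 2-D points from [lo, hi)^2 into a k x k uniform grid."""
    cell = (hi - lo) / k
    grid = defaultdict(list)
    for (x, y) in points:
        i = min(k - 1, int((x - lo) / cell))
        j = min(k - 1, int((y - lo) / cell))
        grid[(i, j)].append((x, y))
    return grid, cell


def query_grid(grid, cell, k, box, lo=0.0):
    """Report points in the closed box ((x1, x2), (y1, y2))."""
    (x1, x2), (y1, y2) = box
    # Range of cell indices overlapped by the query, clamped to the grid.
    i1, i2 = max(0, int((x1 - lo) / cell)), min(k - 1, int((x2 - lo) / cell))
    j1, j2 = max(0, int((y1 - lo) / cell)), min(k - 1, int((y2 - lo) / cell))
    out = []
    for i in range(i1, i2 + 1):
        for j in range(j1, j2 + 1):
            for (x, y) in grid.get((i, j), ()):
                # Filtering makes the result exact for non-aligned queries.
                if x1 <= x <= x2 and y1 <= y <= y2:
                    out.append((x, y))
    return out
```

With skewed input most points land in a few cells, and the inner loop degrades toward a linear scan, which is the weakness noted above.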

9
Range Trees

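1-D case: Build a balanced binary tree using the points' coordinates as the keys.

[Figure: a balanced binary search tree over the keys 2, 4, 5, 7, 8, 12, 15, 19; the values 6 and 17 also appear in the figure, apparently as the endpoints of a query interval.]

Counting?
Optimal time/space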
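
The 1-D case can be sketched with a sorted array in place of the balanced tree: binary search over a sorted array finds the same two boundary positions that a walk down the tree would. This is a minimal sketch using the keys from the figure and, as an assumption, the query interval [6, 17]; the function names are illustrative.

```python
from bisect import bisect_left, bisect_right


def build(keys):
    """Preprocessing: O(n log n) sort, O(n) space."""
    return sorted(keys)


def range_count(sorted_keys, lo, hi):
    """Count keys in [lo, hi] with two binary searches: O(log n)."""
    return bisect_right(sorted_keys, hi) - bisect_left(sorted_keys, lo)


def range_report(sorted_keys, lo, hi):
    """Report keys in [lo, hi]: O(log n + k) for k reported keys."""
    return sorted_keys[bisect_left(sorted_keys, lo):bisect_right(sorted_keys, hi)]


keys = build([2, 4, 5, 7, 8, 12, 15, 19])
in_range = range_report(keys, 6, 17)   # the keys 7, 8, 12, 15
```

Counting needs only the two boundary positions, so it never touches the reported keys. A static sorted array does not support efficient insertions, which a balanced tree would.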
10
Range Trees (Contd.)

d dimensions:
Build a 1-D range tree on the first dimension.
Each internal node points to another tree, built recursively on the remaining d-1 dimensions for the points in its subtree.

[Figure: a range tree on points P1 to P4, with associated trees on the remaining dimension.]
11
Range Trees (Contd.)
[Figure: a range tree over points P1 to P8, with associated structures over subsets of the points such as P1 to P4 and P5 to P8.]
12
Range Trees (Contd.)

Searching by the first dimension gives us $O(\log n)$ subtrees which together contain the output point(s).
Search the remaining d-1 dimensions recursively among these.
$O(\log^d n + k)$ query time, $O(n \log^{d-1} n)$ preprocessing time and space.
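
For d = 2 the two-level search can be sketched as follows. This is a minimal sketch for counting queries, assuming a static point set; the class name `RangeTree2D` is illustrative. As a simplification, each node's secondary structure is a sorted array of y-values rather than a second tree, which gives the same $O(\log^2 n)$ counting time and $O(n \log n)$ space.

```python
from bisect import bisect_left, bisect_right
from heapq import merge


class RangeTree2D:
    """Primary tree over x-sorted points; each node keeps its y-values sorted."""

    def __init__(self, points):
        self.pts = sorted(points)            # sort by x (ties broken by y)
        self.xs = [p[0] for p in self.pts]
        self.n = len(self.pts)
        self.ys = {}                         # node (l, r) -> sorted y-values of pts[l:r]
        if self.n:
            self._build(0, self.n)

    def _build(self, l, r):
        if r - l == 1:
            ys = [self.pts[l][1]]
        else:
            m = (l + r) // 2
            ys = list(merge(self._build(l, m), self._build(m, r)))
        self.ys[(l, r)] = ys
        return ys

    def count(self, x1, x2, y1, y2):
        """Number of points with x1 <= x <= x2 and y1 <= y <= y2."""
        a, b = bisect_left(self.xs, x1), bisect_right(self.xs, x2)
        return self._count(0, self.n, a, b, y1, y2) if a < b else 0

    def _count(self, l, r, a, b, y1, y2):
        if b <= l or r <= a:                 # node is outside the x-range
            return 0
        if a <= l and r <= b:                # one of the O(log n) canonical subtrees
            ys = self.ys[(l, r)]
            return bisect_right(ys, y2) - bisect_left(ys, y1)
        m = (l + r) // 2
        return (self._count(l, m, a, b, y1, y2)
                + self._count(m, r, a, b, y1, y2))
```

The x-range is covered by $O(\log n)$ canonical nodes, and each contributes one binary search on y, which is where the extra logarithmic factor per dimension comes from.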

13
kD-Trees (k-dimensional Trees)

1-D tree: split at the median point and recursively build subtrees for the left and right sets.
Higher dimensions: same approach, but cycle through the dimensions. Or, select the next dimension as the one with the widest spread.
Efficiency of query processing drops as the number of dimensions increases (becomes almost linear).
However, the space requirement remains linear: $O(nd)$.
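
The construction and the range query can be sketched as below. This is a minimal sketch, assuming the cycle-through-the-dimensions rule and a closed query box given as one (lo, hi) interval per dimension; `build_kd` and `kd_report` are illustrative names. Re-sorting at every level keeps the code short but costs $O(n \log^2 n)$ build time rather than the $O(n \log n)$ achievable with linear-time median selection.

```python
def build_kd(points, depth=0):
    """Build a kD-tree: split at the median, cycling through the dimensions."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kd(pts[:mid], depth + 1),       # coordinates <= median
        "right": build_kd(pts[mid + 1:], depth + 1),  # coordinates >= median
    }


def kd_report(node, box, out=None):
    """Report all points inside the closed box [(lo, hi), ...]."""
    if out is None:
        out = []
    if node is None:
        return out
    p, axis = node["point"], node["axis"]
    lo, hi = box[axis]
    if all(l <= x <= h for x, (l, h) in zip(p, box)):
        out.append(p)
    if lo <= p[axis]:          # query box reaches into the left half-space
        kd_report(node["left"], box, out)
    if p[axis] <= hi:          # query box reaches into the right half-space
        kd_report(node["right"], box, out)
    return out
```

Each node stores one point, so the structure uses linear space in any dimension, while the number of subtrees a query must enter grows with d.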

14
kD-Trees (Contd.)
[Figure: a kD-tree partition of the plane over points a through o, with the corresponding tree.]
15
kD-Trees (Contd.)

Query complexity: how many cells can a query box intersect?

Consider one facet of the query. In 2-D, two levels of splitting (one per axis) divide a node's region into 4 cells.

Any axis-parallel line can intersect at most 2 of these 4 cells.
Each of these 4 cells contains exactly n/4 points.

$Q(n) = 2\,Q(n/4) + 1$
$Q(n) = O(n^{1/2})$
In general, a query is answered in $O(n^{1-1/d} + m)$ time, where m is the output size.
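
The step from the recurrence to the bound can be made explicit by unrolling. This is a sketch assuming n is a power of 4 and $Q(1) = O(1)$:

```latex
\begin{align*}
Q(n) &= 2\,Q(n/4) + 1 \\
     &= 4\,Q(n/16) + 2 + 1 \\
     &= 2^{i}\,Q(n/4^{i}) + (2^{i} - 1) \\
     &= 2^{\log_4 n}\,Q(1) + 2^{\log_4 n} - 1 \qquad (i = \log_4 n) \\
     &= O(n^{1/2}), \quad \text{since } 2^{\log_4 n} = n^{1/2}.
\end{align*}
```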

16
Summary of Classical Solutions

Classical solutions, though good in low-dimensional space, do not perform well in higher dimensions.
Updates (inserts/deletes) are expensive (we did not discuss them).
Desired properties of the data structure:
Near-linear size.
Query time $O(k + f(n))$, where k is the output size and f is a very slowly increasing function.
Preprocessing time is not as important as the above two.

17
Lower Bounds in Orthogonal Range Searching

Bernard Chazelle, Princeton
Proved that for the range reporting problem, $O(k + \mathrm{polylog}(n))$ query time requires $\Omega(n (\log n / \log\log n)^{d-1})$ space on a pointer machine.
The lower bound holds only for pointer algorithms.
These algorithms need an explicit pointer to an object to access it. For example, they cannot use the coordinates of the points to index into a structure.
Algorithms based on range trees and kD-trees fall in this class.

18
Models of Computation
Memory access rules differ for pointer and RAM machines.
[Figure: block diagram of a machine with input device, output device, memory, central processing unit and control unit.]
19
Chazelle's Lower Bound
[Figure: the graph G, with its root marked.]
Data structure: a digraph G(V,E) of bounded out-degree.
G has a representative node for each input point and some other internal nodes.
For a query q, the algorithm non-deterministically traverses G, adds/deletes some (internal) nodes and edges, and produces a set W(q) of nodes which is a superset of the answer points' nodes.
20
Chazelle's Lower Bound (Contd.)
Range trees and kD-trees are some such data structures.
[Figure: a range tree over points P1 to P4 and a kD-tree over points a to g.]
21
Chazelle's Lower Bound (Contd.)
Desired query time: $O(k + \mathrm{polylog}(n))$ (k is the output size).
[Figure: the graph G, with its root marked.]
For the query time to be linear in k, the nodes for the answer set should be close to each other. This should hold for any query q.
22
Chazelle's Lower Bound (Contd.)

Suppose we have a set of queries $S = \{q_1, q_2, \ldots, q_s\}$ such that
$|P \cap q_i| \ge \log^b n$ (each range has many points)
$|P \cap q_i \cap q_j| \le 1$ for $i \ne j$ (no two ranges share many points)
Then each $q_i$ has a representative subset of nodes in G which is compact (and has many edges).
$\Rightarrow$ G has many edges.
By the bounded out-degree condition on G, it consequently has many vertices, and thus requires a large amount of memory.

23
Chazelle's Lower Bound (Contd.)

If S exhibits the desired properties, then he shows that for a query time of
$|W(q)| = a\,(k + \log^b n)$,
$|V| > |S| \, (\log^b n) / 2^{16a+4}$ (long algebraic proof).
Recall that W(q) is the output set produced for a query q.
The point set and the queries are generated as follows.
Let $n = m^\lambda$, where m is $2\log^b p$ and $\lambda$ is $\log p / (1 + b \log\log p)$, each rounded to an integer, for some large integer p.
If p is large enough, m is on the order of $\log^b n$ and $\lambda$ is on the order of $\log n / (1 + b \log\log n)$.

24
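Bad Input Set and Queries

Define the point set as $P = \{(\rho_m(i), i) : 0 \le i < n\}$, with $n = m^\lambda$.
$\rho_m(i)$: write i in base m over $\lambda$ bits (base-m digits) and reverse the bits.
Consider a tree T which encodes the x-coordinates of the points in P (take their m-ary representation).
Each node has m children labeled $0, 1, 2, \ldots, m-1$.

[Figure: the tree T; the first level branches on the 1st bit, the second on the 2nd bit, and level i on the ith bit, with children 0, 1, 2, ..., m-1 at each node.]
Height of T is $\lambda$ (one level for each bit).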
25
Generating S

[Figure: a node of T with children 0, 1, 2, ..., m-1.]
A node at depth r is associated with $m^{\lambda-r}$ points (where $n = m^\lambda$): those points whose m-ary representation has, as its first r bits, the labels on the path from the root to this node.
Sort them by y-coordinate and split them into groups of m points.
Total no. of groups $= \sum_r (\text{no. of nodes at level } r) \cdot (m^{\lambda-r}/m)$
Each group can be enclosed in a query box.
Total of $\lambda m^{\lambda-1}$ queries.

26
E.g., $n = 3^3$
27 queries
For $n = m^\lambda$, we have $\lambda m^{\lambda-1}$ queries.
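
The construction is small enough to generate and check by machine. This is a minimal sketch, assuming the digit-reversal point set and the group-by-y rule from the preceding slides; the names `digit_reverse` and `bad_instance`, and the parameter name `lam` for the number of digits, are illustrative.

```python
def digit_reverse(i, m, lam):
    """Write i in base m with lam digits and reverse the digit order."""
    out = 0
    for _ in range(lam):
        i, digit = divmod(i, m)
        out = out * m + digit
    return out


def bad_instance(m, lam):
    """Return (points, queries) for n = m**lam points.

    Each query is a box ((x_lo, x_hi), (y_lo, y_hi)) enclosing one group
    of m points taken from one node of the tree over the x-coordinates.
    """
    n = m ** lam
    points = [(digit_reverse(i, m, lam), i) for i in range(n)]
    queries = []
    for r in range(lam):                      # depths 0 .. lam-1
        width = m ** (lam - r)                # x-extent of a depth-r node
        for node in range(m ** r):
            x_lo, x_hi = node * width, (node + 1) * width - 1
            ys = sorted(y for (x, y) in points if x_lo <= x <= x_hi)
            for g in range(0, len(ys), m):    # groups of m consecutive points by y
                group = ys[g:g + m]
                queries.append(((x_lo, x_hi), (group[0], group[-1])))
    return points, queries


points, queries = bad_instance(3, 3)   # the n = 3^3 example from the slide
```

For m = 3 and three digits this yields 27 points and 27 query boxes. Each box contains exactly m points and any two boxes share at most one point, which are the two properties the lower-bound argument relies on.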
27
Indexing Perspective
[Figure: data stored on disk.]
Data is too large to be stored in memory and has to be stored on disk, in chunks of size B, possibly with repetition. B, the block size, is the unit of data transfer from disk in one read.
Storage redundancy: maximum number of copies of a data item.
A query regarding items satisfying some criteria is answered by retrieving blocks from the disk such that the contained points form a superset of the answer.
Access overhead: ratio of the number of blocks retrieved to the minimum number of blocks required to answer the query.
28
Indexing Perspective (Contd.)
[Figure: the data partitioned into blocks of one shape (redundancy 1), then covered again by blocks of a second shape (redundancy 2).]

Better overhead if queries have the same aspect ratio as our blocks.
Otherwise, far more blocks have to be retrieved than the bare minimum!
Idea: have blocks of several different aspect ratios.

29
Indexing - Limitations

A query can have any aspect ratio.
Not possible to have blocks of every aspect ratio
with limited memory.
Have blocks with sufficient no. of different
aspect ratios so that any aspect ratio can be
approximated (Hellerstein, Koutsoupias,
Papadimitriou).

30
Overhead for Redundancy r

Choose blocks so that any aspect ratio can be approximated.
Blocks will have the shape $B^x \times B^{1-x}$.
Let $x = (2i-1)/(2r)$ for $i = 1, 2, \ldots, r$.
Store all such blocks.
Redundancy is r, since there are r shapes for the blocks and an input point can be present in only 1 block of a particular shape.
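
The family of block shapes can be tabulated directly from the formula. This is a minimal sketch; `block_shapes` is an illustrative name, and the values B = 4096 and r = 3 are chosen only because they give round numbers.

```python
def block_shapes(B, r):
    """Return the r block shapes (width, height) = (B**x, B**(1 - x)),
    with x = (2i - 1) / (2r) for i = 1..r. Every shape holds B points."""
    shapes = []
    for i in range(1, r + 1):
        x = (2 * i - 1) / (2 * r)
        shapes.append((B ** x, B ** (1 - x)))
    return shapes


# Up to floating-point rounding: 4 x 1024, 64 x 64 and 1024 x 4.
shapes = block_shapes(4096, 3)
```

Consecutive shapes differ in width by a factor of $B^{1/r}$, so any query aspect ratio between the extremes is within a factor of about $B^{1/(2r)}$ of the nearest stored shape.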

31
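Overhead for Fixed Redundancy

If a query q is aligned with the blocks, suppose k blocks suffice.
We can easily form a query which requires 2k + 2 blocks to be covered.

[Figure: a query q aligned with a run of k blocks, and a shifted query q that straddles block boundaries.]
They achieve $k = B^{1/(2r)}$.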
32
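Lower Bound on Access Overhead for r = 1

Access overhead $a = \Omega(B^{1-1/d})$.
d = 2: use only $B \times 1$ and $1 \times B$ queries ($2n^2/B$ total queries).
Let a block $s \in S$ intersect x horizontal and y vertical lines.

$x \cdot y \ge B$
$x + y \ge 2B^{1/2}$
i.e. s intersects at least $2B^{1/2}$ of the above queries.
Block-query product $\ge (n^2/B) \cdot 2B^{1/2}$
$\Rightarrow$ Average no. of blocks a query intersects $\ge B^{1/2}$

[Figure: the n × n point grid with a block s spanning x horizontal and y vertical lines.]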
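
The two steps of the counting argument can be written out in full. This is a sketch assuming redundancy 1, so that the $n^2$ points are stored in $n^2/B$ blocks:

```latex
\begin{align*}
x + y &\ge 2\sqrt{xy} \ge 2\sqrt{B} = 2B^{1/2}
  && \text{(AM-GM, using } xy \ge B\text{)} \\[4pt]
\text{average no. of blocks per query}
  &\ge \frac{(n^2/B)\cdot 2B^{1/2}}{2n^2/B} = B^{1/2}
  && \text{(block-query pairs divided by no. of queries)}
\end{align*}
```

Since each of these queries contains exactly B points, one block would suffice in the best case, so the access overhead is at least $B^{1/2}$.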
33
Indexing Structures

R-trees as indexing structures: an extension of B-trees to multiple dimensions.

Node degrees: an internal node has between t and 2t children; the root has between 2 and 2t.
Input objects are associated with the leaves, and all leaves are at the same level.
Each internal node stores the smallest bounding box of the objects in its subtree.
34
R-Trees
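[Figure: an R-tree over input objects p through z; bounding boxes labeled A, B, C and H group the objects in the plane, and the same labels appear on the nodes of the tree.]
Performance is measured as the number of disk accesses required to answer a query.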
35
Lower Bounds on R-trees

Query processing in bounding-box hierarchies is similar to query processing in kD-trees.
Crossing number as a measure of efficiency of a bounding-box hierarchy: the smaller, the better!
There is a collection of n d-rectangles such that, for any R-tree T of minimum degree t on them, there is a query box intersecting $\Omega((n/t)^{1-1/d})$ nodes of T and none of the input d-rectangles (Pankaj Agarwal et al.).

36
R-tree Efficiency

Bounding box of any t squares hits $\ge 2(t^{1/2}-1)$ queries.
Total bounding-box/query intersections: $\Omega(n/t^{1/2})$.
Total queries: $2(n^{1/2}-1) = O(n^{1/2})$ $\Rightarrow$ some query intersects at least $\Omega((n/t)^{1/2})$ bounding boxes.
In general, an empty query box intersects $\Omega((n/t)^{1-1/d})$ bounding boxes of the R-tree.

[Figure: the input rectangles and the query boxes.]
37
Good Box-trees and Conversion to Good R-trees

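Pankaj Agarwal et al.
kD-trees for rectangle intersection queries

[Figure: an input rectangle with lower-left corner (a,b) and upper-right corner (c,d), and a query rectangle with lower-left corner $(x_1, y_1)$ and upper-right corner $(x_2, y_2)$.]
The rectangles intersect iff $(c,d) \ge (x_1, y_1)$ and $(a,b) \le (x_2, y_2)$ componentwise, i.e. $(-a,-b,c,d) \ge (-x_2, -y_2, x_1, y_1)$.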
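
The reduction from rectangle intersection to a dominance test on 4-D points can be checked against the direct test. This is a minimal sketch, assuming closed rectangles given as (lower-left, upper-right) corner pairs, so that touching rectangles count as intersecting; the function names are illustrative.

```python
def to_point(rect):
    """Map an input rectangle ((a, b), (c, d)) to the 4-D point (-a, -b, c, d)."""
    (a, b), (c, d) = rect
    return (-a, -b, c, d)


def query_corner(query):
    """Map a query rectangle ((x1, y1), (x2, y2)) to (-x2, -y2, x1, y1)."""
    (x1, y1), (x2, y2) = query
    return (-x2, -y2, x1, y1)


def dominates(p, q):
    """True if p >= q in every coordinate."""
    return all(pi >= qi for pi, qi in zip(p, q))


def intersects_direct(rect, query):
    """Direct interval-overlap test on both axes, for comparison."""
    (a, b), (c, d) = rect
    (x1, y1), (x2, y2) = query
    return a <= x2 and x1 <= c and b <= y2 and y1 <= d


def intersects_via_points(rect, query):
    """Intersection test expressed as 4-D dominance."""
    return dominates(to_point(rect), query_corner(query))
```

The dominance condition describes a quadrant-shaped range in 4-D, which is why a structure for orthogonal range searching on points can answer rectangle intersection queries.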
38
kD-Trees to Box Trees

It is easy to verify that the original problem of range searching on rectangles is now a problem of range searching on points.
Build a kD-tree on these points: $O(n^{1-1/(2d)} + k)$ query time.
Convert to a box tree as follows:
replace each point in the leaves of the kD-tree with the corresponding d-rectangle
at each internal node, store the bounding box of its children.
Careful analysis shows that the query time is actually $O(n^{1-1/d} + k \log n)$.

39
Box Tree Analysis

What is a visited node? A node is said to be visited if the query algorithm continues to its children.
Two types:
the input boxes in the subtree of a visited node v include one or more output boxes (at most k such nodes).
all boxes stored in the subtree of v are disjoint from the query Q (not many such nodes can be visited).

40
Box Tree Analysis
[Figure: a query box Q and two input boxes a and b lying on opposite sides of a hyperplane through a facet of Q.]
All input boxes cannot be separated from Q by the same hyperplane. Thus there are at least 2 such hyperplanes, each separating some input box in the subtree of v from Q.
In 2d-dimensional space, the points representing a and b lie on opposite sides of the hyperplane through a facet of Q. Thus this hyperplane intersects the cell representing v. The other hyperplane also intersects the cell of v. Thus their intersection, which is a (2d-2)-flat, also intersects the cell of v.
41
Box Tree Analysis (Contd.)

By the property of kD-trees, there can be at most $O(2^{i(2d-2)/(2d)}) = O(2^{i(1-1/d)})$ such cells at depth i.
Height of the tree is $O(\log n)$ (kD-trees are perfectly balanced).
Thus the total number of visited nodes for a query is $\sum_i (k + 2^{i(1-1/d)}) = O(k \log n + n^{1-1/d})$.
Using a slightly modified construction of the box tree, they reduce the query time to $O(k + n^{1-1/d})$.

42
Avenues for Further Research

Lower bounds suggest that no data structure might
be possible which scales well in high-dimension
space for an entirely generic set of inputs and
queries.
Interesting assumptions about the input objects
and queries might result in better performance.
Agarwal et al. showed that R-trees do not have good worst-case performance even if the input is a set of hypercubes.

43
Further Research

What if queries are also hypercubes or have O(1)
aspect ratio ?
The lower bounds do not hold in these cases
both for R-Trees and indexing.
Mark de Berg et al. constructed box trees with polylogarithmic query time for collision checking in industrial installations.

Thank You

45
Junk
46
2-D Case
[Figure: the input rectangles and the query boxes in the 2-D case.]
47
kD-Trees (Contd.)
[Figure: a kD-tree partition of the plane over points a through g, with the corresponding tree.]
$O(n)$ size data structure. $O(n \log n)$ construction time.
48
Indexing Perspective

Hellerstein, Koutsoupias, Papadimitriou
Efficiency of an indexing scheme for a database:
Storage redundancy: how many copies of a data item are stored.
Access overhead: how many times more blocks than necessary a query retrieves.
An indexing problem is defined in the context of a workload.
A workload consists of
a domain (e.g. $\mathbb{R}^d$),
a subset of the domain called the instance (e.g. a set of points in $\mathbb{R}^d$), and
a set of subsets of the instance, the set of queries (e.g. d-rectangles).

49
Range Searching as Indexing Workloads

Range queries in $\mathbb{R}^2$
Domain: $D = \mathbb{R}^2$
Instance: $I = \{(i,j) : 1 \le i, j \le n\}$
Query: $Q_{a,b,c,d} = \{(i,j) : a \le i \le b,\ c \le j \le d\}$
one query for each quadruple (a,b,c,d) with $1 \le a \le b \le n$ and $1 \le c \le d \le n$
Indexing schemes
A collection $S = \{s_1, s_2, \ldots\}$ of blocks, $s_i \subseteq I$.
A query retrieves a set of blocks which cover it (possibly retrieving more blocks than necessary).

50
Access Overhead for Fixed Storage Redundancy
If we have blocks with the same aspect ratio as the query, we get the best overhead.
But a query can have any aspect ratio. It is not possible to have blocks in S of all possible aspect ratios (storage redundancy is fixed at r).
51
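Overhead when r = 1 (Contd.)

d = 3
Consider $B \times 1 \times 1$, $1 \times B \times 1$ and $1 \times 1 \times B$ queries.
Let $s \in S$ intersect x, y and z lines in the three directions.

$x \cdot y \cdot z \ge B$ $\Rightarrow$ no. of queries intersected $= xy + yz + zx \ge 3B^{2/3}$
No. of blocks $= n^3/B$
Block-query intersecting pairs $\ge 3B^{2/3} \cdot n^3/B$
No. of queries $= 3n^3/B$
Thus, on average, a query intersects at least $B^{2/3}$ blocks.
In d dimensions, the overhead is $\Omega(B^{1-1/d})$.

[Figure: a block s with extents x, y and z along the three axes.]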