Title: 2IL05 Data Structures
- Spring 2009
- Lecture 10: Range Searching
Augmenting data structures
- Methodology for augmenting a data structure:
  - Choose an underlying data structure.
  - Determine additional information to maintain.
  - Verify that we can maintain the additional information for the existing data structure operations.
  - Develop new operations.
- You don't need to do these steps in strict order!
- Red-black trees are very well suited to augmentation.
Augmenting red-black trees
- Theorem: Augment an R-B tree with field f, where f[x] depends only on information in x, left[x], and right[x] (including f[left[x]] and f[right[x]]). Then we can maintain the values of f in all nodes during insert and delete without affecting the O(log n) performance.
- When we alter information in x, changes propagate only upward on the search path for x.
- Examples:
  - OS-tree: new operations OS-Select and OS-Rank
  - Interval-tree: new operation Interval-Search
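A minimal Python sketch of the theorem for the OS-tree case follows, assuming f[x] is the subtree size. Because the size of x depends only on x, left[x], and right[x], a rotation has to recompute the sizes of only the two nodes it moves. The red-black colouring and rebalancing are left out. Node, update, rotate_left, and os_select are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional[Node] = None
    right: Optional[Node] = None
    size: int = 1  # augmented field f[x]: number of nodes in the subtree of x


def size(x: Optional[Node]) -> int:
    return x.size if x is not None else 0


def update(x: Node) -> None:
    # f[x] depends only on x, left[x] and right[x] (via f[left[x]], f[right[x]]).
    x.size = size(x.left) + size(x.right) + 1


def rotate_left(x: Node) -> Node:
    """Left rotation at x; returns the new root of this subtree.

    Only x and its former right child y get new subtrees, so only their
    sizes are recomputed: x first, because it is now the child of y."""
    y = x.right
    assert y is not None, "rotate_left needs a right child"
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y


def os_select(x: Optional[Node], i: int) -> Node:
    """Node with the i-th smallest key (1-based) in the subtree of x."""
    while x is not None:
        r = size(x.left) + 1          # rank of x inside its own subtree
        if i == r:
            return x
        if i < r:
            x = x.left
        else:
            i -= r
            x = x.right
    raise IndexError("rank out of range")


# Keys 10 < 20 < 30 as a right-leaning chain, then one left rotation.
c = Node(30)
b = Node(20, right=c)
update(b)
a = Node(10, right=b)
update(a)
root = rotate_left(a)                 # 20 is now the root, with children 10 and 30
print(root.key, root.size)            # 20 3
print([os_select(root, i).key for i in (1, 2, 3)])  # [10, 20, 30]
```

In a full red-black tree, insert and delete perform O(1) rotations plus one upward pass of update calls along the search path. This is why the O(log n) bound survives.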
Range Searching
Application: Database queries
- Example: Database for personnel administration (name, address, date of birth, salary, ...)
- Query: Report all employees born between 1950 and 1955 who earn between 3000 and 4000 per month.
More parameters? Report all employees born between 1950 and 1955 who earn between 3000 and 4000 per month and have between two and four children.
⇒ more dimensions
Application: Database queries
- Report all employees born between 1950 and 1955
who earn between 3000 and 4000 per month and
have between two and four children.
Rectangular range query or orthogonal range query
1-Dimensional range searching
- P = {p1, p2, ..., pn}: set of points on the real line
- Query: given a query interval [x : x'], report all pi ∈ P with pi ∈ [x : x'].
- Solution: Use a balanced binary search tree T.
  - leaves of T store the points pi
  - internal nodes store splitting values (node v stores value xv)
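The Python sketch below illustrates this layout by building such a tree from sorted points. It assumes one common convention that the slide does not state: xv is the largest point in the left subtree of v, so a search for x goes left exactly when x ≤ xv. The names build, leaves, and height are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    value: float                      # a point (leaf) or the splitting value xv (internal node)
    left: Optional[Node] = None
    right: Optional[Node] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def build(points: List[float]) -> Node:
    """Balanced BST storing the (sorted, distinct) points in its leaves.

    Assumption: xv is the largest point in the left subtree of v, so points
    <= xv lie in the left subtree and points > xv in the right subtree."""
    if len(points) == 1:
        return Node(points[0])
    mid = (len(points) + 1) // 2      # the left half gets the extra point
    return Node(points[mid - 1], build(points[:mid]), build(points[mid:]))


def leaves(v: Node) -> List[float]:
    """Points stored in the leaves below v, from left to right."""
    if v.is_leaf():
        return [v.value]
    return leaves(v.left) + leaves(v.right)


def height(v: Node) -> int:
    return 0 if v.is_leaf() else 1 + max(height(v.left), height(v.right))


T = build([3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 98])
print(T.value)    # 49, the largest point in the left half
print(height(T))  # 4 = ceil(log2(14))
print(leaves(T) == sorted(leaves(T)))  # True
```

The tree has n leaves and n - 1 internal nodes, which is where the O(n) storage comes from.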
1-Dimensional range searching
- Query [x : x'] ⇒ search with x and x' ⇒ end in two leaves μ and μ'
- Report: 1. all leaves between μ and μ'
  2. possibly the points stored at μ and μ'
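[Figure: query [18 : 77] on a balanced binary search tree storing the points 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 98 in its leaves; the two search paths end in the leaves μ and μ'.]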
1-Dimensional range searching
- How do we find all leaves between μ and μ'?
- Solution: They are the leaves of the subtrees rooted at nodes v in between the two search paths whose parents are on the search paths.
- ⇒ we need to find the node vsplit where the search paths split
1-Dimensional range searching
- FindSplitNode(T, x, x')
  Input. A tree T and two values x and x' with x ≤ x'.
  Output. The node v where the paths to x and x' split, or the leaf where both paths end.
  v ← root(T)
  while v is not a leaf and (x' ≤ xv or x > xv)
      do if x' ≤ xv
            then v ← left(v)
            else v ← right(v)
  return v
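FindSplitNode translates almost line by line into the Python sketch below. It assumes that xv is the largest point in the left subtree of v. x_lo and x_hi stand for x and x', and build and find_split_node are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    value: float                      # point at a leaf, splitting value xv otherwise
    left: Optional[Node] = None
    right: Optional[Node] = None

    def is_leaf(self) -> bool:
        return self.left is None      # internal nodes always have two children here


def build(points: List[float]) -> Node:
    # Sorted, distinct points; xv = largest point in the left subtree (assumption).
    if len(points) == 1:
        return Node(points[0])
    mid = (len(points) + 1) // 2
    return Node(points[mid - 1], build(points[:mid]), build(points[mid:]))


def find_split_node(T: Node, x_lo: float, x_hi: float) -> Node:
    """Node where the search paths to x_lo and x_hi (x_lo <= x_hi) split,
    or the leaf where both paths end."""
    v = T
    while not v.is_leaf() and (x_hi <= v.value or x_lo > v.value):
        # Both values lie on the same side of xv: follow them together.
        v = v.left if x_hi <= v.value else v.right
    return v


T = build([3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 98])
print(find_split_node(T, 18, 77).value)      # 49: the paths already split at the root
print(find_split_node(T, 60, 65).value)      # 62
print(find_split_node(T, 24, 29).is_leaf())  # True: both paths end in the leaf 30
```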
1-Dimensional range searching
- Starting from vsplit, follow the search path to x.
  - when the path goes left, report all leaves in the right subtree
  - check if μ ∈ [x : x']
- Starting from vsplit, follow the search path to x'.
  - when the path goes right, report all leaves in the left subtree
  - check if μ' ∈ [x : x']
1-Dimensional range searching
- 1DRangeQuery(T, [x : x'])
  Input. A binary search tree T and a range [x : x'].
  Output. All points stored in T that lie in the range.
  vsplit ← FindSplitNode(T, x, x')
  if vsplit is a leaf
     then Check if the point stored at vsplit must be reported.
     else (Follow the path to x and report the points in subtrees right of the path)
          v ← left(vsplit)
          while v is not a leaf
              do if x ≤ xv
                    then ReportSubtree(right(v))
                         v ← left(v)
                    else v ← right(v)
          Check if the point stored at the leaf v must be reported.
          Similarly, follow the path to x', report the points in subtrees left of the path, and check if the point stored at the leaf where the path ends must be reported.
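The Python sketch below combines FindSplitNode, ReportSubtree, and the two path walks of 1DRangeQuery. It assumes a leaf-based tree in which xv is the largest point in the left subtree. x_lo and x_hi stand for x and x', all function names are hypothetical, and points are reported in no particular order.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    value: float                      # point at a leaf, splitting value xv otherwise
    left: Optional[Node] = None
    right: Optional[Node] = None

    def is_leaf(self) -> bool:
        return self.left is None


def build(points: List[float]) -> Node:
    # Sorted, distinct points; xv = largest point in the left subtree (assumption).
    if len(points) == 1:
        return Node(points[0])
    mid = (len(points) + 1) // 2
    return Node(points[mid - 1], build(points[:mid]), build(points[mid:]))


def report_subtree(v: Node, out: List[float]) -> None:
    # Visits every node below v once: linear in the number of leaves.
    if v.is_leaf():
        out.append(v.value)
    else:
        report_subtree(v.left, out)
        report_subtree(v.right, out)


def range_query_1d(T: Node, x_lo: float, x_hi: float) -> List[float]:
    """All points p in T with x_lo <= p <= x_hi (in no particular order)."""
    out: List[float] = []
    v = T                                            # FindSplitNode
    while not v.is_leaf() and (x_hi <= v.value or x_lo > v.value):
        v = v.left if x_hi <= v.value else v.right
    v_split = v
    if v_split.is_leaf():
        if x_lo <= v_split.value <= x_hi:
            out.append(v_split.value)
        return out
    v = v_split.left                                 # path to x_lo
    while not v.is_leaf():
        if x_lo <= v.value:
            report_subtree(v.right, out)             # subtree right of the path
            v = v.left
        else:
            v = v.right
    if x_lo <= v.value <= x_hi:                      # leaf mu
        out.append(v.value)
    v = v_split.right                                # path to x_hi
    while not v.is_leaf():
        if x_hi > v.value:
            report_subtree(v.left, out)              # subtree left of the path
            v = v.right
        else:
            v = v.left
    if x_lo <= v.value <= x_hi:                      # leaf mu'
        out.append(v.value)
    return out


T = build([3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 98])
print(sorted(range_query_1d(T, 18, 77)))  # [19, 23, 30, 37, 49, 59, 62, 70]
```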
- Correctness?
- Need to show two things:
  - every reported point lies in the query range
  - every point in the query range is reported.
- Query time?
- ReportSubtree = O(1 + number of reported points)
- ⇒ total query time O(log n + number of reported points)
- Storage?
O(n)
2-Dimensional range searching
- P = {p1, p2, ..., pn}: set of points in the plane
- Query: given a query rectangle [x : x'] × [y : y'], report all pi ∈ P with pi ∈ [x : x'] × [y : y'], that is, px ∈ [x : x'] and py ∈ [y : y']
- ⇒ a 2-dimensional range query is composed of two 1-dimensional sub-queries
- How can we generalize our 1-dimensional solution to 2 dimensions?
For now: no two points have the same x-coordinate, and no two points have the same y-coordinate.
Back to one dimension
[Figure: the points 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 98 on the real line.]
Back to one dimension
[Figure: the same points stored in the leaves of a balanced binary search tree.]
And now in two dimensions
- Split alternating on x- and y-coordinate
[Figure: 2-dimensional kd-tree on the points p1, ..., p10 with splitting lines l1, ..., l9.]
Kd-trees
- BuildKdTree(P, depth)
  Input. A set of points P and the current depth.
  Output. The root of a kd-tree storing P.
  if P contains only one point
     then return a leaf storing this point
     else if depth is even
             then split P into two subsets with a vertical line l through the median x-coordinate of the points in P. Let P1 be the set of points to the left of or on l, and let P2 be the set of points to the right of l.
             else split P into two subsets with a horizontal line l through the median y-coordinate of the points in P. Let P1 be the set of points below or on l, and let P2 be the set of points above l.
          vleft ← BuildKdTree(P1, depth + 1)
          vright ← BuildKdTree(P2, depth + 1)
          Create a node v storing l, make vleft the left child of v, and make vright the right child of v.
  return v
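Here is a Python sketch of BuildKdTree. Like the slides, it assumes distinct x- and y-coordinates. To keep it short, it sorts at every level to find the median, so it runs in O(n log² n) time rather than the O(n log n) of the presorted version. KdNode, build_kd_tree, and report_subtree are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class KdNode:
    point: Optional[Point] = None     # set at leaves only
    split: float = 0.0                # coordinate of the splitting line l
    axis: int = 0                     # 0: vertical line x = split, 1: horizontal line y = split
    left: Optional[KdNode] = None     # P1: left of / below or on l
    right: Optional[KdNode] = None    # P2: right of / above l


def build_kd_tree(P: List[Point], depth: int = 0) -> KdNode:
    """BuildKdTree for points with distinct x- and distinct y-coordinates.

    For brevity the median is found by sorting at every level, which costs
    O(n log^2 n) in total instead of O(n log n)."""
    if len(P) == 1:
        return KdNode(point=P[0])
    axis = depth % 2                  # even depth: split on x; odd depth: on y
    Q = sorted(P, key=lambda p: p[axis])
    mid = (len(Q) + 1) // 2           # P1 gets ceil(n/2) points
    return KdNode(split=Q[mid - 1][axis], axis=axis,
                  left=build_kd_tree(Q[:mid], depth + 1),
                  right=build_kd_tree(Q[mid:], depth + 1))


def report_subtree(v: KdNode) -> List[Point]:
    """All points stored in the leaves below v."""
    if v.point is not None:
        return [v.point]
    return report_subtree(v.left) + report_subtree(v.right)


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
root = build_kd_tree(pts)
print(root.axis, root.split)            # 0 5: vertical line x = 5
print(root.left.axis, root.left.split)  # 1 4: horizontal line y = 4
```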
- Running time?
- T(n) = O(1) if n = 1, and T(n) = O(n) + 2T(⌈n/2⌉) if n > 1 ⇒ T(n) = O(n log n)
(presort on x- and y-coordinate to avoid linear-time median finding)
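The presorting idea can be sketched as follows. This is an illustration, not the lecture's code. Keep P sorted both on x and on y, read the median off the appropriate list in O(1), and split both lists by filtering. Filtering takes O(n) time and keeps each list sorted, which gives the O(n) term in the recurrence. Distinct coordinates are assumed. split_presorted and build_presorted are hypothetical names, and the tree is returned as nested tuples to keep the sketch short.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def split_presorted(Px: List[Point], Py: List[Point], axis: int):
    """Split P at its median along `axis` (0 = x, 1 = y) in O(n) time.

    Px and Py hold the same points sorted on x and on y. The median is read
    off the list sorted on `axis`; filtering a sorted list keeps it sorted,
    so the four result lists are again presorted for the recursive calls.
    Assumes distinct coordinates."""
    by_axis = Px if axis == 0 else Py
    mid = (len(by_axis) + 1) // 2
    line = by_axis[mid - 1][axis]     # median coordinate, found in O(1)
    P1x = [p for p in Px if p[axis] <= line]
    P1y = [p for p in Py if p[axis] <= line]
    P2x = [p for p in Px if p[axis] > line]
    P2y = [p for p in Py if p[axis] > line]
    return P1x, P1y, P2x, P2y, line


def build_presorted(Px: List[Point], Py: List[Point], depth: int = 0):
    """Kd-tree as nested tuples: ("leaf", p) or ("node", line, axis, left, right)."""
    if len(Px) == 1:
        return ("leaf", Px[0])
    axis = depth % 2
    P1x, P1y, P2x, P2y, line = split_presorted(Px, Py, axis)
    return ("node", line, axis,
            build_presorted(P1x, P1y, depth + 1),
            build_presorted(P2x, P2y, depth + 1))


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
# Presort once in O(n log n); every level of the recursion then costs O(n).
root = build_presorted(sorted(pts), sorted(pts, key=lambda p: p[1]))
print(root[1], root[2])  # 5 0: the root splits with the vertical line x = 5
```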
- Storage?
O(n)
Querying a kd-tree
- Each node v corresponds to a region region(v).
- All points stored in the subtree rooted at v lie in region(v).
- ⇒ 1. if a region is contained in the query rectangle, report all points in the region.
  2. if a region is disjoint from the query rectangle, report nothing.
  3. if a region intersects the query rectangle, refine the search (test the children, or the point stored in the region if there are no children).
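The three cases can be written as a small classification function. This sketch assumes that regions and the query rectangle are given as (x_lo, x_hi, y_lo, y_hi) tuples, with ±inf bounds for unbounded regions. classify and Rect are hypothetical names.

```python
from typing import Tuple

# (x_lo, x_hi, y_lo, y_hi); bounds may be +-inf for unbounded regions.
Rect = Tuple[float, float, float, float]


def classify(region: Rect, R: Rect) -> str:
    """Which of the three cases applies to region(v) and the query rectangle R.

    Both are treated as closed rectangles. This is conservative for the
    half-open regions of a kd-tree: it may answer "intersects" where only a
    boundary is shared, which costs an extra refinement step but never a
    wrong answer."""
    rx1, rx2, ry1, ry2 = region
    qx1, qx2, qy1, qy2 = R
    if qx1 <= rx1 and rx2 <= qx2 and qy1 <= ry1 and ry2 <= qy2:
        return "contained"   # case 1: report all points in the region
    if rx2 < qx1 or qx2 < rx1 or ry2 < qy1 or qy2 < ry1:
        return "disjoint"    # case 2: report nothing
    return "intersects"      # case 3: refine the search


inf = float("inf")
print(classify((0, 1, 0, 1), (-1, 2, -1, 2)))          # contained
print(classify((0, 1, 0, 1), (2, 3, 0, 1)))            # disjoint
print(classify((-inf, 5, -inf, inf), (0, 10, 0, 10)))  # intersects
```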
Querying a kd-tree
[Figure: a kd-tree on the points p1, ..., p13. Disclaimer: this tree cannot have been constructed by BuildKdTree.]
Querying a kd-tree
- SearchKdTree(v, R)
  Input. The root of (a subtree of) a kd-tree, and a range R.
  Output. All points at leaves below v that lie in the range.
  if v is a leaf
     then report the point stored at v if it lies in R
     else if region(left(v)) is fully contained in R
             then ReportSubtree(left(v))
             else if region(left(v)) intersects R
                     then SearchKdTree(left(v), R)
          if region(right(v)) is fully contained in R
             then ReportSubtree(right(v))
             else if region(right(v)) intersects R
                     then SearchKdTree(right(v), R)
- Query time?
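Here is a runnable Python sketch of SearchKdTree. It assumes distinct coordinates and closed rectangles (x_lo, x_hi, y_lo, y_hi). Rather than storing region(v) at every node, it passes the region down the tree and cuts it with the splitting line of v. All names (KdNode, build, search_kd_tree, and so on) are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_lo, x_hi, y_lo, y_hi), closed


@dataclass
class KdNode:
    point: Optional[Point] = None     # set at leaves only
    split: float = 0.0                # coordinate of the splitting line
    axis: int = 0                     # 0: vertical line, 1: horizontal line
    left: Optional[KdNode] = None
    right: Optional[KdNode] = None


def build(P: List[Point], depth: int = 0) -> KdNode:
    # Median split, alternating x and y; distinct coordinates assumed.
    if len(P) == 1:
        return KdNode(point=P[0])
    axis = depth % 2
    Q = sorted(P, key=lambda p: p[axis])
    mid = (len(Q) + 1) // 2
    return KdNode(split=Q[mid - 1][axis], axis=axis,
                  left=build(Q[:mid], depth + 1), right=build(Q[mid:], depth + 1))


def contained(A: Rect, R: Rect) -> bool:
    return R[0] <= A[0] and A[1] <= R[1] and R[2] <= A[2] and A[3] <= R[3]


def intersects(A: Rect, R: Rect) -> bool:
    return not (A[1] < R[0] or R[1] < A[0] or A[3] < R[2] or R[3] < A[2])


def report_subtree(v: KdNode, out: List[Point]) -> None:
    if v.point is not None:
        out.append(v.point)
    else:
        report_subtree(v.left, out)
        report_subtree(v.right, out)


def search_kd_tree(v: KdNode, R: Rect, region: Rect, out: List[Point]) -> None:
    """Report the points below v that lie in R; region is region(v)."""
    if v.point is not None:                           # leaf
        px, py = v.point
        if R[0] <= px <= R[1] and R[2] <= py <= R[3]:
            out.append(v.point)
        return
    x1, x2, y1, y2 = region
    if v.axis == 0:                                   # vertical line x = split
        parts = [(v.left, (x1, v.split, y1, y2)), (v.right, (v.split, x2, y1, y2))]
    else:                                             # horizontal line y = split
        parts = [(v.left, (x1, x2, y1, v.split)), (v.right, (x1, x2, v.split, y2))]
    for child, child_region in parts:
        if contained(child_region, R):
            report_subtree(child, out)
        elif intersects(child_region, R):
            search_kd_tree(child, R, child_region, out)


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
root = build(pts)
plane = (-math.inf, math.inf, -math.inf, math.inf)   # region(root)
found: List[Point] = []
search_kd_tree(root, (3, 8, 1, 5), plane, found)
print(sorted(found))  # [(5, 4), (7, 2), (8, 1)]
```

Treating the half-open child regions as closed is safe. It can only turn a "disjoint" answer into "intersects", which triggers an unnecessary recursive call that reports nothing.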
Querying a kd-tree: Analysis
- Time to traverse a subtree and report the points stored in its leaves is linear in the number of leaves.
- ⇒ ReportSubtree takes O(k) time in total, where k is the total number of reported points.
- We need to bound the number of visited nodes that are not in the traversed subtrees (grey nodes).
- The query range properly intersects the region of each such node, so one of the four edges of the query rectangle intersects that region.
We are only interested in an upper bound: how many regions can a vertical line intersect?
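Querying a kd-tree: Analysis
- Question: How many regions can a vertical line intersect?
- Q(n): number of intersected regions
- Answer 1: Q(n) = 1 + Q(n/2)? Too optimistic: at a node that splits on the y-coordinate, a vertical line can intersect the regions of both children.
- Answer 2: Q(n) = 2 + 2Q(n/4): a vertical line intersects the region of the root and of only one of its children (split on x), but possibly the regions of both grandchildren below that child (split on y), each containing about n/4 points.
- Master theorem ⇒ Q(n) = O(√n)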
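Unrolling the recurrence Q(n) = 2 + 2Q(n/4) makes the O(√n) bound concrete. The derivation below assumes the base case Q(1) = 1 and that n is a power of 4. Both are simplifications for illustration.

```latex
\begin{aligned}
Q(n) &= 2 + 2\,Q(n/4) \\
     &= 2 + 2\cdot 2 + 2^2\,Q(n/4^2) \\
     &= 2\,(1 + 2 + \dots + 2^{k-1}) + 2^k\,Q(n/4^k) \\
     &= 2\,(2^k - 1) + 2^k \qquad \text{for } k = \log_4 n,\ Q(1) = 1 \\
     &= 3\cdot 2^{k} - 2 = 3\sqrt{n} - 2 = O(\sqrt{n}),
\end{aligned}
```

The last step uses 2^{log_4 n} = n^{log_4 2} = n^{1/2}. This matches the master theorem with a = 2, b = 4, and f(n) = O(1).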
Kd-trees
- Theorem: A kd-tree for a set of n points in the plane uses O(n) storage and can be built in O(n log n) time. A rectangular range query on the kd-tree takes O(√n + k) time, where k is the number of reported points.
If the number k of reported points is small, then the query time O(√n + k) is relatively high.
Can we do better?
Trade storage for query time
Back to 1 dimension (again)
- A 1DRangeQuery with [x : x'] gives us all points whose x-coordinates lie in the range [x : x'] of the query rectangle [x : x'] × [y : y'].
- These points are stored in O(log n) subtrees.
- Canonical subset of node v: the points stored in the leaves of the subtree rooted at v
- Idea: store canonical subsets in a binary search tree on y-coordinate
Range trees
- Range tree
  - The main tree is a balanced binary search tree T built on the x-coordinates of the points in P.
  - For any internal or leaf node v in T, the canonical subset P(v) is stored in a balanced binary search tree Tassoc(v) on the y-coordinates of the points. The node v stores a pointer to the root of Tassoc(v), which is called the associated structure of v.
Range trees
- Build2DRangeTree(P)
  Input. A set P of points in the plane.
  Output. The root of a 2-dimensional range tree.
  Construct the associated structure: Build a binary search tree Tassoc on the set Py of y-coordinates of the points in P. Store at the leaves of Tassoc not just the y-coordinates of the points in Py, but the points themselves.
  if P contains only one point
     then create a leaf v storing this point, and make Tassoc the associated structure of v
     else split P into two subsets; one subset Pleft contains the points with x-coordinate less than or equal to xmid, the median x-coordinate, and the other subset Pright contains the points with x-coordinate larger than xmid
          vleft ← Build2DRangeTree(Pleft)
          vright ← Build2DRangeTree(Pright)
          create a node v storing xmid, make vleft the left child of v, make vright the right child of v, and make Tassoc the associated structure of v
  return v
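Below is a Python sketch of Build2DRangeTree with presorting. It makes two stated simplifications. First, x-coordinates are distinct. Second, each associated structure is a list sorted on y instead of a balanced binary search tree; a sorted list also answers 1-dimensional range queries in O(log n + k) time, using binary search. Filtering the y-sorted list keeps both halves sorted, so each level costs O(n). RangeNode, build_2d_range_tree, and storage are hypothetical names, and storage counts the total size of all associated structures.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class RangeNode:
    xmid: float                       # median x-coordinate; the point's x at a leaf
    assoc: List[Point]                # canonical subset P(v), sorted on y
    left: Optional[RangeNode] = None
    right: Optional[RangeNode] = None


def build_2d_range_tree(Px: List[Point], Py: List[Point]) -> RangeNode:
    """Px: the points of P sorted on x; Py: the same points sorted on y."""
    if len(Px) == 1:
        return RangeNode(Px[0][0], Py)            # leaf with its associated structure
    mid = (len(Px) + 1) // 2
    xmid = Px[mid - 1][0]                         # median x-coordinate
    P_left_y = [p for p in Py if p[0] <= xmid]    # O(n), stays sorted on y
    P_right_y = [p for p in Py if p[0] > xmid]
    return RangeNode(xmid, Py,
                     build_2d_range_tree(Px[:mid], P_left_y),
                     build_2d_range_tree(Px[mid:], P_right_y))


def storage(v: Optional[RangeNode]) -> int:
    """Total number of points stored in all associated structures below v."""
    if v is None:
        return 0
    return len(v.assoc) + storage(v.left) + storage(v.right)


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2), (1, 5), (6, 8)]
root = build_2d_range_tree(sorted(pts), sorted(pts, key=lambda p: p[1]))
print(root.xmid)      # 5
print(storage(root))  # 32: each of the 8 points once on each of the 4 levels
```

With n = 8 the tree has log2 8 + 1 = 4 levels. Every point appears once per level, as in the storage argument.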
- Running time?
O(n log n)
(presort to build the binary search trees in linear time)
Range trees: Storage
- Lemma: A range tree on a set of n points in the plane requires O(n log n) storage.
- Proof:
  - each point is stored only once per level
  - storage for associated structures is linear in the number of points
  - there are O(log n) levels
Querying a range tree
- 2DRangeQuery(T, [x : x'] × [y : y'])
  Input. A 2-dimensional range tree T and a range [x : x'] × [y : y'].
  Output. All points in T that lie in the range.
  vsplit ← FindSplitNode(T, x, x')
  if vsplit is a leaf
     then check if the point stored at vsplit must be reported
     else (Follow the path to x and call 1DRangeQuery on the subtrees right of the path)
          v ← left(vsplit)
          while v is not a leaf
              do if x ≤ xv
                    then 1DRangeQuery(Tassoc(right(v)), [y : y'])
                         v ← left(v)
                    else v ← right(v)
          Check if the point stored at v must be reported.
          Similarly, follow the path from right(vsplit) to x', call 1DRangeQuery with the range [y : y'] on the associated structures of subtrees left of the path, and check if the point stored at the leaf where the path ends must be reported.
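Here is a runnable Python sketch of 2DRangeQuery. Its associated structures are lists sorted on y rather than trees, so a 1DRangeQuery on Tassoc becomes two binary searches with the standard bisect module. Distinct x-coordinates are assumed, and xmid is the largest x-coordinate in the left subtree. All function names are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class RangeNode:
    xmid: float                       # splitting value; the point's x at a leaf
    assoc: List[Point]                # canonical subset P(v), sorted on y
    ys: List[float]                   # y-coordinates of assoc, for binary search
    left: Optional[RangeNode] = None
    right: Optional[RangeNode] = None


def build(Px: List[Point], Py: List[Point]) -> RangeNode:
    # Px sorted on x, Py sorted on y; distinct x-coordinates assumed.
    ys = [p[1] for p in Py]
    if len(Px) == 1:
        return RangeNode(Px[0][0], Py, ys)
    mid = (len(Px) + 1) // 2
    xmid = Px[mid - 1][0]
    return RangeNode(xmid, Py, ys,
                     build(Px[:mid], [p for p in Py if p[0] <= xmid]),
                     build(Px[mid:], [p for p in Py if p[0] > xmid]))


def query_assoc(v: RangeNode, y_lo: float, y_hi: float, out: List[Point]) -> None:
    # 1DRangeQuery on the associated structure: two binary searches + output.
    out.extend(v.assoc[bisect_left(v.ys, y_lo):bisect_right(v.ys, y_hi)])


def check_leaf(v: RangeNode, x_lo: float, x_hi: float,
               y_lo: float, y_hi: float, out: List[Point]) -> None:
    px, py = v.assoc[0]
    if x_lo <= px <= x_hi and y_lo <= py <= y_hi:
        out.append(v.assoc[0])


def range_query_2d(T: RangeNode, x_lo: float, x_hi: float,
                   y_lo: float, y_hi: float) -> List[Point]:
    out: List[Point] = []
    v = T                                             # FindSplitNode on x
    while v.left is not None and (x_hi <= v.xmid or x_lo > v.xmid):
        v = v.left if x_hi <= v.xmid else v.right
    v_split = v
    if v_split.left is None:
        check_leaf(v_split, x_lo, x_hi, y_lo, y_hi, out)
        return out
    v = v_split.left                                  # path to x_lo
    while v.left is not None:
        if x_lo <= v.xmid:
            query_assoc(v.right, y_lo, y_hi, out)     # subtree right of the path
            v = v.left
        else:
            v = v.right
    check_leaf(v, x_lo, x_hi, y_lo, y_hi, out)
    v = v_split.right                                 # path to x_hi
    while v.left is not None:
        if x_hi > v.xmid:
            query_assoc(v.left, y_lo, y_hi, out)      # subtree left of the path
            v = v.right
        else:
            v = v.left
    check_leaf(v, x_lo, x_hi, y_lo, y_hi, out)
    return out


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2), (1, 5), (6, 8)]
T = build(sorted(pts), sorted(pts, key=lambda p: p[1]))
print(sorted(range_query_2d(T, 3, 8, 1, 5)))  # [(5, 4), (7, 2), (8, 1)]
```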
- Running time?
O(log² n + k): FindSplitNode and the two search paths visit O(log n) nodes, and each of the O(log n) calls to 1DRangeQuery on an associated structure takes O(log n) time plus time linear in the number of points it reports.
Range trees
- Theorem: A range tree for a set of n points in the plane uses O(n log n) storage and can be built in O(n log n) time. A rectangular range query on the range tree takes O(log² n + k) time, where k is the number of reported points.
- Conclusion
Tutorials
- This week
- No tutorials!
- Wednesday tutorial for Assignment 7 on Wednesday
May 6.