2IL05 Data Structures - PowerPoint PPT Presentation

Provided by: bettinas
1
2IL05 Data Structures
  • Spring 2009, Lecture 10: Range Searching

2
Augmenting data structures
  • Methodology for augmenting a data structure
  • Choose an underlying data structure.
  • Determine additional information to maintain.
  • Verify that we can maintain additional
    information for existing data structure
    operations.
  • Develop new operations.
  • You don't need to do these steps in strict order!
  • Red-black trees are very well suited to
    augmentation

3
Augmenting red-black trees
  • Theorem: Augment a red-black tree with a field f, where
    f[x] depends only on information in x, left[x],
    and right[x] (including f[left[x]] and
    f[right[x]]). Then we can maintain the values of f in
    all nodes during insert and delete without
    affecting the O(log n) performance.
  • When we alter information in x, changes
    propagate only upward on the search path for x.
  • Examples:
  • OS-tree: new operations OS-Select and OS-Rank
  • Interval tree: new operation Interval-Search
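
A minimal sketch of the augmentation idea, assuming a plain unbalanced binary search tree instead of a red-black tree so that rotations can be left out. The field `size` and the names `insert` and `os_select` are illustrative choices, not taken from the slides. The point is only that `size` is recomputed from the node's children on the way back up the search path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    size: int = 1  # augmented field: number of nodes in this subtree


def size(node):
    """Subtree size, treating a missing child as an empty subtree."""
    return node.size if node else 0


def insert(root, key):
    """Insert key into an unbalanced BST and refresh size along the search path."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    # size[x] depends only on information in x's children
    root.size = 1 + size(root.left) + size(root.right)
    return root


def os_select(node, i):
    """Return the i-th smallest key (1-based) in the subtree rooted at node."""
    rank = size(node.left) + 1
    if i == rank:
        return node.key
    if i < rank:
        return os_select(node.left, i)
    return os_select(node.right, i - rank)


root = None
for k in [49, 23, 80, 10, 37, 62, 89]:
    root = insert(root, k)
third_smallest = os_select(root, 3)  # keys in order: 10, 23, 37, ...
```

In a red-black tree the same recomputation would also have to be done for the two nodes involved in each rotation, which still costs O(1) per rotation.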

4
Range Searching
5
Application: Database queries
  • Example: Database for personnel administration
    (name, address, date of birth, salary, …)
  • Query: Report all employees born between 1950
    and 1955 who earn between 3000 and 4000 per
    month.

More parameters? Report all employees born
between 1950 and 1955 who earn between 3000 and
4000 per month and have between two and four
children. → more dimensions
6
Application: Database queries
  • Report all employees born between 1950 and 1955
    who earn between 3000 and 4000 per month and
    have between two and four children.

Rectangular range query or orthogonal range query
7
1-Dimensional range searching
  • P = {p1, p2, …, pn}: set of points on the real line
  • Query: given a query interval [x : x′], report all
    pi ∈ P with pi ∈ [x : x′].
  • Solution: Use a balanced binary search tree T.
  • Leaves of T store the points pi.
  • Internal nodes store splitting values (node v
    stores value x_v).

8
1-Dimensional range searching
  • Query [x : x′] → search with x and x′ → end
    in two leaves μ and μ′
  • Report: 1. all leaves between μ and μ′
  • 2. possibly the points stored at μ and μ′

[Figure: balanced binary search tree on the points 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 98; the search paths for the query [18 : 77] end in the leaves μ and μ′]
10
1-Dimensional range searching
  • How do we find all leaves between μ and μ′?
  • Solution: They are the leaves of the subtrees
    rooted at nodes v in between the two search
    paths whose parents are on the search paths.
  • → We need to find the node v_split where the
    search paths split.

11
1-Dimensional range searching
  • FindSplitNode(T, x, x′)
  • Input. A tree T and two values x and x′ with
    x ≤ x′.
  • Output. The node v where the paths to x and x′
    split, or the leaf where both paths end.
  • v ← root(T)
  • while v is not a leaf and (x′ ≤ x_v or x > x_v)
  •   do if x′ ≤ x_v
  •     then v ← left(v)
  •     else v ← right(v)
  • return v
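
The procedure can be sketched in Python as follows. This is a hedged illustration, not the lecture's code: the `Node` class, the `build` helper and the convention that an internal node's split value is the largest value in its left subtree are assumptions made so the example can run.

```python
class Node:
    """Leaf if `point` is set; otherwise an internal node with split value x_v."""

    def __init__(self, split=None, left=None, right=None, point=None):
        self.split = split
        self.left = left
        self.right = right
        self.point = point

    def is_leaf(self):
        return self.point is not None


def build(sorted_points):
    """Build a balanced leaf-oriented BST from a sorted list of distinct values."""
    if len(sorted_points) == 1:
        return Node(point=sorted_points[0])
    mid = (len(sorted_points) - 1) // 2  # left subtree holds values <= split
    return Node(split=sorted_points[mid],
                left=build(sorted_points[:mid + 1]),
                right=build(sorted_points[mid + 1:]))


def find_split_node(root, x_lo, x_hi):
    """Return the node where the search paths to x_lo and x_hi diverge."""
    v = root
    while not v.is_leaf() and (x_hi <= v.split or x_lo > v.split):
        # both endpoints lie on the same side of x_v, so keep descending
        v = v.left if x_hi <= v.split else v.right
    return v


points = [3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 98]
root = build(points)
v_split = find_split_node(root, 18, 77)  # the paths already split at the root
```

For the query [18 : 77] on these points the split node is the root, whose split value is 49.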

12
1-Dimensional range searching
  • Starting from v_split, follow the search path to x.
  • When the path goes left, report all leaves in
    the right subtree.
  • Check if μ ∈ [x : x′].
  • Starting from v_split, follow the search path to
    x′.
  • When the path goes right, report all leaves in
    the left subtree.
  • Check if μ′ ∈ [x : x′].

13
1-Dimensional range searching
  • 1DRangeQuery(T, [x : x′])
  • Input. A binary search tree T and a range
    [x : x′].
  • Output. All points stored in T that lie in the
    range.
  • v_split ← FindSplitNode(T, x, x′)
  • if v_split is a leaf
  •   then check if the point stored at v_split must
      be reported.
  •   else (Follow the path to x and report the
      points in subtrees right of the path.)
  •     v ← left(v_split)
  •     while v is not a leaf
  •       do if x ≤ x_v
  •         then ReportSubtree(right(v))
  •           v ← left(v)
  •         else v ← right(v)
  •     Check if the point stored at the leaf
        v must be reported.
  •     Similarly, follow the path to x′,
        report the points in subtrees left of
        the path, and check if the point stored at the
        leaf where the path ends must be reported.
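
A runnable sketch of the whole query, under the same assumptions as the pseudocode (leaves store the points, internal nodes store split values). The class and function names are hypothetical, and the "similarly" step for the path to x′ is written out explicitly.

```python
class Node:
    def __init__(self, split=None, left=None, right=None, point=None):
        self.split, self.left, self.right, self.point = split, left, right, point

    def is_leaf(self):
        return self.point is not None


def build(sorted_points):
    """Balanced leaf-oriented BST; the left subtree holds values <= split."""
    if len(sorted_points) == 1:
        return Node(point=sorted_points[0])
    mid = (len(sorted_points) - 1) // 2
    return Node(sorted_points[mid],
                build(sorted_points[:mid + 1]),
                build(sorted_points[mid + 1:]))


def find_split_node(root, x_lo, x_hi):
    v = root
    while not v.is_leaf() and (x_hi <= v.split or x_lo > v.split):
        v = v.left if x_hi <= v.split else v.right
    return v


def report_subtree(v, out):
    """Append all points stored in the leaves below v."""
    if v.is_leaf():
        out.append(v.point)
    else:
        report_subtree(v.left, out)
        report_subtree(v.right, out)


def range_query_1d(root, x_lo, x_hi):
    out = []
    v_split = find_split_node(root, x_lo, x_hi)
    if v_split.is_leaf():
        if x_lo <= v_split.point <= x_hi:
            out.append(v_split.point)
        return out
    # Path to x_lo: whenever it goes left, the right subtree is inside the range.
    v = v_split.left
    while not v.is_leaf():
        if x_lo <= v.split:
            report_subtree(v.right, out)
            v = v.left
        else:
            v = v.right
    if x_lo <= v.point <= x_hi:
        out.append(v.point)
    # Path to x_hi: whenever it goes right, the left subtree is inside the range.
    v = v_split.right
    while not v.is_leaf():
        if x_hi > v.split:
            report_subtree(v.left, out)
            v = v.right
        else:
            v = v.left
    if x_lo <= v.point <= x_hi:
        out.append(v.point)
    return out


points = [3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 98]
root = build(points)
result = sorted(range_query_1d(root, 18, 77))
```

The points come back in path order rather than sorted order, which is why the usage line sorts them before comparing.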

14
1-Dimensional range searching
  • 1DRangeQuery(T, [x : x′]): correctness?
  • Need to show two things:
  • Every reported point lies in the query range.
  • Every point in the query range is reported.

15
1-Dimensional range searching
  • 1DRangeQuery(T, [x : x′]): query time?
  • ReportSubtree takes O(1 + number of reported points) time.
  • → Total query time: O(log n + number of reported points)
  • Storage? O(n)
16
2-Dimensional range searching
  • P = {p1, p2, …, pn}: set of points in the plane
  • Query: given a query rectangle [x : x′] × [y : y′],
    report all pi ∈ P with pi ∈ [x : x′] × [y : y′],
    that is, p_x ∈ [x : x′] and p_y ∈ [y : y′].
  • → A 2-dimensional range query is composed of
    two 1-dimensional sub-queries.
  • How can we generalize our 1-dimensional solution
    to 2 dimensions?

Assumption for now: no two points have the same
x-coordinate, and no two points have the same
y-coordinate.
17
Back to one dimension
[Figure: the points 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 93, 98 on the real line]
18
Back to one dimension
[Figure: the same 14 points on the real line, together with the binary search tree that stores them in its leaves]
19
And now in two dimensions
  • Split alternating on x- and y-coordinate

[Figure: the points p1, …, p10 in the plane, subdivided by the splitting lines l1, …, l9, and the corresponding 2-dimensional kd-tree]
20
Kd-trees
  • BuildKdTree(P, depth)
  • Input. A set of points P and the current depth.
  • Output. The root of a kd-tree storing P.
  • if P contains only one point
  •   then return a leaf storing this point
  •   else if depth is even
  •     then split P into two subsets with
        a vertical line l through the
        median x-coordinate of the points in P. Let
        P1 be the set of points to the left of or on l,
        and let P2 be the set of points to the right
        of l.
  •     else split P into two subsets with
        a horizontal line l through the
        median y-coordinate of the points in P. Let
        P1 be the set of points below or on l, and let
        P2 be the set of points above l.
  •   v_left ← BuildKdTree(P1, depth + 1)
  •   v_right ← BuildKdTree(P2, depth + 1)
  •   Create a node v storing l, make v_left the left
      child of v, and make v_right the right child of
      v.
  •   return v
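
A hedged Python sketch of the construction. The names `KdNode`, `build_kd_tree`, `leaves` and `height` are assumptions for illustration. For brevity it re-sorts the points at every level, which costs O(n log² n) overall; the O(n log n) bound needs presorted lists or linear-time median finding.

```python
class KdNode:
    """Leaf if `point` is set; otherwise stores the splitting line (axis, split)."""

    def __init__(self, point=None, axis=None, split=None, left=None, right=None):
        self.point = point
        self.axis = axis    # 0: vertical line (split on x), 1: horizontal line (split on y)
        self.split = split  # coordinate of the splitting line
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.point is not None


def build_kd_tree(points, depth=0):
    if len(points) == 1:
        return KdNode(point=points[0])
    axis = depth % 2  # even depth: split on x, odd depth: split on y
    pts = sorted(points, key=lambda p: p[axis])  # simplification, see lead-in
    mid = (len(pts) - 1) // 2                    # median; P1 gets points on the line
    v_left = build_kd_tree(pts[:mid + 1], depth + 1)
    v_right = build_kd_tree(pts[mid + 1:], depth + 1)
    return KdNode(axis=axis, split=pts[mid][axis], left=v_left, right=v_right)


def leaves(v):
    """All points stored below v, left to right."""
    return [v.point] if v.is_leaf() else leaves(v.left) + leaves(v.right)


def height(v):
    return 0 if v.is_leaf() else 1 + max(height(v.left), height(v.right))


# Example points with pairwise distinct x- and y-coordinates.
points = [(1, 5), (2, 8), (3, 1), (4, 7), (5, 3), (6, 6), (7, 2), (8, 4)]
root = build_kd_tree(points)
```

With 8 points the root splits on the vertical line x = 4 and the tree has height 3.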

21
Kd-trees
  • BuildKdTree(P, depth): running time?
  • T(n) = O(1) if n = 1, and T(n) = O(n) + 2T(⌈n/2⌉) if n > 1
  • → T(n) = O(n log n)

Presort to avoid linear-time median finding.
22
Kd-trees
  • BuildKdTree(P, depth): storage? O(n)
23
Querying a kd-tree
  • Each node v corresponds to a region region(v).
  • All points which are stored in the subtree rooted
    at v lie in region(v).
  • → 1. If a region is contained in the query
    rectangle, report all points in the region.
  • 2. If a region is disjoint from the query rectangle,
    report nothing.
  • 3. If a region intersects the query rectangle, refine
    the search (test the children, or the point stored
    in the region if there are no children).

24
Querying a kd-tree
[Figure: a query on a kd-tree storing the points p1, …, p13. Disclaimer: this tree cannot have been constructed by BuildKdTree.]
25
Querying a kd-tree
  • SearchKdTree(v, R)
  • Input. The root of (a subtree of) a kd-tree,
    and a range R.
  • Output. All points at leaves below v that lie
    in the range.
  • if v is a leaf
  •   then report the point stored at v if it lies
      in R
  •   else if region(left(v)) is fully contained in R
  •       then ReportSubtree(left(v))
  •       else if region(left(v)) intersects R
  •         then SearchKdTree(left(v), R)
  •     if region(right(v)) is fully contained in R
  •       then ReportSubtree(right(v))
  •       else if region(right(v)) intersects R
  •         then SearchKdTree(right(v), R)
  • Query time?
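
The search can be sketched in Python as below. This is an illustrative version under stated assumptions: regions are not stored in the nodes but passed down as boxes `(x_min, x_max, y_min, y_max)`, and the right child's half-open region is approximated by a closed box. That approximation can cause an extra visit but never a wrong answer, because every leaf point is tested against R. All names are hypothetical.

```python
import math


class KdNode:
    def __init__(self, point=None, axis=None, split=None, left=None, right=None):
        self.point, self.axis, self.split = point, axis, split
        self.left, self.right = left, right

    def is_leaf(self):
        return self.point is not None


def build_kd_tree(points, depth=0):
    """Alternate between x (even depth) and y (odd depth); split at the median."""
    if len(points) == 1:
        return KdNode(point=points[0])
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) - 1) // 2
    return KdNode(axis=axis, split=pts[mid][axis],
                  left=build_kd_tree(pts[:mid + 1], depth + 1),
                  right=build_kd_tree(pts[mid + 1:], depth + 1))


def report_subtree(v, out):
    if v.is_leaf():
        out.append(v.point)
    else:
        report_subtree(v.left, out)
        report_subtree(v.right, out)


def contained(reg, R):
    """True if box reg lies completely inside the query rectangle R."""
    return R[0] <= reg[0] and reg[1] <= R[1] and R[2] <= reg[2] and reg[3] <= R[3]


def intersects(reg, R):
    return reg[0] <= R[1] and R[0] <= reg[1] and reg[2] <= R[3] and R[2] <= reg[3]


def child_regions(v, reg):
    """Cut reg along the splitting line stored at v."""
    x1, x2, y1, y2 = reg
    if v.axis == 0:
        return (x1, v.split, y1, y2), (v.split, x2, y1, y2)
    return (x1, x2, y1, v.split), (x1, x2, v.split, y2)


def search_kd_tree(v, R, reg=(-math.inf, math.inf, -math.inf, math.inf), out=None):
    out = [] if out is None else out
    if v.is_leaf():
        x, y = v.point
        if R[0] <= x <= R[1] and R[2] <= y <= R[3]:
            out.append(v.point)
        return out
    for child, child_reg in zip((v.left, v.right), child_regions(v, reg)):
        if contained(child_reg, R):
            report_subtree(child, out)                # case 1: report everything
        elif intersects(child_reg, R):
            search_kd_tree(child, R, child_reg, out)  # case 3: refine the search
        # case 2 (disjoint): nothing to do
    return out


points = [(1, 5), (2, 8), (3, 1), (4, 7), (5, 3), (6, 6), (7, 2), (8, 4)]
root = build_kd_tree(points)
found = search_kd_tree(root, (2, 6, 3, 8))  # R = [2 : 6] x [3 : 8]
```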

26
Querying a kd-tree: Analysis
  • The time to traverse a subtree and report the points stored
    in its leaves is linear in the number of leaves.
  • → ReportSubtree takes O(k) time in total, where k is the total number
    of reported points.
  • We need to bound the number of nodes visited that are
    not in the traversed subtrees (grey nodes).
  • The query range properly intersects the region
    of each such node.

We are only interested in an upper bound: how
many regions can a vertical line intersect?
27
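Querying a kd-tree: Analysis
  • Question: How many regions can a vertical line
    intersect?
  • Q(n): number of intersected regions
  • Answer 1: Q(n) = 1 + Q(n/2)
  • Answer 2: a vertical line intersects only one of the
    two regions below a vertical split, but both regions
    below a horizontal split, so go down two levels:
    Q(n) = 2 + 2Q(n/4)
  • Master theorem → Q(n) = O(√n)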
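
As a hedged worked step: the recurrence for the second answer is not legible in this transcript, so the derivation below assumes the standard two-level recurrence for a vertical line in a kd-tree, Q(n) = 2 + 2Q(n/4) for n > 1 with Q(1) = O(1). Under that assumption the master theorem applies with a = 2, b = 4 and f(n) = O(1):

```latex
\begin{align*}
Q(n) &= 2 + 2\,Q(n/4), \qquad a = 2,\; b = 4,\; f(n) = O(1),\\
n^{\log_b a} &= n^{\log_4 2} = n^{1/2},\\
f(n) &= O\!\left(n^{1/2 - \varepsilon}\right) \text{ for } \varepsilon = \tfrac{1}{2}
  \;\Longrightarrow\; Q(n) = \Theta\!\left(n^{\log_4 2}\right) = \Theta\!\left(\sqrt{n}\right).
\end{align*}
```

By contrast, a one-level recurrence of the form Q(n) = 1 + Q(n/2) would give O(log n). That undercounts, because after a horizontal split a vertical line crosses both child regions.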
28
Kd-trees
  • Theorem: A kd-tree for a set of n points in the
    plane uses O(n) storage and can be built in O(n
    log n) time. A rectangular range query on the
    kd-tree takes O(√n + k) time, where k is the
    number of reported points.

If the number k of reported points is small, then
the query time O(√n + k) is relatively high.
Can we do better?
Trade storage for query time.
29
Back to 1 dimension (again)
  • A 1DRangeQuery with [x : x′] gives us all points
    whose x-coordinates lie in the x-range of the
    query rectangle [x : x′] × [y : y′].
  • These points are stored in O(log n) subtrees.
  • Canonical subset of node v: the points stored in the
    leaves of the subtree rooted at v
  • Idea: store each canonical subset in a binary search
    tree on y-coordinate

30
Range trees
  • Range tree
  • The main tree is a balanced binary search tree T
    built on the x-coordinate of the points in P.
  • For any internal or leaf node ν in T, the
    canonical subset P(ν) is stored in a balanced
    binary search tree T_assoc(ν) on the y-coordinate
    of the points. The node ν stores a pointer to the
    root of T_assoc(ν), which is called the associated
    structure of ν.

31
Range trees
  • Build2DRangeTree(P)
  • Input. A set P of points in the plane.
  • Output. The root of a 2-dimensional range tree.
  • Construct the associated structure: Build a
    binary search tree T_assoc on the set P_y of
    y-coordinates of the points in P. Store at the
    leaves of T_assoc not just the y-coordinates of
    the points in P_y, but the points themselves.
  • if P contains only one point
  •   then create a leaf v storing this point, and
      make T_assoc the associated structure
      of v
  •   else split P into two subsets: one subset
      P_left contains the points with x-coordinate
      less than or equal to x_mid, the median
      x-coordinate, and the other subset
      P_right contains the points with x-coordinate
      larger than x_mid
  •     v_left ← Build2DRangeTree(P_left)
  •     v_right ← Build2DRangeTree(P_right)
  •     Create a node v storing x_mid, make
        v_left the left child of v, make v_right
        the right child of v, and make T_assoc the
        associated structure of v
  • return v
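
A minimal Python sketch of the construction, with one swapped-in simplification that should be read as an assumption: the associated structure is a list of the canonical subset sorted by y-coordinate, standing in for the balanced binary search tree T_assoc. The names `RangeNode`, `build_2d_range_tree` and `storage` are illustrative. Sorting at every node costs O(n log² n) in total, whereas the O(n log n) bound relies on presorting.

```python
class RangeNode:
    def __init__(self, assoc, x_mid=None, left=None, right=None):
        self.assoc = assoc  # canonical subset sorted by y (stand-in for T_assoc)
        self.x_mid = x_mid  # splitting value; None at a leaf
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None and self.right is None


def build_2d_range_tree(points_by_x):
    """points_by_x: non-empty list of (x, y) tuples sorted by x, all x distinct."""
    # Construct the associated structure on the y-coordinates, storing the points.
    assoc = sorted(points_by_x, key=lambda p: p[1])
    if len(points_by_x) == 1:
        return RangeNode(assoc)
    mid = (len(points_by_x) - 1) // 2
    x_mid = points_by_x[mid][0]  # median x-coordinate
    v_left = build_2d_range_tree(points_by_x[:mid + 1])    # x <= x_mid
    v_right = build_2d_range_tree(points_by_x[mid + 1:])   # x > x_mid
    return RangeNode(assoc, x_mid, v_left, v_right)


def storage(v):
    """Total number of points stored over all associated structures below v."""
    if v is None:
        return 0
    return len(v.assoc) + storage(v.left) + storage(v.right)


points = sorted([(1, 5), (2, 8), (3, 1), (4, 7), (5, 3), (6, 6), (7, 2), (8, 4)])
root = build_2d_range_tree(points)
total = storage(root)  # 8 points on each of 4 levels
```

The `storage` helper makes the space bound concrete: every point appears once per level of the main tree, so 8 points on 4 levels give 32 stored entries.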

32
Range trees
  • Build2DRangeTree(P): running time?
  • O(n log n)

Presort to build the binary search trees in linear
time.
33
Range trees: Storage
  • Lemma: A range tree on a set of n points in the
    plane requires O(n log n) storage.
  • Proof:
  • Each point is stored only once per level.
  • Storage for the associated structures is linear in
    the number of points.
  • There are O(log n) levels.
34
Querying a range tree
  • 2DRangeQuery(T, [x : x′] × [y : y′])
  • Input. A 2-dimensional range tree T and a range
    [x : x′] × [y : y′].
  • Output. All points in T that lie in the range.
  • v_split ← FindSplitNode(T, x, x′)
  • if v_split is a leaf
  •   then check if the point stored at v_split must
      be reported
  •   else (Follow the path to x and call
      1DRangeQuery on the subtrees right
      of the path.)
  •     v ← left(v_split)
  •     while v is not a leaf
  •       do if x ≤ x_v
  •         then 1DRangeQuery(T_assoc(right(v)), [y : y′])
  •           v ← left(v)
  •         else v ← right(v)
  •     Check if the point stored at v must
        be reported.
  •     Similarly, follow the path from
        right(v_split) to x′, call 1DRangeQuery with
        the range [y : y′] on the associated
        structures of subtrees left of the
        path, and check if the point stored at the leaf
        where the path ends must be reported.
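
The query can be sketched in Python as follows. As an explicit assumption, the associated structure of each node is a y-sorted list searched with `bisect`, standing in for a 1DRangeQuery on a balanced tree; both take O(log n + k) time per call. The class and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


class RangeNode:
    def __init__(self, pts_by_y, x_mid=None, left=None, right=None):
        self.assoc = pts_by_y                 # canonical subset, sorted by y
        self.ys = [p[1] for p in pts_by_y]    # keys for binary search
        self.x_mid, self.left, self.right = x_mid, left, right

    def is_leaf(self):
        return self.left is None


def build(points_by_x):
    """points_by_x: non-empty list of (x, y) tuples sorted by x, all x distinct."""
    assoc = sorted(points_by_x, key=lambda p: p[1])
    if len(points_by_x) == 1:
        return RangeNode(assoc)
    mid = (len(points_by_x) - 1) // 2
    return RangeNode(assoc, points_by_x[mid][0],
                     build(points_by_x[:mid + 1]), build(points_by_x[mid + 1:]))


def query_assoc(v, y_lo, y_hi, out):
    """Stand-in for 1DRangeQuery(T_assoc(v), [y : y'])."""
    out.extend(v.assoc[bisect_left(v.ys, y_lo):bisect_right(v.ys, y_hi)])


def check_leaf(v, x_lo, x_hi, y_lo, y_hi, out):
    x, y = v.assoc[0]
    if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
        out.append((x, y))


def range_query_2d(root, x_lo, x_hi, y_lo, y_hi):
    out = []
    v = root  # FindSplitNode on the main tree
    while not v.is_leaf() and (x_hi <= v.x_mid or x_lo > v.x_mid):
        v = v.left if x_hi <= v.x_mid else v.right
    if v.is_leaf():
        check_leaf(v, x_lo, x_hi, y_lo, y_hi, out)
        return out
    v_split = v
    v = v_split.left  # follow the path to x_lo
    while not v.is_leaf():
        if x_lo <= v.x_mid:
            query_assoc(v.right, y_lo, y_hi, out)
            v = v.left
        else:
            v = v.right
    check_leaf(v, x_lo, x_hi, y_lo, y_hi, out)
    v = v_split.right  # follow the path to x_hi
    while not v.is_leaf():
        if x_hi > v.x_mid:
            query_assoc(v.left, y_lo, y_hi, out)
            v = v.right
        else:
            v = v.left
    check_leaf(v, x_lo, x_hi, y_lo, y_hi, out)
    return out


points = sorted([(1, 5), (2, 8), (3, 1), (4, 7), (5, 3), (6, 6), (7, 2), (8, 4)])
root = build(points)
found = sorted(range_query_2d(root, 2, 6, 3, 8))  # [2 : 6] x [3 : 8]
```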

35
Querying a range tree
  • 2DRangeQuery(T, [x : x′] × [y : y′]): running time?
  • O(log² n + k)
36
Range trees
  • Theorem: A range tree for a set of n points in the
    plane uses O(n log n) storage and can be built in
    O(n log n) time. A rectangular range query on the
    range tree takes O(log² n + k) time, where k is
    the number of reported points.
  • Conclusion

37
Tutorials
  • This week
  • No tutorials!
  • Wednesday tutorial for Assignment 7 on Wednesday
    May 6.