2IL05 Data Structures - PowerPoint PPT Presentation

Provided by: bettinas
Transcript and Presenter's Notes


1
2IL05 Data Structures
  • Spring 2009, Lecture 11: Data Structures for
    Disjoint Sets

2
Abstract data type
  • Abstract Data Type (ADT): a set of data values and
    associated operations that are precisely
    specified independent of any particular
    implementation.
  • Examples: dictionary, stack, queue, priority queue, set,
    bag

3
Dynamic sets
  • Dynamic sets: sets that can grow, shrink, or
    otherwise change over time.
  • Two types of operations
  • queries: return information about the set
  • modifying operations: change the set
  • Common queries:
    Search, Minimum, Maximum, Successor, Predecessor
  • Common modifying operations:
    Insert, Delete

4
Union-find structure
  • Union-Find Structure: stores a collection of
    disjoint dynamic sets.
  • Operations
  • Make-Set(x): creates a new set whose only member
    is x
  • Union(x, y): unites the dynamic sets that
    contain x and y
  • Find-Set(x): finds the set that contains x

5
Union-find structure
  • Union-Find Structure: stores a collection of
    disjoint dynamic sets.
  • every set S_i is identified by a
    representative (It doesn't matter which element
    is the representative, but if we ask for it
    twice, without modifying the set, we need to get
    the same answer both times.)
  • Operations
  • Make-Set(x): creates a new set whose only member
    is x
    (x is the representative.)
  • Union(x, y): unites the dynamic sets S_x and S_y
    that contain x and y
    (Representative of new set is any member of S_x
    or S_y, often one of their representatives.
    Destroys S_x and S_y since sets must be disjoint.)
  • Find-Set(x): finds the set that contains x
    (Returns the representative of the set
    containing x; assumes that x is an element
    of one of the sets.)

6
Analysis of union-find structures
  • Union-find structures are often used as an
    auxiliary data structure by algorithms
  • ⇒ total running time over all operations is
    more important than worst-case running
    time for each operation
  • Analysis in terms of
  • n = number of elements = number of Make-Set operations
  • m = total number of operations (incl. Make-Set)

7
Example application: connected components
  • Maintain the connected components of a graph G =
    (V, E) under edge insertions.
  • Connected-Components(V, E)
      for each vertex v ∈ V
        do Make-Set(v)
      for each edge (u, v) ∈ E
        do Insert-Edge(u, v)
  • Same-Component(u, v)
      if Find-Set(u) = Find-Set(v)
        then return True
        else return False
  • Insert-Edge(u, v)
      if Find-Set(u) ≠ Find-Set(v)
        then Union(u, v)
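
The three procedures on this slide can be sketched in Python as follows. The union-find used here is a deliberately naive parent-pointer dictionary, chosen only to keep the example self-contained; it is an assumption of this sketch, not one of the solutions from the lecture, and the function names are illustrative.

```python
# Naive union-find backed by a dict mapping each element to a parent.
# This stand-in structure is an assumption made for the example only.

def make_set(parent, x):
    parent[x] = x                      # x is its own representative

def find_set(parent, x):
    while parent[x] != x:              # walk up until the representative
        x = parent[x]
    return x

def union(parent, x, y):
    parent[find_set(parent, x)] = find_set(parent, y)

def insert_edge(parent, u, v):
    if find_set(parent, u) != find_set(parent, v):
        union(parent, u, v)

def connected_components(vertices, edges):
    parent = {}
    for v in vertices:
        make_set(parent, v)
    for u, v in edges:
        insert_edge(parent, u, v)
    return parent

def same_component(parent, u, v):
    return find_set(parent, u) == find_set(parent, v)
```

For a graph with edges a-b, b-c and d-e, `same_component` reports a and c as connected and a and d as not connected.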

8
Data structures for union-find: Solution 1
  • Store every set S_i in a doubly-linked list
  • representative = first element of the list
  • the prev-pointer of the first element points to
    the last element
  • x is the representative if next[prev[x]] = NIL
  • Disclaimer: This is not quite the same solution
    as in Chapter 21.2 of the textbook

9
Solution 1: Make-Set and Find-Set
  • Make-Set(x)
      prev[x] ← x
      next[x] ← NIL
  • Find-Set(x)
      a ← x
      while next[prev[a]] ≠ NIL
        do a ← prev[a]
      return a

Note: x is a pointer to an element in the list,
and hence we do not need to search.
10
Solution 1: Union
  • Union(x, y)
      ▹ Assumes x and y are elements of different
        sets.
      a ← Find-Set(x); b ← Find-Set(y)
      append the list of b onto the end of the list of
        a.

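[Figure: the list with representative b (containing y) is appended to the list with representative a (containing x).]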
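
A minimal Python sketch of Solution 1, assuming each element is a small `Node` object (the class and function names are illustrative, not from the slides). The pointer updates inside `union` are one way to realise "append the list of b onto the end of the list of a"; the guard for `a is b` is an addition, since the slide assumes x and y are in different sets.

```python
class Node:
    """Element of a doubly-linked list; prev of the first node is the last node."""
    def __init__(self, key):
        self.key = key
        self.prev = self   # Make-Set: prev[x] <- x
        self.next = None   # Make-Set: next[x] <- NIL

def find_set(x):
    a = x
    # The representative is the only node whose prev (the last node) has next = NIL.
    while a.prev.next is not None:
        a = a.prev
    return a

def union(x, y):
    a, b = find_set(x), find_set(y)
    if a is b:             # guard added in this sketch
        return a
    last_a, last_b = a.prev, b.prev
    last_a.next = b        # link the end of a's list to the start of b's list
    b.prev = last_a        # b is now an ordinary interior node
    a.prev = last_b        # first element points to the new last element
    return a
```

Find-Set walks backwards through the list, which is why its cost grows with the size of the set.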
11
Analysis: Solution 1
  • Make-Set(x): O(1)
  • Find-Set(x): O(size of set that contains x)
  • Union(x, y): 2 × Find-Set + O(1) = O(size of
    both sets)
  • Total running time for m operations, of which n
    are Make-Set:
  • each set has size ≤ n ⇒ total running time
    O(nm)
  • Is this possible at all? Yes:
      Make-Set(x1), …, Make-Set(xn)
      Union(x2,x1), Union(x3,x1), …,
      Union(xn,x1)
      Find-Set(x1), Find-Set(x1),
      Find-Set(x1), …
    (x1 always ends up last in its list, so every
    Find-Set(x1) walks the whole list.)
12
Problems with Solution 1
  • Problem: Find-Set takes too long
  • Solution 2:
  • replace the prev[x] pointer with a rep[x] pointer to
    the representative
  • the rep-pointer of the representative points to
    the last element

13
Solution 2: Make-Set and Find-Set
  • Make-Set(x)
      rep[x] ← x
      next[x] ← NIL
  • Find-Set can now be executed in O(1) time
  • Find-Set(x)
      if next[rep[x]] = NIL
        then return x
        else return rep[x]

14
Solution 2: Union
  • Union(x, y)
      ▹ Assumes x and y are elements of different
        sets.
      a ← Find-Set(x); b ← Find-Set(y)
      append the list of b onto the end of the list of
        a.
      update all rep-pointers of the appended list
  • Running time? O(size of set that contains y)
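
Solution 2 might look like this in Python. As before, the `Node` class and function names are assumptions of the sketch, and the `a is b` guard is an addition because the slide assumes different sets.

```python
class Node:
    """List element with a rep-pointer instead of a prev-pointer."""
    def __init__(self, key):
        self.key = key
        self.rep = self    # Make-Set: rep[x] <- x
        self.next = None   # Make-Set: next[x] <- NIL

def find_set(x):
    # rep of the representative points to the last element, whose next is NIL.
    return x if x.rep.next is None else x.rep

def union(x, y):
    a, b = find_set(x), find_set(y)
    if a is b:             # guard added in this sketch
        return a
    last_a, last_b = a.rep, b.rep
    last_a.next = b        # append the list of b onto the end of the list of a
    node = b
    while node is not None:
        node.rep = a       # one rep-pointer update per element of b's list
        node = node.next
    a.rep = last_b         # representative points to the new last element
    return a
```

Find-Set no longer loops; the loop has moved into Union, where it runs once per element of the appended list.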
15
Analysis: Solution 2
  • Make-Set(x): O(1)
  • Find-Set(x): O(1)
  • Union(x, y): O(size of set that contains y)
  • Total running time for m operations, of which n
    are Make-Set?
  • Let's check the worst-case example for Solution
    1

16
Worst case for Solution 1
  • Make-Set(x1), …, Make-Set(xn): Θ(n)
  • Union(x2,x1), Union(x3,x1), …, Union(xn,x1):
    Σ_{i=2}^{n} Θ(i) = Θ(n²)
  • Find-Set(x1), Find-Set(x1), Find-Set(x1), …:
    Θ(m - 2n)

Total: Θ(m + n²)
(Make-Set(x) and Find-Set(x): O(1); Union(x, y):
O(size of set that contains y))
17
Problems with Solution 2
  • What is the problem?
  • appending the list x3, x2, x1 onto x4 was not a
    great idea
  • Solution 3: Always append the shorter list onto
    the longer list ⇒ fewer rep-pointers need
    to be updated (union-by-size)
18
Solution 3
  • Solution 3: the same as Solution 2, but
  • store with each list its length (this can be
    easily maintained)
  • Union(x, y) always appends the shorter onto the
    longer list
  • Theorem: A sequence of m operations, of which n
    are Make-Set, takes Θ(m + n log n) time in the
    worst case.
  • Proof: Make-Set and Find-Set cost Θ(1) per
    operation ⇒ O(m) in total.
  • Time for all Union operations
  • = O(total number of times that a rep-pointer
    was moved)
  • = O(Σ_x (number of times that rep[x] was moved))
  • = Σ_x O(log n) = O(n log n)
    (each time rep[x] is moved, the set containing x
    at least doubles in size)
  • Can it really be Θ(m + n log n)? Yes.

But we can do even better.
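
The union-by-size rule can be sketched by extending the list nodes with a `size` field. In this illustrative version (names are assumptions), `union` returns how many rep-pointers it moved so that the n log n bound can be observed directly. The helper `tournament_moves` is hypothetical: it merges singletons pairwise, round by round, which forces equal-sized merges.

```python
import math

class Node:
    def __init__(self, key):
        self.key = key
        self.rep = self    # Make-Set: rep[x] <- x
        self.next = None   # Make-Set: next[x] <- NIL
        self.size = 1      # list length; only meaningful at the representative

def find_set(x):
    return x if x.rep.next is None else x.rep

def union(x, y):
    """Append the shorter list onto the longer; return number of rep-pointers moved."""
    a, b = find_set(x), find_set(y)
    if a is b:                     # guard added in this sketch
        return 0
    if a.size < b.size:
        a, b = b, a                # a is now the longer (or equal) list
    last_a, last_b = a.rep, b.rep
    last_a.next = b
    moved, node = 0, b
    while node is not None:
        node.rep = a
        moved += 1
        node = node.next
    a.rep = last_b
    a.size += b.size
    return moved

def tournament_moves(n):
    """Merge n singletons pairwise in rounds (n must be a power of two)."""
    nodes = [Node(i) for i in range(n)]
    moved, step = 0, 1
    while step < n:
        for i in range(0, n, 2 * step):
            moved += union(nodes[i], nodes[i + step])
        step *= 2
    return moved
```

With n = 64 the tournament performs 6 rounds that each move 32 rep-pointers, so 192 = (n/2) log n in total, which stays within the n log n bound from the proof.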
19
Solution 4
  • Solution 1: append one list onto the other
  • New idea: append one list directly onto (under)
    the representative of the other

20
Solution 4
  • New idea: append one list directly onto (under)
    the representative of the other
  • next-pointers are not needed anymore
  • the rep-pointer of the representative points to
    the representative

⇒ a sort of tree structure
21
Solution 4: The data structure
  • Each set is stored in a tree; nodes have only a
    pointer p[x] that points to their parent.
  • The root is the representative of the set; the
    parent-pointer p[x] of the root points to the
    root.
  • We need to know the height of each tree to
    attach the smaller tree to the larger
    (union-by-rank)
  • ⇒ Each node x has a field rank[x], which is an
    upper bound for the height of x.
  • height of x = the number of edges in the
    longest path between x and a descendant leaf
22
Solution 4: Make-Set
  • Make-Set(x)
      p[x] ← x
      rank[x] ← 0

23
Solution 4: Union
  • Union(x, y)
      a ← Find-Set(x); b ← Find-Set(y)
      if rank[a] > rank[b]
        then p[b] ← a
        else if rank[a] < rank[b]
          then p[a] ← b
          else p[a] ← b; rank[b] ← rank[b] + 1

24
Solution 4: Find-Set
  • Find-Set(x)
      if x ≠ p[x]
        then return Find-Set(p[x])
        else return x
  • Path compression
    Find path = nodes visited during Find-Set on the
    trip to the root. Make all nodes on the find path
    direct children of the root.
  • Find-Set(x) with path compression
      if x ≠ p[x]
        then p[x] ← Find-Set(p[x])
      return p[x]
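
Putting Make-Set, Union (by rank) and Find-Set (with path compression) together, a Python sketch could look like the following. The class name `DisjointSetForest` and the use of dictionaries for p and rank are assumptions of this example. The early return when both elements already share a root is also an addition; without it, the final else branch would increase the rank of a root unnecessarily.

```python
class DisjointSetForest:
    """Union-find with union-by-rank and path compression (Solution 4)."""

    def __init__(self):
        self.p = {}        # parent pointers
        self.rank = {}     # upper bound on the height of each node

    def make_set(self, x):
        self.p[x] = x
        self.rank[x] = 0

    def find_set(self, x):
        if x != self.p[x]:
            # Path compression: hang x directly under the root.
            self.p[x] = self.find_set(self.p[x])
        return self.p[x]

    def union(self, x, y):
        a, b = self.find_set(x), self.find_set(y)
        if a == b:         # guard added in this sketch
            return
        if self.rank[a] > self.rank[b]:
            self.p[b] = a
        elif self.rank[a] < self.rank[b]:
            self.p[a] = b
        else:
            self.p[a] = b
            self.rank[b] += 1
```

The recursive `find_set` mirrors the slide; for very deep trees an iterative two-pass version would avoid Python's recursion limit, although union-by-rank keeps the depth at most log n.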

25
Analysis: Solution 4
  • Lemma: (# elements in the tree rooted at x) ≥
    2^rank[x]
  • Proof: Induction on r = rank[x]
  • Base case r = 0:
  • # elements ≥ 1 = 2^0 ✓
  • Inductive step r > 0:
  • a node x with rank r is created by joining two
    trees with roots of rank r-1
  • ⇒ # elements in new subtree rooted at x ≥ 2 ·
    2^(r-1) = 2^r
  • This immediately implies rank[x] ≤ log n
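
The lemma can be checked empirically. The sketch below runs random unions by rank without path compression, so that subtree sizes stay meaningful for every node, and then verifies that each subtree has at least 2^rank elements. The function name, the random workload and the seed are assumptions of the example.

```python
import random

def check_rank_lemma(n, seed=0):
    """Return (lemma_holds, max_rank) after random unions by rank on n elements."""
    rng = random.Random(seed)
    p = list(range(n))
    rank = [0] * n

    def find(x):                       # no path compression on purpose
        while p[x] != x:
            x = p[x]
        return x

    for _ in range(4 * n):
        a, b = find(rng.randrange(n)), find(rng.randrange(n))
        if a == b:
            continue
        if rank[a] > rank[b]:
            p[b] = a
        elif rank[a] < rank[b]:
            p[a] = b
        else:
            p[a] = b
            rank[b] += 1

    # size[x] = number of elements in the subtree rooted at x
    size = [0] * n
    for x in range(n):
        y = x
        size[y] += 1
        while p[y] != y:
            y = p[y]
            size[y] += 1

    holds = all(size[x] >= 2 ** rank[x] for x in range(n))
    return holds, max(rank)
```

Since a tree of rank r holds at least 2^r of the n elements, the largest rank returned can never exceed log n.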

26
Analysis: Solution 4
  • Theorem: A sequence of m operations, of which n
    are Make-Set, takes Θ(m log n) time in the worst
    case.
  • Proof:
  • rank[x] ≤ log n
  • the rank of nodes on the find path increases by
    at least one in every step
  • ⇒ maximal length of find path ≤ maximal rank
    ≤ log n
  • ⇒ Find-Set takes O(log n) time
  • Make-Set and Union (excl. Find-Set) both take
    O(1) time
  • But Solution 3 works in O(m + n log n)?!?

27
Analysis: Solution 4
  • Theorem: A sequence of m operations, of which n
    are Make-Set, takes O(m α(n)) time in the worst
    case.
  • α(n) is a function that grows extremely slowly
  • α(n) grows even more slowly than log* n
  • Proof is somewhat complicated ...
  • we will prove O(m log* n)

log* n = number of times that you have to take a log
before you get below 2: log* 2 = 1, log* 2^2 = 2,
log* 2^4 = 3, log* 2^16 = 4,
log* 2^65,536 = 5
28
Analysis: Solution 4
  • Theorem: A sequence of m operations, of which n
    are Make-Set, takes O(m log* n) time in the
    worst case.
  • Proof:
  • Make-Set and Union (excl. Find-Set) both take
    O(1) time
  • there are n Make-Set and at most n-1 Union
    operations
  • ⇒ in total O(n) time for all Make-Set and Union
    (excl. Find-Set) operations
  • remains to show: m Find-Set operations can be
    executed in O(m log* n) time

29
The log* function
  • Define function t: N → N as
    t(0) = 1, t(i) = 2^t(i-1) for i ≥ 1
  • log* n = min { i : t(i) ≥ n }
  • Note: log* t(i) = i
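
A small Python sketch of the two functions, assuming t is the tower function with t(0) = 1 and t(i) = 2^t(i-1) (this is consistent with the values log* 2 = 1 up to log* 2^65,536 = 5 quoted earlier). The names `tower` and `log_star` are illustrative.

```python
def tower(i):
    """t(i): a tower of i twos, with t(0) = 1 (assumed definition)."""
    return 1 if i == 0 else 2 ** tower(i - 1)

def log_star(n):
    """log* n = min { i : t(i) >= n }."""
    i = 0
    while tower(i) < n:
        i += 1
    return i
```

Python integers are arbitrary precision, so even `log_star(2 ** 65536)` can be evaluated directly.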

30
Rank groups
  • Divide nodes into rank groups: node x is in rank
    group g if g = log*(rank[x])
  • ⇒ t(g-1) < rank[x] ≤ t(g)
  • Lemma: (# nodes in rank group g) ≤ n / t(g)
  • Proof:
  • (# nodes in rank group g)
  • = Σ_{t(g-1)+1 ≤ r ≤ t(g)} (# nodes with rank r)
  • ≤ Σ_{t(g-1)+1 ≤ r ≤ t(g)} n / 2^r
  • = n / 2^(t(g-1)+1) · Σ_{0 ≤ r ≤ t(g) - t(g-1) - 1}
    1 / 2^r
  • < n / 2^(t(g-1)+1) · 2
  • = n / 2^t(g-1) = n / t(g)
  • Recall the Lemma: (# elements in the tree rooted at x) ≥
    2^rank[x]
  • ⇒ (# nodes with rank r) ≤ n / 2^r

31
Analysis: Solution 4, Find-Set
  • Lemma: m Find-Set operations can be executed in
    O(m log* n) time.
  • Proof idea: bound the number of parent pointers
    followed on all find paths
  • Three cases
  • (i) pointer to root
  • ⇒ ≤ 2 per find path ⇒ O(m) in total ✓
  • (ii) pointer from node y to p[y] with
    group(p[y]) > group(y)
  • highest rank is log n
  • ⇒ # groups ≤ log*(log n) = log* n - 1
  • ⇒ at most log* n - 1 per find path ⇒ O(m log* n)
    in total ✓
  • (iii) pointer from node y to p[y] with
    group(p[y]) = group(y)

32
Analysis: Solution 4, Find-Set
  • (iii) pointer from node y to p[y] with
    group(p[y]) = group(y)
  • after following the pointer p[y], y will get a
    new parent because of path compression
  • ranks are monotonically increasing along the
    path to the root
  • ⇒ rank[new parent] > rank[previous parent]
  • if the new parent is in a higher group
  • ⇒ y will never be in case (iii) again (the
    rank of a node that is not a root never changes)
  • Q: How often can case (iii) occur for one node y?
  • A: At most # different ranks in y's rank group
  • Total for case (iii)
  • ≤ Σ_{1 ≤ g ≤ log* n - 1} (# nodes y with group(y) = g) ·
    (# ranks in group g)
  • ≤ Σ_{1 ≤ g ≤ log* n - 1} (n / t(g)) · (t(g) - t(g-1))
    = O(n log* n)
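
The last step of the case (iii) total, from the sum over rank groups to O(n log* n), can be spelled out as follows. This is a sketch of the omitted arithmetic; it uses only the rank-group lemma and the fact that t(g-1) ≥ 0, so that t(g) - t(g-1) ≤ t(g).

```latex
\begin{align*}
\sum_{g=1}^{\log^* n - 1} \frac{n}{t(g)}\,\bigl(t(g) - t(g-1)\bigr)
  &\le \sum_{g=1}^{\log^* n - 1} \frac{n}{t(g)} \cdot t(g) \\
  &= n\,(\log^* n - 1) \\
  &= O(n \log^* n).
\end{align*}
```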

33
Analysis: Solution 4
  • Theorem: If we implement a union-find data
    structure with a collection of trees, using the
    union-by-rank heuristic and the path-compression
    heuristic, then a sequence of m operations, of
    which n are Make-Set, takes O(m log* n) time in
    the worst case.

34
Tutorials
  • This week
  • All tutorials!