Introduction to Programming with Data Structures - PowerPoint PPT Presentation

1
Introduction to Programming with Data Structures
Computer Science 187
Lecture 21: Balanced Trees, AVL Trees, and m-way Trees
Announcements
2
Binary Search Trees
  • What's the structure of the binary search tree I
    get in each case when I construct a tree from the
    keys
  • 1, 2, 3, 4, 5, 6?
  • 6, 5, 4, 3, 2, 1?
  • 3, 2, 5, 1, 4, 6?
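
One way to check the answer is to build each tree and measure its height. The sketch below is a minimal illustration, not the course's BinaryNode code: the class name BstShape and its methods are made up for this example, and height is counted in edges (a single node has height 0).

```java
class BstShape {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Plain (unbalanced) binary search tree insertion; duplicates are ignored.
    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else if (key > root.key) root.right = insert(root.right, key);
        return root;
    }

    // Height in edges: empty tree is -1, single node is 0.
    static int height(Node n) {
        return n == null ? -1 : 1 + Math.max(height(n.left), height(n.right));
    }

    // Inserts the keys in the given order and reports the resulting height.
    static int heightAfterInserting(int... keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return height(root);
    }

    public static void main(String[] args) {
        System.out.println(heightAfterInserting(1, 2, 3, 4, 5, 6));
        System.out.println(heightAfterInserting(6, 5, 4, 3, 2, 1));
        System.out.println(heightAfterInserting(3, 2, 5, 1, 4, 6));
    }
}
```

The first two orders produce a chain (effectively a linked list); the third produces a tree in which every level is as full as it can be.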

3
Balanced Trees
  • Consider adding 7 chars (a, b, c, d, e, f, and g)
    to an initially empty binary search tree in two
    ways:
  • in order: a b c d e f g
    (UNBALANCED, search is O(n))
  • in random order: d f b a e c g
    (BALANCED, search is O(log n))

The order in which data values arrive can make a
HUGE difference in what the tree looks like.
4
Definition of Balanced
  • balance: a tree node attribute representing the
    difference in height between the node's subtrees.
  • The balance attribute of every node in a balanced
    binary tree is -1, 0, or 1.
  • The definition is recursive and holds for all
    root nodes and their left and right subtrees.
  • Achieving balance is important for minimizing the
    time required for search.
  • We will look at AVL trees but not in a lot of
    detail.

5
Schemes for Balancing Trees
  • AVL trees: Adelson-Velskii and Landis, 1962
  • named for the initials of their Russian creators
  • use rotations to ensure heights of child trees
    differ by at most 1
  • 2-3 Trees: Hopcroft, 1970
  • similar to 2-3-4 trees, but repairs have to move
    back up the tree
  • B-Trees: Bayer and McCreight, 1972
  • Red-Black Trees: Bayer, 1972
  • not the original name
  • Red-black convention and relation to 2-3-4 trees:
    Guibas and Sedgewick, 1978
  • Splay Trees: Sleator and Tarjan, 1983
  • Skip Lists: Pugh, 1990
  • developed at Cornell

6
AVL Trees
  • AVL trees are binary search trees with a balance
    condition.
  • First simple idea: left and right subtrees of the
    root must have the same height... but that is not
    good enough

[Figure: a tree that is balanced at the root but not shallow.]
  • In an AVL tree, the height of the left and right
    subtree of every node can differ by at most 1.

7
AVL Tree Example
[Figure: two binary search trees with node heights labeled, one captioned "AVL Tree" and the other "Not an AVL Tree".]
  • Search, insert, etc.: complexity is O(h), where h
    is the height of the tree.
  • Height of an AVL tree: O(log n), where n is the
    number of nodes.

8
Insertion in an AVL Tree
Insert 7
[Figure: the tree with keys 10, 5, 16, 2, 8, 12 before and after inserting 7, with node heights labeled.]
  • Height of tree rooted at 5 does not change.
  • Tree is still balanced

9
Insertion Example 2
Insert 0
[Figure: the tree with keys 10, 5, 16, 2, 8, 12 after inserting 0, with node heights labeled.]
  • Inserting a node causes heights in tree to change
  • Tree is no longer balanced

10
When and Where do Heights Change?
[Figure: the example tree (keys 10, 5, 16, 2, 8, 12) with three annotations: one node is marked as the node where an imbalance occurs; the heights of its two subtrees therefore differ by more than 1; insertions below those subtrees could cause the imbalance (anywhere else?).]
  • Therefore there are four cases to consider:
  • Insertion into the left subtree of the left child
    of 5 (Case 1)
  • Insertion into the right subtree of the left
    child of 5 (Case 2)
  • Insertion into the left subtree of the right
    child of 5 (Case 3)
  • Insertion into the right subtree of the right
    child of 5 (Case 4)
  • The first and fourth are symmetric; the second and
    third are symmetric

11
Case 1 (right rotation): Single Rotation
[Figure: the unbalanced tree with root n2 and left child n1, and the balanced tree after the rotation with root n1.]
  • 1. Make n1 the root node
  • 2. Make T2 the left child of n2
  • Binary search tree property holds in both trees
  • Balance is achieved in the 'rotated' tree

12
Rotation to the Right
  • Algorithm for rotation (toward the right):
  • Save value of root.left (temp = root.left)
  • Set root.left to value of temp.right
  • Set temp.right to root
  • Set root to temp
  • Algorithm for rotation toward the left is similar
    - you do it.
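
The four steps above can be sketched directly in Java. This is an illustrative sketch only: the Node class with public left and right fields and the class name Rotations are assumptions made for the example, not the course's BinaryNode API. The left rotation is written as the mirror image of the right rotation.

```java
class Rotations {
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Rotation toward the right: the left child becomes the new root.
    static Node rotateRight(Node root) {
        Node temp = root.left;     // save root.left
        root.left = temp.right;    // temp's right subtree moves under root
        temp.right = root;         // old root becomes temp's right child
        return temp;               // caller stores this as the new root
    }

    // Mirror image: the right child becomes the new root.
    static Node rotateLeft(Node root) {
        Node temp = root.right;
        root.right = temp.left;
        temp.left = root;
        return temp;
    }

    public static void main(String[] args) {
        // A left-leaning chain 3 -> 2 -> 1.
        Node root = new Node(3);
        root.left = new Node(2);
        root.left.left = new Node(1);
        root = rotateRight(root);
        System.out.println(root.left.key + " " + root.key + " " + root.right.key);
    }
}
```

Because Java passes references by value, "set root to temp" is expressed here by returning the new root and letting the caller reassign it.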

13
Case 4 (left rotation): Single Rotation
[Figure: the unbalanced tree with root n1, right child n2, and subtrees T1, T2, T3, and the balanced tree after the rotation with root n2.]
  • 1. Make n2 the root node
  • 2. Make T2 the right child of n1
  • Binary search tree property holds in both trees
  • Balance is achieved in the 'rotated' tree

14
Case 2 (left-right) Trouble
  • Single Rotation fails to fix this case
  • Need a more complex manipulation of the tree

15
Case 2 (left-right): Double Rotation (two single
rotations)
[Figure: the unbalanced tree.]
  • Expand T2 in the original tree into a node and two
    subtrees
  • We know that neither n1 nor n2 works as the root
  • Solution is then two single rotations to get n3
    at the root:
  • 1. Rotate n3 into n1
  • 2. Rotate n3 into n2

16
Case 2 (left-right): Double Rotation (two single
rotations)
[Figure: three trees over n1, n2, and n3, showing the tree before the rotations, after rotating n3 into n1, and after rotating n3 into n2.]
  • Binary search tree property holds in both trees
  • Balance is achieved in the 'rotated' tree

17
Case 3 (right-left): Double Rotation (two single
rotations)
[Figure: the trees over n1, n2, and n3 before and after the double rotation.]
  • 1. Rotate n3 into n1
  • 2. Rotate n3 into n2
  • Binary search tree property holds in both trees
  • Balance is achieved in the 'rotated' tree

18
Unbalanced Search Trees
  • Left-Left (parent balance is -2, left child
    balance is -1):
    rotate right around parent
  • Left-Right (parent balance is -2, left child
    balance is +1):
    rotate left around child, then rotate right
    around parent
  • Right-Left (parent balance is +2, right child
    balance is -1):
    rotate right around child, then rotate left
    around parent
  • Right-Right (parent balance is +2, right child
    balance is +1):
    rotate left around parent
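
The four cases can be turned into a small rebalancing routine. The sketch below is one possible arrangement, not the book's or the course's implementation: it assumes balance = height(right subtree) - height(left subtree), which matches the signs used above, and it stores a height field in each node. The names AvlSketch, Node, update, and rebalance are made up for this example.

```java
class AvlSketch {
    static final class Node {
        final int key;
        int height;               // a leaf has height 0
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static int height(Node n) { return n == null ? -1 : n.height; }

    // Assumed convention: positive means right-heavy, negative means left-heavy.
    static int balance(Node n) { return height(n.right) - height(n.left); }

    static void update(Node n) {
        n.height = 1 + Math.max(height(n.left), height(n.right));
    }

    static Node rotateRight(Node parent) {
        Node child = parent.left;
        parent.left = child.right;
        child.right = parent;
        update(parent);           // parent is now lower, so update it first
        update(child);
        return child;
    }

    static Node rotateLeft(Node parent) {
        Node child = parent.right;
        parent.right = child.left;
        child.left = parent;
        update(parent);
        update(child);
        return child;
    }

    // Picks one of the four cases from the parent and child balances.
    static Node rebalance(Node parent) {
        update(parent);
        int b = balance(parent);
        if (b == -2) {
            if (balance(parent.left) > 0)             // Left-Right
                parent.left = rotateLeft(parent.left);
            return rotateRight(parent);               // Left-Left (or second half of Left-Right)
        }
        if (b == 2) {
            if (balance(parent.right) < 0)            // Right-Left
                parent.right = rotateRight(parent.right);
            return rotateLeft(parent);                // Right-Right (or second half of Right-Left)
        }
        return parent;                                // already balanced
    }

    // Ordinary BST insertion, rebalancing each node on the way back up.
    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else if (key > root.key) root.right = insert(root.right, key);
        return rebalance(root);
    }
}
```

Inserting 1 through 7 in ascending order with this sketch yields a tree of height 2 rooted at 4, where a plain binary search tree would degenerate into a chain of height 6.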

19
Book Solution
Add 1/0/-1 balance indicator
Add boolean flag to indicate height increase
20
A new AVL insertKey Method
  • Assume our BinaryNode has a fourth attribute
    called height
  • You make the modification to BinaryNode to
    accomplish this
  • We'll write a new insertKeyNode method to handle
    balancing the tree after insertion
  • Insert pretty much as before
  • After insertion, call balancing routines, each of
    which implements one of our four cases
  • rotateRight and rotateLeft (cases 1 and 4)
  • rotateLeftRight and rotateRightLeft (cases 2 and 3)
  • Our old method insertKey doesn't change
  • More or less a skeleton - some details will be
    missing

21
Case 1 (right): Single Rotation, rotateRight()
  • n2 is the root of the tree; n1 is the left child
    of the root
  • Detach and save T2
  • Make n1 reference n2 as right child
  • Attach T2 as left child of n2
  • Update heights - here or elsewhere
  • Return the new root node
[Figure: root n2 with left child n1 and subtrees T1, T2, T3.]
public BinaryNode rotateRight(BinaryNode nodeN2) {
    // n1 is the left child of n2 and becomes the new root
    BinaryNode nodeN1 = (BinaryNode) nodeN2.getLeftChild();
    // T2 (n1's right subtree) becomes n2's left subtree
    nodeN2.setLeftChild(nodeN1.getRightChild());
    nodeN1.setRightChild(nodeN2);
    return nodeN1;
} // end rotateRight
Case 4 (left) is very similar to this case.
22
Case 2: rotateLeftRight() (double rotation)
public BinaryNode rotateLeftRight(BinaryNode nodeN) {
    BinaryNode nodeC = (BinaryNode) nodeN.getLeftChild();
    // first rotate left around the child, then right around the parent
    nodeN.setLeftChild(rotateLeft(nodeC));
    return rotateRight(nodeN);
} // end rotateLeftRight
23
Inserting a node in an AVL Tree (insertKeyNode())
private BinaryNode rebalance(BinaryNode nodeN) {
    int heightDifference = getHeightDifference(nodeN);
    if (heightDifference > 1) {
        // left subtree is taller by more than 1,
        // so addition was in left subtree
        if (getHeightDifference((BinaryNode) nodeN.getLeftChild()) > 0)
            // addition was in left subtree of left child
            nodeN = TreeRotations.rotateRight(nodeN);
        else
            // addition was in right subtree of left child
            nodeN = TreeRotations.rotateLeftRight(nodeN);
    } else if (heightDifference < -1) {
        // ... similar to above ... you do it
    } // else nodeN is balanced
    return nodeN;
} // end rebalance
24
Removal from AVL Trees
  • Removal causes the same kind of imbalance
    problems
  • Solution is basically same as for insertion
  • Add a field called decrease to note height change
    or use the height difference field in the node to
    compute it.
  • Adjust the local node's balance
  • Rebalance as necessary
  • The balance-changing and balancing methods must
    set decrease appropriately or update the height
    field
  • Actual removal is as for binary search tree
  • Involves moving values, and
  • Deleting a suitable leaf node

25
Performance of AVL Trees
  • Worst-case height: about 1.44 log n
  • Thus, lookup, insert, and remove are all O(log n)
  • Empirical cost is 0.25 + log n comparisons to insert
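
The 1.44 factor comes from the sparsest tree the AVL condition allows. A tree of height h with the fewest nodes has a root, one minimal subtree of height h-1, and one of height h-2, so N(h) = N(h-1) + N(h-2) + 1. The sketch below evaluates that recurrence and compares h with log2 of the node count; it assumes height is counted in edges (N(0) = 1, N(1) = 2), and the class name AvlWorstCase is made up for this example.

```java
class AvlWorstCase {
    // Fewest nodes an AVL tree of height h can have (height counted in edges).
    static long minNodes(int h) {
        long prev = 1;   // N(0): a single node
        long cur = 2;    // N(1): a root plus one child
        if (h == 0) return prev;
        for (int i = 2; i <= h; i++) {
            long next = cur + prev + 1;   // root + taller subtree + shorter subtree
            prev = cur;
            cur = next;
        }
        return cur;
    }

    // Ratio of the height to log2(n) for the sparsest tree of that height.
    static double heightOverLog2(int h) {
        return h / (Math.log(minNodes(h)) / Math.log(2));
    }

    public static void main(String[] args) {
        for (int h = 5; h <= 40; h += 5) {
            System.out.println(h + " " + minNodes(h) + " " + heightOverLog2(h));
        }
    }
}
```

The ratio climbs slowly toward, but stays below, roughly 1.44 as h grows, which is where the worst-case bound comes from.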

26
B-Trees
  • Not really binary trees at all.
  • Almost all file systems on almost all computers
    use B-Trees to keep track of which portions of
    which files are in which disk sectors.
  • The structure of choice for very large
    disk-resident databases.
  • Why???
  • Very important in computer science.
  • Disk directories
  • Disk resident databases
  • ...
  • B-Trees are an example of multiway trees.
  • In multiway trees, nodes can have multiple data
    elements (in contrast to one for a binary tree
    node).
  • Each node in a B-Tree can represent possibly many
    subtrees.

27
m-Way Trees
  • An m-way tree is a search tree in which each node
    can have from zero to m subtrees.
  • m is defined as the order of the tree.
  • In a nonempty m-way tree
  • Each node has 0 to m subtrees.
  • Given a node with k <= m subtrees, the node
    contains k subtrees (some of which may be null)
    and k-1 data entries.
  • The keys are ordered:
    key1 < key2 < key3 < ... < key(k-1).
  • The key values in the first subtree are less than
    the key values in the first entry etc.
  • m-way trees can still be unbalanced (but wait)
  • See 2-3 and 2-3-4 trees in book.

28
An m-way tree
[Figure: a node of a 4-way tree with keys K1, K2, K3 and four subtrees holding, from left to right: keys < K1; K1 < keys < K2; K2 < keys < K3; keys > K3.]
  • A 4-way Tree

A binary search tree is an m-way tree of order 2,
or a 2-way tree.
29
B-Trees
  • A B-Tree is an m-way tree with the following
    additional properties
  • The root is either a leaf or it has 2 to m
    subtrees.
  • All internal nodes have at least ceil(m/2)
    non-null subtrees and at most m non-null subtrees.
  • All leaf nodes are at the same level; that is,
    the tree is perfectly balanced.
  • A leaf node has at least ceil(m/2) - 1 and at
    most m - 1 key entries.
  • There are four basic operations for B-Trees
  • insert (add)
  • delete (remove)
  • traverse
  • search

30
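A B-tree of Order 5 (m = 5)
[Figure: a B-tree of order 5 whose root contains 42, with callouts marking a node with the minimum number of entries (2) and a node with the maximum number of entries (4: four keys, five subtrees).]
  • Min # of subtrees is 3 and max is 5
  • Min # of entries is 2 and max is 4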
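
These limits follow from the order m alone, taking the minimum number of subtrees to be m/2 rounded up. The helper below is a small sketch for checking the numbers; the class and method names are made up for this example.

```java
class BTreeLimits {
    // ceil(m / 2) using integer arithmetic.
    static int minSubtrees(int m) { return (m + 1) / 2; }

    static int maxSubtrees(int m) { return m; }

    // A node with k subtrees holds k - 1 entries.
    static int minEntries(int m) { return minSubtrees(m) - 1; }

    static int maxEntries(int m) { return m - 1; }

    public static void main(String[] args) {
        int m = 5;
        System.out.println("subtrees: " + minSubtrees(m) + " to " + maxSubtrees(m));
        System.out.println("entries:  " + minEntries(m) + " to " + maxEntries(m));
    }
}
```

These bounds apply to non-root nodes; the root is allowed to hold fewer entries.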

31
Insertion
  • B-tree insertion takes place at a leaf node.
  • Step 1 locate the leaf node for the data being
    inserted.
  • if node is not full (max no. of entries) then
    insert data in sequence in the node.
  • When leaf node is full, we have an overflow
    condition.
  • Insert the element anyway (temporarily violate
    tree conditions)
  • Split node into two nodes
  • Each new node contains half the data
  • middle entry is promoted to the parent (which may
    in turn become full!)
  • B-trees grow in a balanced fashion bottom up!

32
Follow Through An Example
  • Given a B-Tree structure of order m = 5.
  • Insert 11, 21, 14, 78, and 97.
  • Because the order is 5, a single node can contain
    a maximum of 4 (m - 1) entries.
  • Step 1.
  • 11 causes the creation of a new node that becomes
    the root of the tree.
  • As 21, 14, and 78 are inserted, they are just
    added (in order) to the root node (which is the
    only node in the tree at this point).
  • Inserting 97 causes a problem, because the node
    where it should go (the root) is full.

33
Inserting 97
  • When root node is full (that is, the node where
    the current value should go)
  • CHEAT! Insert 97 in the node anyway.
  • Now, because the node is larger than allowed,
    split it into two nodes
  • Propagate median value (21) to root node and
    insert it there (causes creation of a new root
    node in this case).
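
The "insert anyway, then split" step can be sketched on a plain sorted array. This is a simplified illustration that handles only the entries of a single node (no child links and no parent update); the class name LeafSplit and the Split holder are assumptions made for this example.

```java
import java.util.Arrays;

class LeafSplit {
    // Result of splitting an overfull node: left half, promoted median, right half.
    static final class Split {
        final int[] left;
        final int median;
        final int[] right;
        Split(int[] left, int median, int[] right) {
            this.left = left;
            this.median = median;
            this.right = right;
        }
    }

    // Inserts key into a sorted array, returning a new array one slot longer.
    static int[] insertSorted(int[] entries, int key) {
        int[] out = Arrays.copyOf(entries, entries.length + 1);
        int i = entries.length;
        while (i > 0 && out[i - 1] > key) {   // shift larger entries right
            out[i] = out[i - 1];
            i--;
        }
        out[i] = key;
        return out;
    }

    // Splits an overfull node around its middle entry.
    static Split split(int[] overfull) {
        int mid = overfull.length / 2;
        return new Split(
            Arrays.copyOfRange(overfull, 0, mid),
            overfull[mid],
            Arrays.copyOfRange(overfull, mid + 1, overfull.length));
    }

    public static void main(String[] args) {
        int[] node = {};
        for (int key : new int[] {11, 21, 14, 78, 97}) node = insertSorted(node, key);
        Split s = split(node);   // node now holds 5 entries, one too many for order 5
        System.out.println(Arrays.toString(s.left) + " " + s.median + " " + Arrays.toString(s.right));
    }
}
```

In a full B-tree the promoted median would then be inserted into the parent by the same procedure, which is how a split can propagate upward and eventually create a new root.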

[Figure: the root node temporarily holding 11, 14, 21, 78, 97. Violation!]
34
Creation of a new Root Node
  • Tree grows from bottom up.
  • Tree is always balanced.
  • Depending upon m (typically 100-1000), the tree
    is very shallow, so search is efficient.

35
Continuing the Example
  • Suppose I now add the following keys to the tree
    85, 74, 63, 42, 45, 57.
  • Inserting 85 then 74

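[Figure: root 21 with left leaf (11, 14); step 1 adds 85 to the right leaf (78, 85, 97) and step 2 adds 74 to the same leaf.]
  • Now insert 63. What happens?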

36
Example, contd.
  • 63 causes the node to overflow - but add it
    anyway!

This node violates the B-tree conditions, so it
must be split.
[Figure: step 3, root 21 with left leaf (11, 14) and an overfull right leaf holding 63, 74, 78, 85, 97, which is then split up.]
37
Example: Splitting a node
[Figure: the overfull node being split, with 63 and 74 on one side and 85 and 97 on the other, and steps 1 to 4 marked.]
  • 1. The median value is to be sent to the parent
    node (78 here)
  • 2, 3. Create a temporary root node with one entry
    (78) and attach links to the right and left
    subtrees
  • 4. Insert this node into the node list of the
    parent
38
Example: Tree after inserting 63
[Figure: root (21, 78) with leaves (11, 14), (63, 74), and (85, 97).]
  • Now insert 45 and 42
  • Then insert 57

39
Example: After adding 42, 45, and 57
[Figure: the tree after adding 42, 45, and 57.]
  • Now add 20, 16, and 19

40
Tree after inserting 20, 16, and 19
  • Now insert 52, 30
  • Then 22

[Figure: the tree after these insertions, with an overfull node marked VIOLATION and then SPLIT.]
41
The Final Tree
  • Yggdrasil, the World Tree

42
The Final Tree
  • B-Tree node deletion is equally interesting.
  • All deletes take place at a leaf node (when not
    at a leaf, substitute data must be found).
  • Underflow can occur when the number of elements
    in a node falls below the allowed minimum.
  • May have to borrow data from adjacent nodes
    and/or the parent.

43
A Typical B-Tree Node
  • Suppose we want to represent a node in an order m
    B-Tree.
  • m data elements, m+1 subtrees
  • Suppose the class defining the tree node is
    IntBalancedSet
  • Or we could use a Linked List for each node and
    alternate keys and trees.
  • Or ...

int[] data = new int[m + 1];       // +1 for the cheat
int dataCount;                     // # of data elements in node
IntBalancedSet[] subset = new IntBalancedSet[m + 2];
int childCount;                    // # of subtrees in node
44
The Structure
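[Figure: the two arrays of a node. The data array holds 6 and 15 in its first two slots (dataCount is 2) and the remaining slots are unused. The subset array holds three subtrees followed by null entries.]
  • For data[i], subset[i] is the left subtree and
    subset[i+1] is the right subtree
  • Smaller subsets:
  • data elements < 6
  • 6 < data elements < 15 (or >= 6 if duplicates
    allowed)
  • data elements > 15 (or >= 15 if duplicates allowed)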
45
Some Numbers
  • 10^5 words in a dictionary
  • 10^6 words in Moby Dick
  • 10^9 Social Security Numbers
  • 10^12 phone numbers in the world
  • 10^15 people who ever lived
  • 10^20 grains of sand in the world
  • 10^25 manufactured bits of computer memory
  • 10^79 electrons in the universe
  • With 1000-way branching, we could
  • Find every single bit of memory ever manufactured
    with fewer than 10 probes
  • Find any single electron in the universe with
    no more than 27 probes
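
The probe counts can be checked by finding the smallest k for which m^k reaches the number of items, since k levels of m-way branching can tell apart at most m^k items. The sketch below uses BigInteger so that values like 10^79 are exact; the class and method names are made up for this example, and it assumes one probe per level of a perfectly full tree.

```java
import java.math.BigInteger;

class ProbeCount {
    // Smallest k such that branching^k >= items.
    static int probes(BigInteger items, int branching) {
        BigInteger m = BigInteger.valueOf(branching);
        BigInteger reach = BigInteger.ONE;   // items distinguishable with 0 probes
        int k = 0;
        while (reach.compareTo(items) < 0) {
            reach = reach.multiply(m);
            k++;
        }
        return k;
    }

    public static void main(String[] args) {
        BigInteger memoryBits = BigInteger.TEN.pow(25);
        BigInteger electrons = BigInteger.TEN.pow(79);
        System.out.println(probes(memoryBits, 1000));
        System.out.println(probes(electrons, 1000));
        System.out.println(probes(memoryBits, 2));
    }
}
```

Under these assumptions the sketch gives 9 probes for 10^25 items and 27 probes for 10^79 items with 1000-way branching, against 84 probes for 10^25 items with binary branching.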

46
Why B-Trees are Important
  • Form the basis for almost every file indexing
    system: Unix, Windows, Mac OS.
  • For a file index, we cannot assume that the entire
    index will fit into memory (in fact, it can't by
    definition)
  • Therefore, the file index resides on the disk.
  • Big-O analysis assumes that all operations are
    equal - not true when disk I/O is involved
  • CPUs: 400 million operations per second
  • Disks take on the order of 2-10 milliseconds to
    access a block of data
  • So we can do about 500 disk accesses per second.
  • At the same time, we can do about 400 million CPU
    operations
  • BOTTOM LINE: disk accesses are VERY expensive
    (STILL!!!)

47
A Practical Example
  • Suppose we want to computerize driver's license
    information for the state of Massachusetts.
  • Assume we have a key of 32 bytes (a name), a 1024
    byte record of data, and about 20 million
    records.
  • Assume this does not fit into memory and that we
    have about 1/20 of the resources of the system
    (other people use it as well).
  • Thus, in one second we can perform 20 million
    operations or perform 25 disk accesses.
  • Analyze the performance of various tree
    representations.

48
A Practical Example
  • Unbalanced binary search tree: DISASTER
  • Successful search: 1.38 log N disk accesses
    (average)
  • 36 disk accesses (or about 1-2 secs)
  • Some accesses would take much longer.
  • This is just to do the lookups to find our data
    record!
  • Red-Black Tree (haven't discussed)
  • also log N, although the constant is a little
    better (1 sec)
  • Can't do better than log N with binary trees.
  • Need to reduce the number of disk accesses to a
    small constant, like 3 or 4.
  • Answer is intuitive - if we have more branching,
    we have less height in the tree and hence fewer
    accesses.
  • Complete binary tree has height that is roughly
    log_2 N
  • Complete m-way tree has height that is roughly
    log_m N

49
Reminder
  • M-way trees are good for applications where the
    differences in access speeds are significant.
  • E.g. memory versus disk.

Core memory, circa 1960
5MB Disk, circa 1970
50
A Bit of History