Title: Introduction to Programming with Data Structures
1Introduction to Programming with Data Structures
Computer Science 187
Lecture 21: Balanced Trees, AVL Trees, and m-way Trees
Announcements
2Binary Search Trees
- What's the structure of the binary search tree I get in each case when I construct a tree from the keys
  - 1, 2, 3, 4, 5, 6?
  - 6, 5, 4, 3, 2, 1?
  - 3, 2, 5, 1, 4, 6?
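One way to explore the question is to build the three trees and compare their heights. The following is only a sketch: the class BstShapeDemo, its Node class, and the method names are made up for illustration and are not the course's BinaryNode code. Height is counted in nodes (an empty tree has height 0).

```java
import java.util.List;

final class BstShapeDemo {
    // Minimal BST node; names are illustrative only.
    static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Plain (unbalanced) BST insertion; duplicate keys are ignored.
    static Node insert(Node root, int key) {
        if (root == null) return new Node(key);
        if (key < root.key) root.left = insert(root.left, key);
        else if (key > root.key) root.right = insert(root.right, key);
        return root;
    }

    // Height in nodes: empty tree = 0, single node = 1.
    static int height(Node root) {
        return root == null ? 0 : 1 + Math.max(height(root.left), height(root.right));
    }

    // Builds a tree from the keys in the given order and returns its height.
    static int heightAfterInserting(List<Integer> keys) {
        Node root = null;
        for (int k : keys) root = insert(root, k);
        return height(root);
    }

    public static void main(String[] args) {
        System.out.println(heightAfterInserting(List.of(1, 2, 3, 4, 5, 6))); // 6: a chain to the right
        System.out.println(heightAfterInserting(List.of(6, 5, 4, 3, 2, 1))); // 6: a chain to the left
        System.out.println(heightAfterInserting(List.of(3, 2, 5, 1, 4, 6))); // 3: every level is full
    }
}
```

The first two orders produce a degenerate chain, while the third produces the shallowest tree possible for six keys.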
3Balanced Trees
- Consider adding 7 chars a, b, c, d, e, f, and g to an initially empty binary search tree two ways.
  - in order: a b c d e f g (UNBALANCED, search is O(n))
  - in random order: d f b a e c g (BALANCED, search is O(log n))
The order in which data values arrive can make a HUGE difference in what the tree looks like.
4Definition of Balanced
- balance: a tree node attribute representing the difference in height between the node's subtrees.
- The balance attribute for a balanced binary tree is -1, 0, or 1.
- The definition is recursive and holds for all root nodes and their left and right subtrees.
- Achieving balance is important for minimizing the time required for search.
- We will look at AVL trees but not in a lot of detail.
5Schemes for Balancing Trees
- AVL trees: Adelson-Velskii and Landis, 1962
  - named for initials of Russian creators
  - uses rotations to ensure heights of child trees differ by at most 1
- 2-3 Trees: Hopcroft, 1970
  - similar to 2-3-4 tree, but repairs have to move back up the tree
- B-Trees: Bayer and McCreight, 1972
- Red-Black Trees: Bayer, 1972
  - not the original name
  - red-black convention and relation to 2-3-4 trees: Guibas and Sedgewick, 1978
- Splay Trees: Sleator and Tarjan, 1983
- Skip Lists: Pugh, 1990
  - developed at Cornell
6AVL Trees
- AVL trees are binary search trees with a balance condition.
- First simple idea: left and right subtrees of root must have the same height... but not good enough
  [Figure: a tree that is "Balanced but not shallow"]
- In an AVL tree, the height of the left and right subtree of every node can differ by at most 1.
7AVL Tree Example
[Figure: two example trees with node keys and heights, one labeled "AVL Tree" and one labeled "Not an AVL Tree"]
- Search, insert, etc.: complexity O(h), where h is the height of the tree.
- Height of an AVL tree: O(log n), where n is the number of nodes.
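The AVL condition can be checked mechanically. The sketch below assumes a bare-bones Node class and made-up method names (checkedHeight, isAvlBalanced); it checks only the height condition, not the search-tree ordering, and its two sample trees are hypothetical rather than the ones pictured on the slide.

```java
final class AvlCheck {
    // Minimal node type for illustration only.
    static final class Node {
        final int key;
        final Node left, right;
        Node(int key, Node left, Node right) { this.key = key; this.left = left; this.right = right; }
    }

    // Returns the height (in nodes) of the subtree, or -1 as soon as some node
    // has subtrees whose heights differ by more than 1.
    static int checkedHeight(Node n) {
        if (n == null) return 0;
        int lh = checkedHeight(n.left);
        if (lh < 0) return -1;
        int rh = checkedHeight(n.right);
        if (rh < 0) return -1;
        if (Math.abs(lh - rh) > 1) return -1;
        return 1 + Math.max(lh, rh);
    }

    static boolean isAvlBalanced(Node root) { return checkedHeight(root) >= 0; }

    // Hypothetical sample: 10 with children 5 (children 2 and 8) and 16.
    static Node sampleBalanced() {
        return new Node(10, new Node(5, new Node(2, null, null), new Node(8, null, null)),
                        new Node(16, null, null));
    }

    // Hypothetical sample: a left chain 10 -> 5 -> 2.
    static Node sampleChain() {
        return new Node(10, new Node(5, new Node(2, null, null), null), null);
    }

    public static void main(String[] args) {
        System.out.println(isAvlBalanced(sampleBalanced())); // true
        System.out.println(isAvlBalanced(sampleChain()));    // false
    }
}
```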
8Insertion in an AVL Tree
Insert 7
[Figure: the tree rooted at 10 before and after inserting 7, with node heights]
- Height of tree rooted at 5 does not change.
- Tree is still balanced
9Insertion Example 2
Insert 0
[Figure: the tree rooted at 10 after inserting 0, with node heights]
- Inserting a node causes heights in tree to change
- Tree is no longer balanced
10When and Where do Heights Change?
[Figure: example tree with the keys 10, 5, 16, 2, 8, and 12 and the following annotations]
Assume this is the node where an imbalance occurs (node 5 in the cases below).
Therefore, the heights of its two subtrees differ by more than 1.
Insertions below this node could cause the imbalance. (anywhere else?)
- Therefore there are four cases to consider
  - Case 1: insertion into the left subtree of the left child of 5
  - Case 2: insertion into the right subtree of the left child of 5
  - Case 3: insertion into the left subtree of the right child of 5
  - Case 4: insertion into the right subtree of the right child of 5
- First and fourth are symmetric, second and third are symmetric
11Case 1(right rotation) Single Rotation
[Figure: the tree before ("Tree Unbalanced") and after ("Tree balanced") a single right rotation]
1. Make n1 the root node
2. Make T2 the left child of n2
- Binary search tree property holds in both trees
- Balance is achieved in the 'rotated' tree
12Rotation to the Right
- Algorithm for rotation (toward the right)
  - Save value of root.left (temp = root.left)
  - Set root.left to value of root.left.right
  - Set temp.right to root
  - Set root to temp
- Algorithm for rotation toward the left is similar
  - you do it.
[Figure: tree with the root and temp references marked]
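The four steps translate almost line for line into code. This is a sketch on a stripped-down Node class (an assumption, not the course's BinaryNode); the mirror-image rotateLeft is included as one possible answer to the exercise, not as the official one.

```java
final class RotationSketch {
    // Minimal node type for illustration only.
    static final class Node {
        final String key;
        Node left, right;
        Node(String key) { this.key = key; }
    }

    // Rotation toward the right; returns the new subtree root.
    static Node rotateRight(Node root) {
        Node temp = root.left;   // save root.left
        root.left = temp.right;  // root.left takes over temp's right subtree
        temp.right = root;       // old root becomes temp's right child
        return temp;             // caller stores this as the new root
    }

    // Mirror image: rotation toward the left.
    static Node rotateLeft(Node root) {
        Node temp = root.right;
        root.right = temp.left;
        temp.left = root;
        return temp;
    }

    // In-order listing of the keys, used to confirm the ordering is preserved.
    static String inorder(Node n) {
        return n == null ? "" : inorder(n.left) + n.key + inorder(n.right);
    }

    // Builds the left-leaning chain c -> b -> a.
    static Node leftChain() {
        Node c = new Node("c");
        c.left = new Node("b");
        c.left.left = new Node("a");
        return c;
    }

    public static void main(String[] args) {
        Node root = rotateRight(leftChain());
        System.out.println(root.key);      // b
        System.out.println(inorder(root)); // abc
    }
}
```

Because the method returns the new root instead of assigning it, the caller is responsible for the last step ("Set root to temp"), for example root = rotateRight(root).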
13Case 4 (left rotation) Single Rotation
[Figure: the tree before ("Tree Unbalanced") and after ("Tree balanced") a single left rotation, showing nodes n1 and n2 and subtrees T1, T2, and T3]
1. Make n2 the root node
2. Make T2 the right child of n1
- Binary search tree property holds in both trees
- Balance is achieved in the 'rotated' tree
14Case 2 (left-right) Trouble
- Single Rotation fails to fix this case
- Need a more complex manipulation of the tree
15Case 2 (left-right) Double Rotation (two single rotations)
Solution
1. Rotate n3 into n1
2. Rotate n3 into n2
[Figure: "Tree Unbalanced"]
- Expand T2 in original tree into a node and two subtrees
- We know that neither n1 nor n2 works as the root
- Solution is then two single rotations to get n3 at root
16Case 2 (left-right) Double Rotation (two single rotations)
[Figure: three trees over the nodes n1, n2, and n3 showing the two steps, "Rotate n3 into n1" and then "Rotate n3 into n2"]
- Binary search tree property holds in both trees
- Balance is achieved in the 'rotated' tree
17Case 3 (right-left) Double Rotation (two single rotations)
1. Rotate n3 into n1
2. Rotate n3 into n2
[Figure: the tree over n1, n2, and n3 before and after the double rotation]
- Binary search tree property holds in both trees
- Balance is achieved in the 'rotated' tree
18Unbalanced Search Trees
- Left-Left (parent balance is -2, left child balance is -1)
  - Rotate right around parent
- Left-Right (parent balance -2, left child balance 1)
  - Rotate left around child
  - Rotate right around parent
- Right-Left (parent balance 2, right child balance -1)
  - Rotate right around child
  - Rotate left around parent
- Right-Right (parent balance 2, right child balance 1)
  - Rotate left around parent
19Book Solution
Add 1/0/-1 balance indicator
Add boolean flag to indicate height increase
20A new AVL insertKey Method
- Assume our BinaryNode has a fourth attribute called height
  - You make the modification to BinaryNode to accomplish this
- We'll write a new insertKeyNode method to handle balancing the tree after insertion
  - Insert pretty much as before
  - After insertion, call balancing routines, each of which implements one of our four cases
    - rotateRight and rotateLeft (cases 1 and 4)
    - rotateLeftRight and rotateRightLeft (cases 2 and 3)
- Our old method insertKey doesn't change
- More or less a skeleton - some details will be missing
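21Case 1 (right) Single Rotation: rotateRight()
n2 is the root of the tree; n1 is the left child of the root.
- Detach and save T2
- Make n1 reference n2 as right child
- Attach T2 as left child of n2
- Update heights - here or elsewhere
- return new root node
[Figure: n2 at the root with left child n1, and the subtrees T1, T2, and T3]
    public BinaryNode rotateRight(BinaryNode nodeN2)
    {
        // n1 is the left child of the current root n2
        BinaryNode nodeN1 = (BinaryNode) nodeN2.getLeftChild();
        // T2 (n1's right subtree) becomes the left child of n2
        nodeN2.setLeftChild(nodeN1.getRightChild());
        // n2 becomes the right child of n1
        nodeN1.setRightChild(nodeN2);
        return nodeN1; // new root
    } // end rotateRight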
Case 4 left is very similar to this case.
22Case 2 rotateLeftRight() (double rotation)
    public BinaryNode rotateLeftRight(BinaryNode nodeN)
    {
        BinaryNode nodeC = (BinaryNode) nodeN.getLeftChild();
        // first rotate left around the left child ...
        nodeN.setLeftChild(rotateLeft(nodeC));
        // ... then rotate right around the parent
        return rotateRight(nodeN);
    } // end rotateLeftRight
23Inserting a node in an AVL Tree (insertKeyNode())
    private BinaryNode rebalance(BinaryNode nodeN)
    {
        int heightDifference = getHeightDifference(nodeN);
        if (heightDifference > 1)
        {
            // left subtree is taller by more than 1, so addition was in left subtree
            if (getHeightDifference((BinaryNode) nodeN.getLeftChild()) > 0)
                // addition was in left subtree of left child
                nodeN = TreeRotations.rotateRight(nodeN);
            else
                // addition was in right subtree of left child
                nodeN = TreeRotations.rotateLeftRight(nodeN);
        }
        else if (heightDifference < -1)
        {
            // ... similar to above ... you do it
        }
        // else nodeN is balanced
        return nodeN;
    } // end rebalance
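To see the pieces working together, here is a compact, self-contained sketch of AVL insertion. It is not the book's solution or the course's BinaryNode/TreeRotations code: the Node class with a stored height, the helper names, and the way the right-heavy branch is filled in are all assumptions made for illustration. As in rebalance above, the height difference is taken as left height minus right height, which is the opposite sign from the balance values on the "Unbalanced Search Trees" slide.

```java
final class AvlInsertSketch {
    // Minimal node with a stored height (in nodes); illustrative only.
    static final class Node {
        final int key;
        int height = 1;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static int height(Node n) { return n == null ? 0 : n.height; }

    static void updateHeight(Node n) {
        n.height = 1 + Math.max(height(n.left), height(n.right));
    }

    // Left height minus right height: positive means left-heavy.
    static int heightDifference(Node n) { return height(n.left) - height(n.right); }

    static Node rotateRight(Node n2) {
        Node n1 = n2.left;
        n2.left = n1.right;
        n1.right = n2;
        updateHeight(n2);   // n2 is now the lower node, so update it first
        updateHeight(n1);
        return n1;
    }

    static Node rotateLeft(Node n1) {
        Node n2 = n1.right;
        n1.right = n2.left;
        n2.left = n1;
        updateHeight(n1);
        updateHeight(n2);
        return n2;
    }

    // Restores the AVL condition at n, assuming both subtrees already satisfy it.
    static Node rebalance(Node n) {
        updateHeight(n);
        int diff = heightDifference(n);
        if (diff > 1) {                                    // left subtree too tall
            if (heightDifference(n.left) < 0)              // left-right case
                n.left = rotateLeft(n.left);
            return rotateRight(n);                         // left-left case (or second half)
        }
        if (diff < -1) {                                   // right subtree too tall
            if (heightDifference(n.right) > 0)             // right-left case
                n.right = rotateRight(n.right);
            return rotateLeft(n);                          // right-right case (or second half)
        }
        return n;                                          // already balanced
    }

    // Ordinary BST insertion, rebalancing on the way back up; duplicates are ignored.
    static Node insert(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else if (key > n.key) n.right = insert(n.right, key);
        return rebalance(n);
    }

    // Inserts 1..count in ascending order, the worst case for a plain BST.
    static Node buildAscending(int count) {
        Node root = null;
        for (int k = 1; k <= count; k++) root = insert(root, k);
        return root;
    }

    public static void main(String[] args) {
        Node root = buildAscending(7);
        System.out.println(root.key);     // 4
        System.out.println(height(root)); // 3
    }
}
```

Inserting 1 through 7 in order would give a chain of height 7 in a plain binary search tree; with rebalancing the result is a full tree of height 3.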
24Removal from AVL Trees
- Removal causes the same kind of imbalance problems
- Solution is basically same as for insertion
  - Add a field called decrease to note height change or use the height difference field in the node to compute it.
  - Adjust the local node's balance
  - Rebalance as necessary
  - The balance changed and balancing methods must set decrease appropriately or update the height field
- Actual removal is as for binary search tree
  - Involves moving values, and
  - Deleting a suitable leaf node
25Performance of AVL Trees
- Worst case height 1.44 log n
- Thus, lookup, insert, remove all O(log n)
- Empirical cost is 0.25log n comparisons to insert
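The worst-case height comes from the sparsest possible AVL trees. A tree of height h with the fewest nodes has a root, a minimal subtree of height h-1, and a minimal subtree of height h-2, so N(h) = 1 + N(h-1) + N(h-2). The sketch below (class and method names are made up) tabulates N(h) next to 1.44 log2 N(h), with height counted in nodes.

```java
final class AvlWorstCase {
    // Fewest nodes an AVL tree of the given height (in nodes) can have.
    static long minNodes(int height) {
        long prev = 0;  // N(0): empty tree
        long curr = 1;  // N(1): single node
        if (height == 0) return prev;
        for (int h = 2; h <= height; h++) {
            long next = 1 + curr + prev;  // root + tallest subtree + next-tallest subtree
            prev = curr;
            curr = next;
        }
        return curr;
    }

    static double log2(double x) { return Math.log(x) / Math.log(2); }

    public static void main(String[] args) {
        for (int h = 1; h <= 20; h++) {
            long n = minNodes(h);
            // Prints the height, the minimum node count, and 1.44 * log2(n) for comparison.
            System.out.printf("h=%d  minNodes=%d  1.44*log2(n)=%.2f%n", h, n, 1.44 * log2(n));
        }
    }
}
```

For example, minNodes(10) is 143: no AVL tree with fewer than 143 nodes can be 10 levels tall.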
26B-Trees
- Not really binary trees at all.
- Almost all file systems on almost all computers use B-Trees to keep track of which portions of which files are in which disk sectors.
- The structure of choice for very large disk resident databases.
  - Why???
- Very important in computer science.
  - Disk directories
  - Disk resident databases
  - ...
- B-Trees are an example of multiway trees.
- In multiway trees, nodes can have multiple data elements (in contrast to one for a binary tree node).
- Each node in a B-Tree can represent possibly many subtrees.
27m-Way Trees
- An m-way tree is a search tree in which each node can have from zero to m subtrees.
  - m is defined as the order of the tree.
- In a nonempty m-way tree
  - Each node has 0 to m subtrees.
  - Given a node with k <= m subtrees, the node contains k subtrees (some of which may be null) and k-1 data entries.
  - The keys are ordered: key_1 < key_2 < key_3 < ... < key_(k-1).
  - The key values in the first subtree are less than the key value in the first entry, etc.
- m-way trees can still be unbalanced (but wait)
- See 2-3 and 2-3-4 trees in book.
28An m-way tree
[Figure: a node with three keys and four subtrees]
Keys:     K1 | K2 | K3
Subtrees: Keys < K1 | K1 < Keys < K2 | K2 < Keys < K3 | Keys > K3
A binary search tree is an m-way tree of order 2, that is, a 2-way tree.
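Searching an m-way tree generalizes binary search tree lookup: within a node, scan the ordered keys to find either the target or the subtree that could contain it. The sketch below uses a made-up Node class holding a key array and a subtree array, and a small hypothetical sample tree.

```java
final class MWaySearchSketch {
    // A node with k-1 ordered keys and k subtrees (entries may be null); illustrative only.
    static final class Node {
        final int[] keys;
        final Node[] subtrees;
        Node(int[] keys, Node[] subtrees) { this.keys = keys; this.subtrees = subtrees; }
    }

    // Convenience constructor for a node whose subtrees are all null.
    static Node leaf(int... keys) { return new Node(keys, new Node[keys.length + 1]); }

    static boolean contains(Node node, int target) {
        while (node != null) {
            int i = 0;
            // Skip every key smaller than the target.
            while (i < node.keys.length && target > node.keys[i]) i++;
            if (i < node.keys.length && node.keys[i] == target) return true;
            // subtrees[i] holds the keys between keys[i-1] and keys[i].
            node = node.subtrees[i];
        }
        return false;
    }

    // Hypothetical sample: root keys 21 and 78 with three leaf subtrees.
    static Node sampleTree() {
        Node[] children = { leaf(11, 14), leaf(42, 45, 63, 74), leaf(85, 97) };
        return new Node(new int[] {21, 78}, children);
    }

    public static void main(String[] args) {
        Node root = sampleTree();
        System.out.println(contains(root, 63)); // true
        System.out.println(contains(root, 50)); // false
    }
}
```

With large m, a binary search inside each node would usually replace the linear scan shown here.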
29B-Trees
- A B-Tree is an m-way tree with the following additional properties
  - The root is either a leaf or it has 2..m subtrees.
  - All internal nodes have at least ceil(m/2) non-null subtrees and at most m non-null subtrees.
  - All leaf nodes are at the same level; that is, the tree is perfectly balanced.
  - A leaf node has at least ceil(m/2) - 1 and at most m-1 key entries.
- There are four basic operations for B-Trees
  - insert (add)
  - delete (remove)
  - traverse
  - search
30A B-tree of Order 5 (m = 5)
[Figure: example B-tree whose root holds the key 42, with callouts "Node with minimum entries (2)" and "Node with maximum entries (4): four keys, five subtrees"]
- Min number of subtrees is 3 and max is 5
- Min number of entries is 2 and max is 4
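These limits follow from the order alone, assuming the minimum for a non-root node is the ceiling of m/2 subtrees. A small sketch (the class and method names are made up) that reproduces the numbers for m = 5:

```java
final class BTreeLimits {
    // ceil(m / 2) using integer arithmetic.
    static int minSubtrees(int m) { return (m + 1) / 2; }

    static int maxSubtrees(int m) { return m; }

    // A node with k subtrees holds k - 1 entries.
    static int minEntries(int m) { return minSubtrees(m) - 1; }

    static int maxEntries(int m) { return m - 1; }

    public static void main(String[] args) {
        int m = 5;
        System.out.println(minSubtrees(m) + " to " + maxSubtrees(m) + " subtrees"); // 3 to 5 subtrees
        System.out.println(minEntries(m) + " to " + maxEntries(m) + " entries");    // 2 to 4 entries
    }
}
```

The root is exempt from the minimum: it may be a leaf or have as few as 2 subtrees.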
31Insertion
- B-tree insertion takes place at a leaf node.
- Step 1: locate the leaf node for the data being inserted.
  - if node is not full (max no. of entries) then insert data in sequence in the node.
- When leaf node is full, we have an overflow condition.
  - Insert the element anyway (temporarily violate tree conditions)
  - Split node into two nodes
  - Each new node contains half the data
  - middle entry is promoted to the parent (which may in turn become full!)
- B-trees grow in a balanced fashion bottom up!
32Follow Through An Example
- Given a B-Tree structure of order m = 5.
- Insert 11, 21, 14, 78, and 97.
- Because order = 5, a single node can contain a maximum of 4 (m - 1) entries.
- Step 1.
  - 11 causes the creation of a new node that becomes the root of the tree.
  - As 21, 14, and 78 are inserted, they are just added (in order) to the root node (which is the only node in the tree at this point).
  - Inserting 97 causes a problem, because the node where it should go (the root) is full.
[Figure: the root node]
33Inserting 97
- When root node is full (that is, the node where the current value should go)
  - CHEAT! Insert 97 in the node anyway.
  - Now, because the node is larger than allowed, split it into two nodes
  - Propagate median value (21) to root node and insert it there (causes creation of a new root node in this case).
[Figure: the root node holding 11, 14, 21, 78, 97, marked "Violation!"]
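The "cheat, then split" step can be sketched on a single node's entry list. This is an illustration only: it models just the entries of one overfull node with a java.util.List, and the class LeafSplitSketch and its Split holder are made-up names, not part of the course code.

```java
import java.util.ArrayList;
import java.util.List;

final class LeafSplitSketch {
    // Result of splitting an overfull node: two halves plus the promoted median.
    static final class Split {
        final List<Integer> left;
        final int median;
        final List<Integer> right;
        Split(List<Integer> left, int median, List<Integer> right) {
            this.left = left; this.median = median; this.right = right;
        }
    }

    // Inserts the key in sorted position, even if that makes the node too large (the cheat).
    static void insertSorted(List<Integer> entries, int key) {
        int i = 0;
        while (i < entries.size() && entries.get(i) < key) i++;
        entries.add(i, key);
    }

    // Splits an overfull node around its middle entry.
    static Split split(List<Integer> overfull) {
        int mid = overfull.size() / 2;
        return new Split(new ArrayList<>(overfull.subList(0, mid)),
                         overfull.get(mid),
                         new ArrayList<>(overfull.subList(mid + 1, overfull.size())));
    }

    // Inserts all keys into one node and splits it once at the end.
    static Split insertAllThenSplit(int... keys) {
        List<Integer> node = new ArrayList<>();
        for (int k : keys) insertSorted(node, k);
        return split(node);
    }

    public static void main(String[] args) {
        Split s = insertAllThenSplit(11, 21, 14, 78, 97);  // order 5: the fifth key overflows
        System.out.println(s.left + " | " + s.median + " | " + s.right); // [11, 14] | 21 | [78, 97]
    }
}
```

The promoted median (21 here) is what gets inserted into the parent, or into a newly created root when the node being split was the root.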
34Creation of a new Root Node
- Tree grows from bottom up.
- Tree is always balanced.
- Depending upon m (typically 100-1000), tree is very shallow, so search is efficient.
35Continuing the Example
- Suppose I now add the following keys to the tree: 85, 74, 63, 42, 45, 57.
- Inserting 85 then 74
[Figure: root 21 with leaf nodes (11, 14) and (74, 78, 85, 97)]
- Now insert 63. What happens?
36Example, contd.
- 63 causes the node to overflow - but add it anyway!
[Figure: root 21 with leaf nodes (11, 14) and (63, 74, 78, 85, 97)]
This node violates the B-tree conditions so it must be split.
[Figure: the overfull node 63, 74, 78, 85, 97 being split up]
37Example Splitting a node
[Figure: the overfull node 63, 74, 78, 85, 97 with the steps below marked 1 to 4]
1. Median value is to be sent to parent node - 78 here
2, 3. Create a temporary root node with one entry (78) and attach links to right and left subtrees
4. Insert this node into the nodelist of the parent
38Example Tree after inserting 63
[Figure: root (21, 78) with leaf nodes (11, 14), (63, 74), and (85, 97)]
- Now insert 45 and 42
- Then insert 57
39Example After adding 42, 45, and 57
[Figure: the resulting tree; only the keys 74, 45, 63, and 42 survive in the extraction]
40Tree after inserting 20, 16, and 19
VIOLATION SPLIT
41The Final Tree
- Yggdrasil, the World Tree
42The Final Tree
- B-Tree node deletion is equally interesting.
- All deletes take place at a leaf node (when not at a leaf, substitute data must be found).
- Underflow can occur when the number of elements in a node falls below the allowed minimum.
- May have to borrow data from adjacent nodes and/or the parent.
43A Typical B-Tree Node
- Suppose we want to represent a node in an order m B-Tree.
  - m data elements, m+1 subtrees
- Suppose the class defining the tree node is IntBalancedSet
- Or we could use a Linked List for each node and alternate keys and trees.
- Or ...
    int[] data = new int[m+1];    // +1 for the cheat
    int dataCount;                // # of data elements in node
    IntBalancedSet[] subset = new IntBalancedSet[m+2];
    int childCount;
44The Structure
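[Figure: the data and subset arrays of one node]
data:      6, 15, ?, ?, ?, ? (the first two slots are in use, the rest are unused)
dataCount: 2
subset:    the first entries reference smaller subsets, the remaining entries are null
For data[i]: subset[i] is the left subtree and subset[i+1] is the right subtree.
- subset[0]: data elements < 6
- subset[1]: 6 < data elements < 15 (or >= 6 if duplicates allowed)
- subset[2]: data elements > 15 (or >= 15 if duplicates allowed)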
45Some Numbers
- 10^5 words in a dictionary
- 10^6 words in Moby Dick
- 10^9 Social Security Numbers
- 10^12 phone numbers in the world
- 10^15 people who ever lived
- 10^20 grains of sand in the world
- 10^25 manufactured bits of computer memory
- 10^79 electrons in the universe
- With 1000-way branching, we could
  - Find every single bit of memory ever manufactured with less than 10 probes
  - Find any single electron in the universe with less than 27 probes
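The probe counts are just logarithms base 1000. A sketch that counts how many levels of 1000-way branching are needed to distinguish a given number of items (the class and method names are made up; BigInteger is used because 10^79 does not fit in a long):

```java
import java.math.BigInteger;

final class ProbeCount {
    // Smallest k such that branching^k >= items.
    static int probes(BigInteger items, int branching) {
        BigInteger b = BigInteger.valueOf(branching);
        BigInteger reach = BigInteger.ONE;  // items distinguishable with 0 probes
        int k = 0;
        while (reach.compareTo(items) < 0) {
            reach = reach.multiply(b);
            k++;
        }
        return k;
    }

    public static void main(String[] args) {
        System.out.println(probes(BigInteger.TEN.pow(25), 1000)); // 9
        System.out.println(probes(BigInteger.TEN.pow(79), 1000)); // 27
    }
}
```

Under this counting model the memory-bit case needs 9 probes, and the electron case comes out at 27 because log base 1000 of 10^79 is about 26.3 and is rounded up.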
46Why B-Trees are Important
- Form the basis for almost every file indexing system: Unix, Windows, Mac OS.
- For a file index, cannot assume that the entire index will fit into memory (in fact, it can't by definition)
- Therefore, the file index resides on the disk.
- Big-O analysis assumes that all operations are equal - not true when disk I/O is involved
  - CPUs: 400 million operations per second
  - Disks take on the order of 2-10 milliseconds to access a block of data
  - So we can do about 500 disk accesses per second.
  - At the same time, we can do about 400 million CPU operations
- BOTTOM LINE: disk accesses are VERY expensive (STILL!!!)
47A Practical Example
- Suppose we want to computerize driver's license information for the state of Massachusetts.
- Assume we have a key of 32 bytes (a name), a 1024 byte record of data, and about 20 million records.
- Assume this does not fit into memory and that we have about 1/20 of the resources of the system (other people use it as well).
- Thus, in one second we can perform 20 million operations or perform 25 disk accesses.
- Analyze the performance of various tree representations.
48A Practical Example
- Unbalanced binary search tree: DISASTER
  - Successful search: 1.38 log N disk accesses (average) - 36 disk accesses (or about 1-2 secs)
  - Some accesses would take much longer.
  - This is just to do the lookups to find our data record!
- Red-Black Tree (haven't discussed)
  - also log N, although constant is a little better (1 sec)
- Can't do better than log N with binary trees.
- Need to reduce the number of disk accesses to a small constant, like 3 or 4.
- Answer is intuitive - if we have more branching, we have less height in the tree and hence fewer accesses.
  - Complete binary tree has height that is roughly log_2 N
  - Complete m-way tree has height that is roughly log_m N
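To put numbers on the comparison for the 20 million driver's license records, the sketch below counts the levels of a complete m-way tree and converts them to time at 25 disk accesses per second. It assumes one disk access per level, and the class and method names are made up for illustration.

```java
final class DiskAccessEstimate {
    // Smallest number of levels k such that branching^k >= records.
    static int levels(long records, int branching) {
        long reach = 1;
        int k = 0;
        while (reach < records) {
            reach *= branching;
            k++;
        }
        return k;
    }

    // Time for one lookup, assuming one disk access per level.
    static double seconds(int diskAccesses, int accessesPerSecond) {
        return diskAccesses / (double) accessesPerSecond;
    }

    public static void main(String[] args) {
        long records = 20_000_000L;
        int binary = levels(records, 2);      // 25 levels
        int wide = levels(records, 1000);     // 3 levels
        System.out.println(binary + " accesses, " + seconds(binary, 25) + " s"); // 25 accesses, 1.0 s
        System.out.println(wide + " accesses, " + seconds(wide, 25) + " s");     // 3 accesses, 0.12 s
    }
}
```

A perfectly balanced binary tree still costs about a second per lookup on the shared machine, while 1000-way branching brings it down to the "small constant, like 3 or 4" the slide asks for.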
49Reminder
- M-way trees are good for applications where the differences in access speeds are significant.
  - E.g. memory versus disk.
Core memory, circa 1960
5MB Disk, circa 1970
50A Bit of History
(right)