Title: CSE 326: Data Structures Part Four: Trees
1 CSE 326 Data Structures, Part Four: Trees
2 Material
- Weiss Chapter 4
- N-ary trees
- Binary Search Trees
- AVL Trees
- Splay Trees
3 Other Applications of Trees?
4 Tree Jargon
- Length of a path = number of edges
- Depth of a node N = length of the path from the root to N
- Height of a node N = length of the longest path from N to a leaf
- Depth and height of a tree = height of the root
(diagram: example tree with nodes A-F; the root has depth 0 and height 2, and a deepest leaf has depth 2 and height 0)
5 Definition and Tree Trivia
- Recursive definition of a tree: a tree is a set of nodes that
  - a. is an empty set of nodes, or
  - b. has one node, called the root, from which zero or more trees (subtrees) descend.
- A tree with N nodes always has ___ edges
- Two nodes in a tree have at most how many paths between them?
6 Implementation of Trees
- Obvious pointer-based implementation: a node with a value and pointers to its children
- Problem?
(diagram: the example tree with nodes A-F)
7 First Child/Next Sibling Representation
- Each node has 2 pointers: one to its first child and one to its next sibling
(diagram: the example tree A-F drawn twice: once with parent-child edges, once with first-child/next-sibling links)
8 Nested List Implementation 1
(diagram: the tree with nodes a, b, c, d written as a nested list)
9 Nested List Implementation 2
(diagram: the same tree as an alternative nested list)
10 Application: Arithmetic Expression Trees
- Example: the arithmetic expression A + (B * (C / D)) and its tree
- Used in most compilers
- No parentheses needed: the tree structure encodes precedence
- Can speed up calculations, e.g. replace the / node with the value C/D if C and D are known
- Calculate by traversing the tree (how?)
(diagram: expression tree with + at the root, children A and *; * has children B and /; / has children C and D)
11 Traversing Trees
- Preorder: root, then children
  + A * B / C D
- Postorder: children, then root
  A B C D / * +
- Inorder: left child, root, right child
  A + B * C / D
(diagram: the expression tree for A + (B * (C / D)))
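The three traversals can be checked with a small sketch. Note an assumption: the slide text lost its operator characters, so I take the two outer operators to be + and * (only the / node is certain from the original).

```python
# Expression tree for A + (B * (C / D)) and the three classic traversals.

class ExprNode:
    def __init__(self, value, left=None, right=None):
        self.value = value          # operator or operand label
        self.left = left
        self.right = right

def preorder(t):
    if t is None:
        return []
    return [t.value] + preorder(t.left) + preorder(t.right)

def inorder(t):
    if t is None:
        return []
    return inorder(t.left) + [t.value] + inorder(t.right)

def postorder(t):
    if t is None:
        return []
    return postorder(t.left) + postorder(t.right) + [t.value]

# + at the root, children A and *; * has children B and /; / has C and D
tree = ExprNode('+',
                ExprNode('A'),
                ExprNode('*',
                         ExprNode('B'),
                         ExprNode('/', ExprNode('C'), ExprNode('D'))))
```

Postorder gives exactly the postfix form a stack machine could evaluate without parentheses, which is why compilers like this representation.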
12 Example Code for Recursive Preorder

  void print_preorder(TreeNode T) {
    TreeNode P;
    if (T == NULL) return;
    print_element(T.Element);
    P = T.FirstChild;
    while (P != NULL) {
      print_preorder(P);
      P = P.NextSibling;
    }
  }
What is the running time for a tree with N nodes?
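A runnable Python version of the same routine over the first-child/next-sibling representation of slide 7 (the class and field names here are mine):

```python
# Recursive preorder over a first-child/next-sibling tree.

class TreeNode:
    def __init__(self, element):
        self.element = element
        self.first_child = None
        self.next_sibling = None

def preorder(t, out):
    """Visit t, then each child left to right. O(N) total for N nodes:
    every node is visited once and every pointer followed once."""
    if t is None:
        return
    out.append(t.element)
    p = t.first_child
    while p is not None:
        preorder(p, out)
        p = p.next_sibling

# An example tree: A with children B, C, D; C has children E, F
a = TreeNode('A')
b, c, d = TreeNode('B'), TreeNode('C'), TreeNode('D')
e, f = TreeNode('E'), TreeNode('F')
a.first_child = b
b.next_sibling = c
c.next_sibling = d
c.first_child = e
e.next_sibling = f

out = []
preorder(a, out)
```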
13 Binary Trees
- Properties
  - Notation: depth(tree) = MAX depth(leaf) = height(root)
  - max # of leaves = 2^depth(tree)
  - max # of nodes = 2^(depth(tree)+1) - 1
  - max depth = n-1
  - average depth for n nodes = ? (over all possible binary trees)
- Representation
(diagram: example binary tree with nodes A-J)
14 Dictionary & Search ADTs
- Operations
  - create
  - destroy
  - insert
  - find
  - delete
- Dictionary: stores values associated with user-specified keys
  - keys may be any (homogeneous) comparable type
  - values may be any (homogeneous) type
  - implementation: the data field is a struct with two parts
- Search ADT: keys = values
- Example entries:
  - kim chi: spicy cabbage
  - kreplach: tasty stuffed dough
  - kiwi: Australian fruit
- insert adds a (key, value) pair; find(kreplach) returns (kreplach, tasty stuffed dough)
15 Naïve Implementations
(blank table to fill in: rows insert (w/o duplicates), find, delete; columns unsorted array, sorted array, linked list)
- Goal: fast find like a sorted array, dynamic inserts/deletes like a linked list
16 Naïve Implementations
                          unsorted array | sorted array | linked list
  insert (w/o duplicates) find + O(1)    | O(n)         | find + O(1)
  find                    O(n)           | O(log n)     | O(n)
  delete                  find + O(1)    | O(n)         | find + O(1)
- Goal: fast find like a sorted array, dynamic inserts/deletes like a linked list
17 Binary Search Tree Dictionary Data Structure
- Search tree property:
  - all keys in the left subtree are smaller than the root's key
  - all keys in the right subtree are larger than the root's key
- Result:
  - easy to find any given key
  - inserts/deletes by changing links
(diagram: example BST with root 8, containing the keys 2, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14)
18 In-Order Listing
- visit left subtree
- visit node
- visit right subtree
(diagram: BST with root 10, containing 2, 5, 7, 9, 10, 15, 17, 20, 30)
In-order listing:
19 In-Order Listing
- visit left subtree
- visit node
- visit right subtree
(same diagram)
In-order listing: 2 → 5 → 7 → 9 → 10 → 15 → 17 → 20 → 30
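A small Python sketch (names mine) that builds a BST on the slide's keys and produces the in-order listing:

```python
# Build a BST by repeated insertion, then list it in order
# (left subtree, node, right subtree).

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(x, t):
    if t is None:
        return Node(x)
    if x < t.key:
        t.left = insert(x, t.left)
    elif x > t.key:
        t.right = insert(x, t.right)
    return t

def inorder(t):
    if t is None:
        return []
    return inorder(t.left) + [t.key] + inorder(t.right)

root = None
for k in [10, 5, 15, 2, 9, 20, 17, 30, 7]:   # root 10, as on the slide
    root = insert(k, root)
```

Whatever the insertion order, the in-order listing of a BST comes out sorted; that is the search tree property at work.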
20 Finding a Node

  Node find(Comparable x, Node root) {
    if (root == NULL)
      return root;
    else if (x < root.key)
      return find(x, root.left);
    else if (x > root.key)
      return find(x, root.right);
    else
      return root;
  }

(diagram: the example BST with root 10)
runtime?
21 Insert
- Concept: proceed down the tree as in Find; if the new key is not found, insert a new node at the last spot traversed

  void insert(Comparable x, Node root) {
    // Does not work for empty tree: when root is NULL
    if (x < root.key) {
      if (root.left == NULL)
        root.left = new Node(x);
      else insert( x, root.left );
    }
    else if (x > root.key) {
      if (root.right == NULL)
        root.right = new Node(x);
      else insert( x, root.right );
    }
  }
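One way around the empty-tree limitation the comment warns about is to have insert return the (possibly new) root; a sketch in Python, names mine:

```python
# BST insert/find where insert returns the subtree root, so inserting
# into an empty tree works without a special case at the call site.

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(x, root):
    if root is None:            # handles the empty tree
        return Node(x)
    if x < root.key:
        root.left = insert(x, root.left)
    elif x > root.key:
        root.right = insert(x, root.right)
    return root                 # duplicates are silently ignored

def find(x, root):
    if root is None or root.key == x:
        return root
    if x < root.key:
        return find(x, root.left)
    return find(x, root.right)

root = None
for k in [10, 5, 15, 9, 20]:
    root = insert(k, root)
```

The caller always writes `root = insert(x, root)`, which is the same convention the AVL insert pseudocode later in the deck uses.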
22 Time to Build a Tree
- Suppose a1, a2, ..., an are inserted into an initially empty BST:
  - a1, a2, ..., an are in increasing order
  - a1, a2, ..., an are in decreasing order
  - a1 is the median of all, a2 is the median of the elements less than a1, a3 is the median of the elements greater than a1, etc.
  - data is randomly ordered
23 Analysis of BuildTree
- Increasing / decreasing: Θ(n²)
  - 1 + 2 + 3 + ... + n = Θ(n²)
- Medians: yields a perfectly balanced tree
  - Θ(n log n)
- Average case, assuming all input sequences are equally likely, is Θ(n log n)
  - equivalently: the average depth of a node is log n; total time = sum of the depths of the nodes
24 Proof that the Average Depth of a Node in a BST Constructed from Random Data is Θ(log n)
- Method: calculate the sum of all depths, divide by the number of nodes
- D(n) = sum of the depths of all nodes in a random BST containing n nodes
- D(n) = D(left subtree) + D(right subtree) + adjustment for distance from root to subtrees + depth of root
- D(n) = D(left subtree) + D(right subtree) + (number of nodes in left and right subtrees) + 0
- D(n) = D(L) + D(n - L - 1) + (n - 1)
25 Random BST, cont.
- D(n) = D(L) + D(n - L - 1) + (n - 1)
- For random data, all subtree sizes are equally likely: this looks just like the quicksort average-case equation!
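Averaging over the n equally likely values of L gives the quicksort-style recurrence; a brief sketch, writing D̄ for the expected total depth:

```latex
\bar{D}(n) \;=\; \frac{2}{n}\sum_{j=0}^{n-1}\bar{D}(j) \;+\; (n-1)
\qquad\Longrightarrow\qquad
\bar{D}(n) \;=\; O(n \log n)
```

so the average depth of a node is D̄(n)/n = O(log n), as claimed on the analysis slide.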
26 (no transcript)
27 Random Input vs. Random Trees
- Inputs
  - 1,2,3
  - 3,2,1
  - 1,3,2
  - 3,1,2
  - 2,1,3
  - 2,3,1
(diagram: the resulting trees; the inputs 2,1,3 and 2,3,1 both produce the same balanced tree)
For three items, the shallowest tree is twice as likely as any other; the effect grows as n increases. For n = 4, the probability of getting a shallow tree is > 50%.
28 Deletion
(diagram: the example BST with root 10)
Why might deletion be harder than insertion?
29 FindMin/FindMax

  Node min(Node root) {
    if (root.left == NULL)
      return root;
    else
      return min(root.left);
  }

(diagram: the example BST with root 10)
How many children can the min of a node have?
30 Successor
- Find the next larger node in this node's subtree.
  - not the next larger in the entire tree

  Node succ(Node root) {
    if (root.right == NULL)
      return NULL;
    else
      return min(root.right);
  }

(diagram: the example BST with root 10)
How many children can the successor of a node have?
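A Python transcription of min and succ (names mine), exercised on the running example tree:

```python
# min = leftmost node; succ = min of the right subtree.

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def tree_min(root):
    """Leftmost node. Note the min can have at most one (right) child."""
    if root.left is None:
        return root
    return tree_min(root.left)

def succ(root):
    """Next larger key within root's subtree, or None if there is none."""
    if root.right is None:
        return None
    return tree_min(root.right)

# The example tree: 10(5(2, 9(7, -)), 15(-, 20(17, 30)))
root = Node(10,
            Node(5, Node(2), Node(9, Node(7))),
            Node(15, None, Node(20, Node(17), Node(30))))
```

The successor found this way likewise has at most one child, which is exactly what the two-child deletion case relies on.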
31 Deletion: The Leaf Case
- Delete(17)
(diagram: the example BST with root 10; 17 is a leaf and is simply removed)
32 Deletion: The One Child Case
- Delete(15)
(diagram: after the previous deletion, 15 has only one child, 20; 20 is spliced into 15's place)
33 Deletion: The Two Child Case
- Delete(5)
(diagram: 5 has two children, so removing it leaves a hole with two subtrees)
- Replace the node with a value guaranteed to be between the left and right subtrees: the successor
- Could we have used the predecessor instead?
34 Deletion: The Two Child Case
- Delete(5)
(same diagram)
- It is always easy to delete the successor: it always has either 0 or 1 children!
35 Deletion: The Two Child Case
- Delete(5)
(diagram: 7, the successor, now appears in 5's node)
- Finally: copy the data value from the deleted successor into the original node
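Putting the three deletion cases together, with the two-child case handled by copying the successor's value and then deleting the successor (a sketch in Python, names mine):

```python
# BST deletion: leaf and one-child cases splice; the two-child case
# copies the successor's key, then deletes the successor (0 or 1 children).

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def tree_min(t):
    while t.left is not None:
        t = t.left
    return t

def delete(x, t):
    if t is None:
        return None
    if x < t.key:
        t.left = delete(x, t.left)
    elif x > t.key:
        t.right = delete(x, t.right)
    elif t.left is None:          # zero or one child: splice
        return t.right
    elif t.right is None:
        return t.left
    else:                         # two children
        s = tree_min(t.right)     # the successor
        t.key = s.key             # copy its value here
        t.right = delete(s.key, t.right)   # then delete the successor
    return t

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

# The slides' example: delete 5 from 10(5(2, 9(7, -)), 20(-, 30))
root = Node(10, Node(5, Node(2), Node(9, Node(7))), Node(20, None, Node(30)))
root = delete(5, root)
```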
36 Lazy Deletion
- Instead of physically deleting nodes, just mark them as deleted
  - simpler
  - physical deletions done in batches
  - some adds just flip the deleted flag
  - extra memory for the deleted flag
  - many lazy deletions slow down finds
  - some operations may have to be modified (e.g., min and max)
(diagram: the example BST with root 10)
37 Dictionary Implementations
           unsorted array | sorted array | linked list | BST
  insert   find + O(1)    | O(n)         | find + O(1) | O(Depth)
  find     O(n)           | O(log n)     | O(n)        | O(Depth)
  delete   find + O(1)    | O(n)         | find + O(1) | O(Depth)
- BSTs look good for shallow trees, i.e. when the depth D is small (log n); otherwise as bad as a linked list!
38 CSE 326 Data Structures, Part 3: Trees, continued: Balancing Act
- Henry Kautz
- Autumn Quarter 2002
39 Beauty is Only Θ(log n) Deep
- Binary search trees are fast if they're shallow
  - e.g. complete
- Problems occur when one branch is much longer than the other
- How to capture the notion of a "sort of" complete tree?
40 Balance
(diagram: a node t whose left subtree has height 6 and right subtree has height 5)
- balance = height(left subtree) - height(right subtree)
- convention: the height of a null subtree is -1
- zero everywhere ⇒ perfectly balanced
- small everywhere ⇒ balanced enough: Θ(log n) depth
- Precisely: maximum depth is 1.44 log n
41 AVL Tree Dictionary Data Structure
- Binary search tree properties
- Balance of every node satisfies -1 ≤ b ≤ 1
- The tree re-balances itself after every insert or delete
(diagram: example AVL tree with root 8, containing 2, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15)
What is the balance of each node in this tree?
42 AVL Tree Data Structure
(diagram: each node stores its data, its height, and pointers to its children; example: root 10 has height 3, its children 5 and 15 have heights 1 and 2, 15's children are 12 and 20, and 20's children are 17 and 30)
43 Not An AVL Tree
(diagram: the same tree with an extra node 18 under 17; now 15 has height 3 while 5 has height 1, so the balance at the root has magnitude 2, violating the AVL property)
44 Bad Case #1
- Insert(small)
- Insert(middle)
- Insert(tall)
(diagram: the chain S-M-T; S has height 2, M height 1, T height 0)
45 Single Rotation
(diagram: the chain S-M-T rotated so that M becomes the root, with children S and T)
Basic operation used in AVL trees: a right child could legally have its parent as its left child.
46 General Case: Insert Unbalances
(diagram: node a has left subtree X (height h-1) and right child b; b has subtrees Y (height h-1) and Z. An insert into Z raises Z's height from h-1 to h, so a's height becomes h+2 and a is unbalanced. A single rotation makes b the subtree root (height h+1), with left child a (height h, over X and Y) and right subtree Z (height h).)
47 Properties of General Insert + Single Rotation
- Restores balance at the lowest point in the tree where the imbalance occurs
- After the rotation, the height of the subtree (in the example, h+1) is the same as it was before the insert that unbalanced it
- Thus, no further rotations are needed anywhere in the tree!
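A concrete sketch of the single rotation (Python, names mine), with the height convention from the balance slide (a null subtree has height -1); the mirror-image rotation is symmetric:

```python
# Single left rotation applied to Bad Case #1: the chain S-M-T.

class Node:
    def __init__(self, key, right=None):
        self.key = key
        self.left = None
        self.right = right
        self.height = 0

def height(n):
    return n.height if n is not None else -1   # null subtree: -1

def update(n):
    n.height = 1 + max(height(n.left), height(n.right))

def rotate_left(a):
    """The right child b rises; a legally becomes b's left child, and
    b's old left subtree moves under a. Heights are then recomputed."""
    b = a.right
    a.right = b.left
    b.left = a
    update(a)
    update(b)
    return b

# Insert(S), Insert(M), Insert(T) builds the right chain S-M-T
root = Node('S', right=Node('M', right=Node('T')))
root.right.height = 1
root.height = 2
root = rotate_left(root)
```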
48 Bad Case #2
- Insert(small)
- Insert(tall)
- Insert(middle)
(diagram: S (height 2) has right child T (height 1), whose left child is M (height 0))
Why won't a single rotation (bringing T up to the top) fix this?
49 Double Rotation
(diagram: S has right child T, whose left child is M; first rotate M above T, giving the chain S-M-T; then rotate M above S, making M the root with children S and T)
50 General Double Rotation
(diagram: a (height h+3) has right subtree Z (height h) and left child b; b has left subtree W (height h) and right child c (height h+1), whose subtrees are X and Y. After the double rotation, c is the root at height h+2, with children b (over W and X) and a (over Y and Z).)
- Initially: insert into X unbalances the tree (root height goes to h+3)
- "Zig-zag" to pull up c: restores the root height to h+2, left subtree height to h
51 Another Double Rotation Case
(diagram: the same configuration, but the insert goes into Y instead of X; the same double rotation on c repairs it)
- Initially: insert into Y unbalances the tree (root height goes to h+2)
- "Zig-zag" to pull up c: restores the root height to h+1, left subtree height to h
52 Insert Algorithm
- Find the spot for the value
- Hang the new node
- Search back up, looking for imbalance
- If there is an imbalance:
  - "outside": perform single rotation and exit
  - "inside": perform double rotation and exit
53 AVL Insert Algorithm

  Node insert(Comparable x, Node root) {
    // returns root of revised tree
    if (root == NULL)
      return new Node(x);
    if (x < root.key) {
      root.left = insert( x, root.left );
      if (root unbalanced) { rotate... }
    } else { // x > root.key
      root.right = insert( x, root.right );
      if (root unbalanced) { rotate... }
    }
    root.height = max(root.left.height,
                      root.right.height) + 1;
    return root;
  }
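A runnable sketch of the full insert in Python (my own translation of the pseudocode, not the course's code): single rotation for "outside" imbalances, and a double rotation, written as two single rotations, for "inside" ones:

```python
# AVL insertion. Slide convention: a null subtree has height -1.

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 0

def height(n):
    return n.height if n is not None else -1

def update(n):
    n.height = 1 + max(height(n.left), height(n.right))

def rotate_right(a):            # single rotation: left child rises
    b = a.left
    a.left, b.right = b.right, a
    update(a)
    update(b)
    return b

def rotate_left(a):             # mirror image
    b = a.right
    a.right, b.left = b.left, a
    update(a)
    update(b)
    return b

def insert(x, root):
    if root is None:
        return Node(x)
    if x < root.key:
        root.left = insert(x, root.left)
        if height(root.left) - height(root.right) == 2:
            if x < root.left.key:               # outside: single rotation
                root = rotate_right(root)
            else:                               # inside: double rotation
                root.left = rotate_left(root.left)
                root = rotate_right(root)
    elif x > root.key:
        root.right = insert(x, root.right)
        if height(root.right) - height(root.left) == 2:
            if x > root.right.key:              # outside
                root = rotate_left(root)
            else:                               # inside
                root.right = rotate_right(root.right)
                root = rotate_left(root)
    update(root)
    return root

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

# Worst-case input for a plain BST: sorted order
root = None
for k in range(1, 8):
    root = insert(k, root)
```

Inserting 1..7 in sorted order would give a plain BST depth 6; here the rotations keep the tree at height 2 with 4 at the root.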
54 Deletion (Really Easy Case)
- Delete(17)
(diagram: AVL tree with root 10, containing 2, 3, 5, 9, 10, 12, 15, 17, 20, 30; 17 is a leaf, so it can simply be removed)
55 Deletion (Pretty Easy Case)
- Delete(15)
(same tree: 15 has two children, so it is replaced by its successor, 17)
56 Deletion (Pretty Easy Case, cont.)
- Delete(15)
(diagram with heights: root 10 (height 3); left child 5 (height 2) over 2 (with right child 3) and 9; right child 17 (height 2) over 12 and 20 (with right child 30))
57 Deletion (Hard Case #1)
- Delete(12)
(diagram: the same tree; after removing 12, node 17 has only its right child 20 and becomes unbalanced)
58 Single Rotation on Deletion
(diagram: removing 12 unbalances 17; a single rotation brings 20 up, so the right subtree becomes 20 with children 17 and 30)
What is different about deletion than insertion?
59 Deletion (Hard Case)
- Delete(9)
(diagram: a larger AVL tree: root 10; left subtree 5 over 2 (with right child 3) and 9; right subtree 17 over 12 (children 11 and 15, 15 with child 13) and 20 (children 18 and 30, 30 with child 33); removing the leaf 9 unbalances node 5)
60 Double Rotation on Deletion
- Not finished!
(diagram: the imbalance at 5 is "inside" (2's right child 3), so a double rotation pulls 3 up and the left subtree becomes 3 with children 2 and 5; but the root is now unbalanced, with left subtree height 1 and right subtree height 3)
61 Deletion with Propagation
- What is different about this case?
(diagram: the tree from the previous slide: root 10 with left subtree 3 (children 2 and 5) and the unchanged right subtree under 17)
- We get to choose whether to single or double rotate!
62 Propagated Single Rotation
(diagram: a single rotation at the root brings 17 up; the new root is 17 (height 4), with left child 10 (over 3 (children 2, 5) and 12 (children 11 and 15, 15 with child 13)) and right child 20 (over 18 and 30, 30 with child 33))
63 Propagated Double Rotation
(diagram: a double rotation at the root pulls 12 up; the new root is 12 (height 4), with left child 10 (over 3 (children 2, 5) and 11) and right child 17 (over 15 (with child 13) and 20 (over 18 and 30, 30 with child 33)))
64 AVL Deletion Algorithm
- Recursive
  - 1. If at the node, delete it
  - 2. Otherwise, recurse to find it
  - 3. Correct heights
    - a. If imbalance #1, single rotate
    - b. If imbalance #2 (or don't care), double rotate
- Iterative
  - 1. Search downward for the node, stacking parent nodes
  - 2. Delete the node
  - 3. Unwind the stack, correcting heights
    - a. If imbalance #1, single rotate
    - b. If imbalance #2 (or don't care), double rotate
65 AVL
- Automatically Virtually Leveled
- Architecture for inVisible Leveling
- Articulating Various Lines
- Amortizing? Very Lousy!
- Amazingly Vexing Letters
66 AVL
- Automatically Virtually Leveled
- Architecture for inVisible Leveling
- Articulating Various Lines
- Amortizing? Very Lousy!
- Amazingly Vexing Letters
Adelson-Velskii & Landis
67 Pros and Cons of AVL Trees
- Pro:
  - All operations guaranteed O(log N)
  - The height balancing adds no more than a constant factor to the speed of insertion
- Con:
  - Space consumed by the height field in each node
  - Slower than an ordinary BST on random data
- Can we guarantee O(log N) performance with less overhead?
68 Splay Trees
- CSE 326 Data Structures, Part 3: Trees, continued
69 Today: Splay Trees
- Fast both in worst-case amortized analysis and in practice
- Used in the kernel of NT to keep track of process information!
- Invented by Sleator and Tarjan (1985)
- Details:
  - Weiss 4.5 (basic splay trees)
  - 11.5 (amortized analysis)
  - 12.1 (better top-down implementation)
70 Basic Idea
- "Blind" rebalancing: no height info kept!
- Worst-case time per operation is O(n)
- Worst-case amortized time is O(log n)
- Insert/find always rotates the node to the root!
- Good locality:
  - Most commonly accessed keys move high in the tree and become easier and easier to find
71 Idea
- Move n to the root by a series of zig-zag and zig-zig rotations, followed by a final single rotation (zig) if necessary
(diagram: a deep access path in a BST; "You're forced to make a really deep access")
72 Zig-Zag
(diagram: before: grandparent g with child p and grandchild n, over subtrees W, X, Y, Z; after: n is the subtree root with children p and g. n moves up 2 levels; g and p each move down 1. The slide marks which subtrees are helped, unchanged, or hurt.)
This is just a double rotation.
73 Zig-Zig
(diagram: before: g with child p and grandchild n on the same side, over subtrees W, X, Y, Z; after: the order is reversed, n on top, then p, then g, with the subtrees reattached in key order)
74 Why Splaying Helps
- Node n and its children are always helped (raised)
- Except for the last step, nodes that are hurt by a zig-zag or zig-zig are later helped by a rotation higher up the tree!
- Result:
  - shallow nodes may increase depth by one or two
  - helped nodes decrease depth by a large amount
- If a node n on the access path is at depth d before the splay, it's at about depth d/2 after the splay
  - Exceptions are the root, the child of the root, and the node splayed
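A compact sketch of splaying in Python (names mine), written in a simple recursive style rather than the slides' bottom-up pictures; intermediate shapes can differ slightly between the two formulations, but the accessed key always ends at the root:

```python
# Recursive splay: zig-zig, zig-zag, and a final zig, expressed as rotations.

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def bst_insert(x, t):           # plain BST insert, no balancing
    if t is None:
        return Node(x)
    if x < t.key:
        t.left = bst_insert(x, t.left)
    elif x > t.key:
        t.right = bst_insert(x, t.right)
    return t

def rotate_left(a):
    b = a.right
    a.right, b.left = b.left, a
    return b

def rotate_right(a):
    b = a.left
    a.left, b.right = b.right, a
    return b

def splay(x, t):
    """Bring x (or the last node on its search path) to the root."""
    if t is None:
        return None
    if x < t.key and t.left is not None:
        if x < t.left.key and t.left.left is not None:      # zig-zig
            t.left.left = splay(x, t.left.left)
            t = rotate_right(t)
        elif x > t.left.key and t.left.right is not None:   # zig-zag
            t.left.right = splay(x, t.left.right)
            t.left = rotate_left(t.left)
        return rotate_right(t) if t.left is not None else t # final zig
    if x > t.key and t.right is not None:
        if x > t.right.key and t.right.right is not None:   # zig-zig
            t.right.right = splay(x, t.right.right)
            t = rotate_left(t)
        elif x < t.right.key and t.right.left is not None:  # zig-zag
            t.right.left = splay(x, t.right.left)
            t.right = rotate_right(t.right)
        return rotate_left(t) if t.right is not None else t
    return t

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

# The slides' deep chain: insert 1..6 in order, then Find(6) splays 6 up
root = None
for k in range(1, 7):
    root = bst_insert(k, root)
root = splay(6, root)
```

No heights are stored anywhere: the rebalancing is "blind", exactly as the Basic Idea slide says.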
75 Splaying Example
- Find(6)
(diagram: the right chain 1-2-3-4-5-6; the first zig-zig rotates 6 above 5 and 4)
76 Still Splaying 6
(diagram: a second zig-zig brings 6 up two more levels)
77 Almost There, Stay on Target
(diagram: a final zig (single rotation at the root) makes 6 the root)
78 Splay Again
- Find(4)
(diagram: a zig-zag begins moving 4 toward the root)
79 Example Splayed Out
(diagram: after the zig-zag, 4 is the root; the original deep chain has become much shallower)
80 Locality
- "Locality": if an item is accessed, it is likely to be accessed again soon
  - Why?
- Assume m ≥ n accesses in a tree of size n
  - Total worst-case time is O(m log n)
  - O(log n) amortized time per access
- Suppose only k distinct items are accessed in the m accesses
  - Time is O(n log n + m log k)
    - the m log k term: those k items are all at the top of the tree
    - the n log n term: getting those k items near the root
  - Compare with O(m log n) for an AVL tree
81 Splay Operations: Insert
- To insert, could do an ordinary BST insert
  - but it would not fix up the tree
  - a BST insert followed by a find (splay)?
- Better idea: do the splay before the insert!
- How?
82 Split
- Split(T, x) creates two BSTs L and R:
  - All elements of T are in either L or R
  - All elements in L are ≤ x
  - All elements in R are ≥ x
  - L and R share no elements
- Then how do we do the insert?
83 Split
- Split(T, x) creates two BSTs L and R:
  - All elements of T are in either L or R
  - All elements in L are ≤ x
  - All elements in R are > x
  - L and R share no elements
- Then how do we do the insert?
- Insert as root, with children L and R
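A sketch of split-based insertion (Python, names mine, distinct keys assumed), using a simple recursive splay: splay x's closest key to the root, cut one root link, and hang the two halves under a new root x:

```python
# Splay-tree insert via split: after splaying, the root is x's
# predecessor or successor, so cutting one link yields L and R.

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def rotate_left(a):
    b = a.right
    a.right, b.left = b.left, a
    return b

def rotate_right(a):
    b = a.left
    a.left, b.right = b.right, a
    return b

def splay(x, t):
    if t is None:
        return None
    if x < t.key and t.left is not None:
        if x < t.left.key and t.left.left is not None:      # zig-zig
            t.left.left = splay(x, t.left.left)
            t = rotate_right(t)
        elif x > t.left.key and t.left.right is not None:   # zig-zag
            t.left.right = splay(x, t.left.right)
            t.left = rotate_left(t.left)
        return rotate_right(t) if t.left is not None else t
    if x > t.key and t.right is not None:
        if x > t.right.key and t.right.right is not None:   # zig-zig
            t.right.right = splay(x, t.right.right)
            t = rotate_left(t)
        elif x < t.right.key and t.right.left is not None:  # zig-zag
            t.right.left = splay(x, t.right.left)
            t.right = rotate_right(t.right)
        return rotate_left(t) if t.right is not None else t
    return t

def insert(x, t):
    """Split at x, then make x the new root over the two halves."""
    t = splay(x, t)
    n = Node(x)
    if t is None:
        return n
    if t.key < x:               # old root and its left subtree are all < x
        n.left, n.right = t, t.right
        t.right = None
    else:                       # old root and its right subtree are all > x
        n.right, n.left = t, t.left
        t.left = None
    return n

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

root = None
for k in [6, 1, 9, 2, 4, 7, 5]:     # ends with the slides' Insert(5)
    root = insert(k, root)
```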
84 Splitting in Splay Trees
- How can we split?
  - We have the splay operation
  - We can find x, or the parent of where x would be if we were to insert it as an ordinary BST
  - We can splay x or the parent to the root
  - Then break one of the links from the root to a child
85 Split
(diagram: split(x) splays to the root the node that is x, or that would have been x's parent; then one link from the root is broken. Depending on whether the root is > x, this yields either L ≤ x and R > x, or L < x and R > x.)
86 Back to Insert
(diagram: the new node x as root, with children L (keys ≤ x) and R (keys > x))
- Insert(x):
  - Split on x
  - Join the subtrees, using x as the root
87 Insert Example
- Insert(5)
(diagram: starting from the tree with root 6, left subtree 1 (with right child 4 over 2) and right subtree 9 (with left child 7); split(5) separates the keys into L = {1, 2, 4} and R = {6, 7, 9}, and 5 becomes the new root with L and R as its children)
88 Splay Operations: Delete
(diagram: to delete x, splay x to the root and remove it, leaving subtrees L (keys < x) and R (keys > x))
- Now what?
89 Join
- Join(L, R): given two trees such that every key in L is less than every key in R, merge them
- Splay on the maximum element in L, then attach R as the new root's right child
90 Delete Completed
(diagram: delete x from T by splaying x to the root, removing it, and then applying Join(L, R), which yields T - x)
91 Delete Example
- Delete(4)
(diagram: find(4) splays 4 to the root; removing it leaves L = {1, 2} and R = {6, 7, 9}; splay the max of L (2) to the root of L and attach R as its right child, giving the tree 2 over 1 and 6, with 9 over 7 to the right)
92 Splay Trees, Summary
- Splay trees are arguably the most practical kind of self-balancing tree
- If the number of finds is much larger than n, then locality is crucial!
  - Example: word counting
- Also support efficient Split and Join operations
  - useful for other tasks, e.g., range queries