Title: Binary Trees: our leafy, annoying friends
1. Binary Trees: our leafy, annoying friends
Annatala Wolf, 222 Lecture 6
2. Relations (Order Properties)
- A binary relation is a set of pairs. We can classify relations based on properties that are always true.
- Examples of properties a relation R may have:
  - reflexivity: a R a
  - irreflexivity: ¬(a R a)
  - transitivity: a R b ∧ b R c ⇒ a R c
  - symmetry: a R b ⇒ b R a
  - antisymmetry: a R b ∧ b R a ⇒ a = b
  - totality: a R b ∨ b R a
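One way to make these properties concrete is to test them on a small finite relation. The sketch below is only an illustration: it represents a relation as a Python set of pairs, and the helper names (is_reflexive, is_total, and so on) are made up for this example.

```python
from itertools import product

def is_reflexive(domain, rel):
    # a R a for every a in the domain
    return all((a, a) in rel for a in domain)

def is_symmetric(rel):
    # a R b implies b R a
    return all((b, a) in rel for (a, b) in rel)

def is_antisymmetric(rel):
    # a R b and b R a together imply a == b
    return all(a == b for (a, b) in rel if (b, a) in rel)

def is_transitive(rel):
    # a R b and b R c together imply a R c
    return all((a, d) in rel
               for (a, b) in rel for (c, d) in rel if b == c)

def is_total(domain, rel):
    # every pair is comparable in at least one direction
    return all((a, b) in rel or (b, a) in rel
               for a, b in product(domain, repeat=2))

# Example relation: <= on the small domain {1, 2, 3}
domain = {1, 2, 3}
leq = {(a, b) for a, b in product(domain, repeat=2) if a <= b}
```

Here leq passes the reflexive, antisymmetric, transitive, and total checks, and fails the symmetric one.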
3. Order Relations (Examples)
- Equivalence: reflexive, symmetric, and transitive
  - examples: having the same ceiling or floor, true equality
- Preorder (simplest kind of order): reflexive and transitive
  - examples: human preferences, logical implication
- Total preorder: transitive and total (⇒ reflexive)
  - like a preorder, but each pair is comparable
  - examples: strings by length, integers by rightmost digit
- Partial order: reflexive, antisymmetric, transitive
- Total order: antisymmetric, transitive, and total (⇒ reflexive)
  - examples: ≤ on the integers, increasing lexicographic ASCII order
4. Are_In_Order
- Utility class used for comparing Item
- Mathematically modeled by total preorder
  - preorder: so that it can handle any kind of ordering situation, even loose orderings, such as ordering strings by length
  - total: to ensure that Are_In_Order always has some correct way to order two items (can't be non-ordered in both directions)
5. Note the Difference
- Because Are_In_Order is reflexive (this is implied by transitive and total), it catches equality as true
- Are_In_Order is not necessarily antisymmetric
  - (a AIO b) and (b AIO a) does not imply a = b
  - Consider ordering Text by length: "foo" ≠ "bar"
  - So we might also need Is_Equal_To.
- You must take special care how AIO is used. It acts somewhat similar to ≤
  - Namely, !(x ≤ y) is not the same as y ≤ x
6. Using Are_In_Order Correctly
- To ensure Are_In_Order is used properly, you must always test in the same direction
- Just like using ≤, you want to test
  - base case: Are_In_Order(x, y)
  - opposite case: not Are_In_Order(x, y)
  - bad idea! Are_In_Order(y, x)
- We'll see this in more detail when we look at using binary search trees
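A tiny experiment shows why the direction matters. This is a sketch only: are_in_order here is a hypothetical Python stand-in for Are_In_Order, assumed to order strings by length.

```python
def are_in_order(x: str, y: str) -> bool:
    # Assumed ordering for illustration: strings by length (a total preorder)
    return len(x) <= len(y)

x, y = "foo", "bar"

# Both directions hold, yet the two strings are not equal.
both_ways = are_in_order(x, y) and are_in_order(y, x)

# The "opposite case" and the "bad idea" disagree on equivalent items.
opposite = not are_in_order(x, y)   # False
flipped = are_in_order(y, x)        # True
```

Since opposite and flipped give different answers for order-equivalent items, swapping the arguments is not a safe substitute for negating the test.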
7. Seven Bridges of Königsberg
- Classic problem from the 1700s: could you cross each bridge in Königsberg exactly once, in a single path?
- This seems like a question math could answer, but at the time, math had not yet been applied to this sort of physical relationship.
Euler (same fellow who named the constant e) proved in 1735 that it was impossible to do this.
8. Graph Theory
- Euler proved this by inventing a new form of math called graph theory, which studies connections.
- Later, it developed into a related field, topology.
Each region is a vertex; each bridge is an edge.
Euler noticed all 4 vertices have an odd degree (number of edges)... But unless a vertex is at the start or end, it must have an even degree (one edge in, one edge out). A contradiction!
9. Graphs
- Graph: a set of vertices (or nodes) connected by edges
- Each edge links two vertices, or one vertex to itself: a self-loop (self-edge)
- In some graphs, the edges may be directed (point one way) or have values, e.g., color, order, weight
- An empty graph has zero nodes and zero edges
- Path: a sequence of vertices connected by their edges
10. Paths and Cycles
- A simple path is a path with no repeated vertices. (In the undirected graph below, there are infinitely many paths from 5 to 6, but only three simple paths.)
- A cycle is a path that starts and ends at the same vertex. If that vertex is the only repeated one, it's a simple cycle. (There are six simple cycles from 5.)
- In directed graphs, you can only travel in the direction of the arrows. But edges can still go in both directions.
11. Graphs in Computer Science
- Graphs are extremely useful in CS! We usually use them as data structures to hold information, or to solve problems.
- Graphs can be used to do many things:
  - to model how data moves through networks
  - to hold data and relationships together
  - to find the quickest solution to a problem involving many separate pieces
  - to describe how program components talk
12. Trees
- Trees are a common, useful graph structure in CS. Trees have several equivalent definitions.
- A tree is any undirected graph that
  - is connected (every vertex can reach every other vertex) and acyclic (no cycles means no loops of any size),
  - or, in which every two vertices are connected by exactly one simple path,
  - or, is either empty (zero nodes, zero edges) or else is a connected graph with n vertices and n-1 edges (for some number n).
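The third definition translates almost directly into a check. The following is a minimal sketch, assuming a graph is given as a collection of vertices plus a list of undirected edge pairs; is_tree is a hypothetical helper name, not part of any course library.

```python
def is_tree(vertices, edges):
    """True if the graph is empty, or connected with exactly n - 1 edges."""
    vertices = set(vertices)
    if not vertices:
        return not edges              # the empty tree: zero nodes, zero edges
    if len(edges) != len(vertices) - 1:
        return False
    # Build an adjacency map for the undirected edges.
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    # Depth-first search from an arbitrary vertex to test connectivity.
    seen, stack = set(), [next(iter(vertices))]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(neighbors[v] - seen)
    return seen == vertices
```

A path on three vertices passes the check, while a triangle (too many edges) or a triangle plus an isolated vertex (right edge count, not connected) fails it.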
13. Types of Trees
- A rooted tree is simply a tree with one vertex specially marked as the root.
  - We call this vertex the root, and now think of edges being directed to or from the root (either way is fine).
  - An empty tree (no vertices or edges) can still be rooted, technically.
- An ordered tree is a tree where the edges for each node have an order associated with them.
  - Often numbered 0, 1, 2, ... (or left, right for binary).
- A forest is a graph that is a set of trees.
  - Usually this is just an undirected, acyclic graph!
14. Visualizing Rooted Trees
- When we draw rooted trees, we often use the following common conventions:
  - Draw the root of the tree at the top
  - Draw vertices on the same horizontal level when they are the same number of edges from the root
  - If the tree is ordered, draw the outgoing (child) edges from left to right in that order
    - <0, 1, 2, ...> (if ordered by number)
    - <left, right> (if binary-ordered)
15. Rooted Tree Terminology
Size: total number of nodes. Height: total number of levels (the longest root-to-leaf path, counted in nodes; the empty tree has height 0). Leaves are nodes without children. Nodes with children are called internal nodes.
16. Trees in Computer Science
- Trees are a useful model for recursive data structures.
  - the directory system for your computer uses a rooted tree
  - networks like Ethernet can be trees
  - trees can be used to store order or relationship information between items
- When used as a data structure, each item is usually a vertex in the tree.
17. Binary Trees
- A binary tree is a rooted tree where every node has at most two children.
- Binary trees are also often ordered. This means there is a distinct left child and right child.
- If we consider the empty tree to be the child for a node missing a left or right outgoing edge, we can say that every node has exactly two children (unless it is empty).
- This definition is very useful to the recursive structure of the tree!
18. Recursive Structure
- Think of each binary tree as either being empty, or else consisting of a root node, a left subtree, and a right subtree.
ø: empty tree (0 nodes, 0 edges)
OR
non-empty tree: a root node with a left subtree and a right subtree (subtrees might still be empty)
19. What We Gain
- We no longer have special cases for nodes based on whether they have left, right, both, or neither child. All nodes now have exactly two subtrees!
[Figure: a tree with nodes A, B, C, D, E drawn twice; the second drawing marks each empty subtree with ø.]
20. Thinking Recursively
- By defining binary trees recursively, it's much easier to work with them. The only special case we have to handle is the empty tree case, so the interface can be simpler.
ø: empty tree (0 nodes, 0 edges). Our only special case. Often (but not always), we can use this as our recursive base case.
non-empty tree
21. Binary_Tree
- Intuitively, Binary_Tree is a container with one base item (the root), and the rest of the items partitioned to the left or right of the root.
  - You can only access the root, or break the tree into the root and its two subtrees.
- Mathematically, Binary_Tree is an ordered binary tree of Item.
  - default value: the empty tree
22. Binary_Tree Kernel Operations
- Compose (x, left_subtree, right_subtree)
  - requires: true; produces self
- Decompose (x, left_subtree, right_subtree)
  - requires: self is not the empty tree; consumes self
- Accessor: current
  - requires: self is not the empty tree
- Height ()
  - requires: true
- Size ()
  - requires: true
23. Using Binary_Tree
- Anything you do using Binary_Tree will be recursive. Period.
- The only way to work with Binary_Tree is to handle one height-level of the tree at a time.
  - If you ever decompose a subtree (decompose a tree, and then do it to its subtree too), you're doing it wrong! This should never happen.
  - You must rely upon recursion to do the work for you at lower levels! Binary tree has a recursive structure, and only recursion will correctly exploit it.
24. Preconditions for Binary_Tree
- The only preconditions for Binary_Tree are:
  - You cannot Decompose() the empty tree.
  - You cannot use current on the empty tree.
- Fortunately, the empty tree is easy to recognize. It is the only tree that has a size (and a height) of zero.
- Frequently, the empty tree will be the base case for recursion, but not always! Some operations have |tree| > 0 as a precondition, so in this case the base case would need to be something different.
25. Working With Trees
- Assume there was no Size() operation for Binary_Tree. Could you write it using the other Kernel operations?
- Hint: size is 0 only when height is 0. Check height for the base case, and use recursion on both sides!
    global_function Integer Size (
            preserves Binary_Tree_Of_Text t
        );
    /*!
        ensures
            Size = |t|
    !*/
26. Size(), as a global function
    global_function_body Integer Size (
            preserves Binary_Tree_Of_Text t
        )
    {
        object Integer result;
        if (t.Height () > 0) {
            object catalyst Binary_Tree_Of_Text l, r;
            object catalyst Text root;
            t.Decompose (root, l, r);
            result = 1 + Size (l) + Size (r);
            t.Compose (root, l, r);
        }
        return result;
    }
27. Thinking About Recursion
- To handle tree recursion, think only one level at a time.
- For our global Size(), we have no requirement which says |t| > 0, so we need to handle the empty tree case.
  - However, this makes it easier to write! Now we can call Size() on any tree. We don't have to check to see if a tree is empty before we call it recursively. We just need to ensure that any tree we recurse on is smaller, and subtrees are always smaller, so this is a cinch.
- Notice that all of the actual work in our Size() is done at the line: result = 1 + Size(l) + Size(r)
  - The total size is just a bunch of 1 values added together. Since we recurse down both sides, this line will run once for every node in the tree (it's skipped in the empty tree case).
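For readers who want to run the idea, here is the same one-level-at-a-time recursion as a rough Python sketch. It assumes a tree is either None (the empty tree) or a tuple (root, left, right); that representation and the helper leaf are assumptions made for this example, not the Binary_Tree component.

```python
def size(t):
    # Base case: the empty tree has no nodes.
    if t is None:
        return 0
    _root, left, right = t
    # One node here, plus whatever recursion finds on each side.
    return 1 + size(left) + size(right)

def height(t):
    # Height counts levels, so the empty tree has height 0.
    if t is None:
        return 0
    _root, left, right = t
    return 1 + max(height(left), height(right))

def leaf(x):
    # Hypothetical helper: a node whose two subtrees are both empty.
    return (x, None, None)

tree = ("B", leaf("A"), ("D", leaf("C"), None))
```

For this four-node tree, size(tree) is 4 and height(tree) is 3.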
28. Recursion Metric
- Operations that are written recursively can be specified with a recursion metric. In Resolve, we use the word decreases for the recursion metric, which becomes a part of the contract for a recursive operation (plus requires and ensures).
- The recursion metric specifies an additional requirement that must be true before a recursive call may be made. This guarantees that the recursion will reach a base case (some point where it is impossible to make a recursive call).
29. Decreases Example
- In Add_Pair_To_Tree, decreases does not mean that the operation will cause the size of t to shrink (that wouldn't make any sense). It means you can only make a recursive call on a smaller t.
    local_procedure_body Add_Pair_To_Tree (
            alters Binary_Tree_Of_D_R_Pair t,
            consumes D_R_Pair pair
        )
    /*!
        requires
            TREE_CONV (t) and
            pair.d_item is not in D_ELEMENTS (t)
        ensures
            TREE_CONV (t) and
            ELEMENTS (t) = ELEMENTS (#t) union {#pair}
        decreases
            SIZE (t)
    !*/
You may only recurse with (x, p) if |x| < |t|.
30. What about iteration?
- The idea behind a recursion metric is simple: we need some way to prove (at least informally) that the recursion won't run forever.
- Similarly, we may want to specify what happens in a loop in order to ensure that the loop terminates (among other things). We may put a decreases clause inside a complicated loop, in which case the clause specifies what must decrease at each iteration. We'll discuss this in more detail near the end of the class (with the topic of loop invariants).
31. Tree Traversal
- A tree traversal is a path that visits each node once (like iterating over the entire tree container)
- Examples of depth-first, left-to-right traversals: preorder, in-order, postorder
- Note: the way we designed Binary_Tree prevents us from easily performing a breadth-first traversal.
You should memorize each of these! Since they're all left-to-right, the only difference is when the root is done.
32. Traversing A Tree
- Recall that an in-order traversal visits the left subtree, followed by the root, followed by the right subtree.
    global_procedure Print_In_Order (
            preserves Binary_Tree_Of_Integer t,
            alters Character_OStream out
        );
    /*!
        requires
            out.is_open = true
        ensures
            out.is_open = true and
            out.ext_name = #out.ext_name and
            out.contents = #out.contents *
                [the items in t, in-order, and separated by newlines]
    !*/
Get into groups of 3-4!
33. Print_In_Order
    global_procedure_body Print_In_Order (
            preserves Binary_Tree_Of_Integer t,
            alters Character_OStream out
        )
    {
        if (t.Size () > 0) {
            object catalyst Binary_Tree_Of_Integer l, r;
            object catalyst Integer root;
            t.Decompose (root, l, r);
            Print_In_Order (l, out);
            out << root << '\n';
            Print_In_Order (r, out);
            t.Compose (root, l, r);
        }
    }
    // How could we turn this into a preorder or postorder traversal?
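As a hint for that closing question, the three depth-first traversals can be sketched side by side in Python. This is an illustration under an assumed representation (a tree is None or a tuple (root, left, right)), not the Resolve code; notice that only the position of the root changes.

```python
def in_order(t):
    if t is None:
        return []
    root, left, right = t
    return in_order(left) + [root] + in_order(right)    # root in the middle

def pre_order(t):
    if t is None:
        return []
    root, left, right = t
    return [root] + pre_order(left) + pre_order(right)  # root first

def post_order(t):
    if t is None:
        return []
    root, left, right = t
    return post_order(left) + post_order(right) + [root]  # root last

tree = (2, (1, None, None), (3, None, None))
```

On this three-node tree the results are [1, 2, 3] in-order, [2, 1, 3] preorder, and [1, 3, 2] postorder.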
34. Neat Algorithm: Binary Search
- I'm thinking of a number between 1 and 100. I'll tell you whether it's too high or too low when you guess.
  - You should be able to get the answer in 7 guesses, every time. (Why 7? 2^7 = 128)
- A binary search is a method for cutting the search space roughly in half at every step.
  - It takes O(lg n) time to run binary search: very fast.
  - You could search 15,000 elements in 14 guesses!
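The guess counts can be checked by simulating the game. The sketch below is one possible formulation, with guesses_needed as a hypothetical helper that always guesses the middle of the remaining range.

```python
def guesses_needed(low, high, secret):
    """Count the guesses a middle-first strategy needs to find secret."""
    guesses = 0
    while low <= high:
        guesses += 1
        mid = (low + high) // 2
        if mid == secret:
            return guesses
        if mid < secret:
            low = mid + 1     # guess was too low: discard the lower half
        else:
            high = mid - 1    # guess was too high: discard the upper half
    raise ValueError("secret is outside the range")

# Worst case over every possible secret between 1 and 100.
worst = max(guesses_needed(1, 100, s) for s in range(1, 101))
```

Here worst comes out to 7, matching the slide.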
35. Binary Search Trees
- A binary search tree (BST) is an ordered binary tree that maintains a binary search tree property. Example:
  - For each node in the right subtree, Are_In_Order(root, node), and
  - for each node in the left subtree, not Are_In_Order(root, node).
- This defines the BST property that we will use with Binary_Tree in Lab 4.
36. What qualifies as a BST?
- The BST property we've defined is actually rather complicated. In order for a tree to be a BST, these four conditions would need to be true:
  - IS_VALID_BST(left_subtree)
  - IS_VALID_BST(right_subtree)
  - not Are_In_Order(root, MAX_NODE(left_subtree))
  - Are_In_Order(root, MIN_NODE(right_subtree))
- I.e., the BST property must apply at every node.
- However, there is one more case: the empty tree. The empty tree is always a BST because it has no nodes which are not in order.
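A direct, unoptimized check of this property might look like the sketch below. It assumes a tree is None or a tuple (root, left, right) and uses <= on integers as a stand-in for Are_In_Order; instead of MAX_NODE and MIN_NODE it simply tests every node in each subtree, which is slower but easier to read.

```python
def are_in_order(x, y):
    # Assumed ordering for this example: integers by <=
    return x <= y

def nodes(t):
    # All items in the tree, gathered in-order.
    if t is None:
        return []
    root, left, right = t
    return nodes(left) + [root] + nodes(right)

def is_valid_bst(t):
    if t is None:
        return True   # the empty tree is always a BST
    root, left, right = t
    return (is_valid_bst(left) and is_valid_bst(right)
            and all(not are_in_order(root, n) for n in nodes(left))
            and all(are_in_order(root, n) for n in nodes(right)))

good = (5, (3, None, None), (5, None, None))   # the equal item sits on the right
bad = (5, (5, None, None), None)               # the equal item sits on the left
```

With the root always in the first argument position, good is accepted and bad is rejected, because items order-equivalent to the root belong in the right subtree.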
37. Warning: Order Matters
- Note that we define the BST property by testing AIO with the root always in the first position. This is important because AIO is defined by total preorder (like ≤), so order-equivalent items return true.
- If you test AIO with the root in the second position, you will go down the wrong side of the tree when the item you're testing is order-equivalent to the one you want.
38. Examples of BST (of Integer)
39. Bad BST! No biccie.
40. Bad BST! No biccie.
41. Why Use a BST?
- Remember the linear search we did in Partial_Map? With a BST holding the items:
  - accessor, undefine: linear → logarithmic
  - define, undef. any: constant → logarithmic
  - destructor, clear: still linear (amortized constant)
- Although the time required by Define and Undefine_Any increases, overall this is still an enormous savings!
  - For 1,000,000 items: add time increases by a factor of 20, but access time drops by a factor of 50,000!
42. Using Tree Traversals
- Remember tree traversals? There is a clever use for one of these when working with a BST.
- Which order might be useful to a BST?
- Think about how the BST property is defined: what would each of the traversals do with a BST?
43. Limitations of BST
- A binary search tree is best when it's mostly balanced, meaning height is near minimum.
- A long, skinny tree is called degenerate. Its performance would be linear for most operations (even worse than using a queue for all the pairs).
  - This might happen if you add all the items to the tree when they're in sorted order, or it might happen when you add and remove items over time.
- Beyond the scope of 222: we often use self-balancing binary tree models to fix this issue.
  - Ex: AA, AVL, red-black, scapegoat, splay, treap
44. The Downside
- There is a constant cost associated with composing/decomposing (but it's tiny).
- The algorithms are harder to write
  - Searching is easier than linear search, but requires careful thought: we have to test both for equality and relative order.
  - We must maintain the BST property every time we add or remove something from the tree; this is very tricky!
45. Designing Partial_Map with BST
- Partial_Map with BST would require only one field in Rep:
  - a Binary_Tree_Of_D_R_Pair: self[tree_of_pairs]
- But there's a problem that appears when we try to implement Accessor:
  - Accessor must return a reference to an object, while leaving it in the Representation (we don't want to remove it from the container).
  - The only place we can return a reference from a Binary_Tree is the root, which is inconvenient.
46. How to implement Accessor?
- If we use self[tree_of_pairs][current][r_item] to implement Accessor, we would have to move the D_R_Pair we want to the top of the tree, but somehow keep the BST property
  - and we would need to rebuild the whole tree.
- We could do it better by using a cache: a pair outside the tree. This adds two fields to Rep:
  - a D_R_Pair: self[cache]
  - an Integer: self[number_of_pairs]
  - alternately, we could have chosen to use a Boolean flag to mark whether the cache is active
47. Using a Cache
- Keeping a single cached D_R_Pair outside of the tree allows us to implement Accessor much more easily.
  - If the D_Item you need is in the cache, just return the R_Item from self[cache][r_item].
  - Otherwise, remove the pair you need from the tree, and place it in the cache.
  - Then we can return the r_item of the cache.
  - Don't forget to put the old cached item back into the tree!
48. Caching: Also an Optimization
- Objects are often accessed in a certain way that lets us predict what's next.
  - spatial locality: we often access items based on how closely they are stored in memory
  - temporal locality: we often access the same items we just accessed
- Caching a single item lets us access it in constant time after the first access (when accessing multiple times in a row).
49. The Cache Is Never Empty
- The cache is a Record: it's never empty. If Rep holds a cached pair that always has something inside it, how do we represent the empty Partial_Map?
- Solution: the cache is not always part of the Partial_Map!
- Understanding the correspondence for PMK7 is essential to using the cache correctly. The cache is not always a valid pair.
50. PMK7 Correspondence
    correspondence
        self = CORR (self.number_of_pairs, self.cache, self.tree_of_pairs)

    math definition CORR (
            n: integer,
            c: TREE_ITEM,
            t: binary tree of TREE_ITEM
        ): finite set of TREE_ITEM satisfies
        if n = 0
        then CORR (n, c, t) = {}
        else CORR (n, c, t) = {c} union ELEMENTS (t)
When self[number_of_pairs] = 0, the cache is not part of the Partial_Map! (It's just detritus.) Otherwise, the cache is part of the Partial_Map. It's never a copy of a pair found in the tree. (Please read the correspondence over and over until this sinks in.)
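The correspondence and the cache-swapping accessor can be mimicked in a few lines of Python. This is a loose sketch, not the PMK7 code: the class CachedMap and its methods are invented for illustration, a plain dict stands in for the BST of pairs, and define is assumed to push the old cached pair into the tree.

```python
class CachedMap:
    """Sketch of the Rep: a count, one cached pair, and the remaining pairs."""

    def __init__(self):
        self.number_of_pairs = 0
        self.cache = ("", "")   # detritus while number_of_pairs == 0
        self.tree = {}          # stand-in for the tree of pairs

    def define(self, d, r):
        # Assumed policy: the newest pair becomes the cached pair.
        if self.number_of_pairs > 0:
            old_d, old_r = self.cache
            self.tree[old_d] = old_r
        self.cache = (d, r)
        self.number_of_pairs += 1

    def access(self, d):
        # Requires that d is defined in the map.
        if self.cache[0] != d:
            new_pair = (d, self.tree.pop(d))   # remove the pair from the tree
            old_d, old_r = self.cache
            self.tree[old_d] = old_r           # put the old cached pair back
            self.cache = new_pair
        return self.cache[1]

    def pairs(self):
        # Mirrors CORR: the cache only counts when number_of_pairs > 0.
        if self.number_of_pairs == 0:
            return set()
        return {self.cache} | set(self.tree.items())
```

After defining "a" and then "b", accessing "a" moves ("a", ...) into the cache and pushes the "b" pair into the tree, so the cached pair is never duplicated in the tree.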
51. Restrictions on D_Item types
- Remember that AIO is modeled by total preorder?
  - Are_In_Order(x, y) and Are_In_Order(y, x) does not imply x = y
  - Example: Text objects ordered by x.Length() ≤ y.Length()
  - So "foo" is in order with "bar", and vice versa, but "foo" ≠ "bar".
- So D_Item must have two utility classes:
  - D_Item_Are_In_Order::Are_In_Order(x, y)
  - x.Is_Equal_To(y) (for D_Item)
52. Simple Operations on BST
- Locating an item
  - Use BST property to choose which side to recurse down
  - Base case is when you either find the item at the root, or you reach an empty tree
- Adding an item
  - Use BST property to choose which side to recurse down
  - Base case is the empty tree: add item there
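Both operations can be sketched in Python under some assumptions: a tree is None or an immutable tuple (root, left, right), are_in_order orders strings by length, and insert returns a new tree rather than altering one in place. The function names are made up for this example.

```python
def are_in_order(x, y):
    # Assumed ordering: Text by length, with the root passed first
    return len(x) <= len(y)

def insert(t, item):
    if t is None:
        return (item, None, None)          # base case: add the item here
    root, left, right = t
    if are_in_order(root, item):
        return (root, left, insert(right, item))
    return (root, insert(left, item), right)

def contains(t, item):
    if t is None:
        return False                       # base case: reached an empty tree
    root, left, right = t
    if root == item:                       # equality is tested separately
        return True
    if are_in_order(root, item):
        return contains(right, item)
    return contains(left, item)

t = None
for word in ["foo", "a", "bar", "quux"]:
    t = insert(t, word)
```

Searching for "baz" walks past "foo" and "bar", which are order-equivalent to it but not equal, and ends at an empty tree; this is why the search needs both the ordering test and an equality test.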
53. Locating an Item
54. Locating an Item
55. Inserting an Item
56. Inserting an Item
57. Removing an Item
- Case 1: removing a leaf
  - left = ø ∧ right = ø
  - easy to do: no repairs needed (t becomes ø)
- Case 2: removing a node with one child
  - exactly one of left, right is ø
  - there's a trick that makes this one easy
- Case 3: removing a node with two children
  - left ≠ ø ∧ right ≠ ø
  - this takes some serious thinking
58. Removing a Leaf
59. Removing a Leaf
60. Removing a Left-child Node
61. Removing a Left-child Node
62. Removing a Left-child Node
63. How to accomplish this magic
- How can we move up the subtree?
    t.Decompose (root, l, r);
    root &= x;
    if (l.Size () == 0) {
        t &= r;
    } else {
        // other cases here
    }
- Notice this also takes care of the leaf case! (Why?)
Since we just decomposed t, t is the empty tree here. We're replacing it with its right subtree.
64. Trouble in Paradise
- What's (seriously) wrong with this code?
- (Trace what happens to t if its only non-empty subtree is r.)
    t.Decompose (root, l, r);
    root &= x;
    if (l.Size () == 0) {
        t &= r;
    } else {
        // other cases here
    }
    t.Compose (root, l, r);
65. Destroying the Tree
- Compose produces t!
- (We just destroyed the subtree we swapped in.)
    t.Decompose (root, l, r);
    root &= x;
    if (l.Size () == 0) {
        t &= r;
    } else {
    }
    t.Compose (root, l, r);    // move into cases that need it
We can't always rely on Compose to patch the tree up for us if we want to alter the tree. Here, we had already fixed the tree by calling t &= r. We need to restrict the Compose call only to cases that need that call.
66. Removing a Right-child Node
67. Removing a Right-child Node
68. Removing a Right-child Node
69. An Interesting Property
- Where are the largest and smallest numbers in each of the graphs below?
- Could we use this information to make it easier to remove min or max elements?
70. Removing Largest
- Since the largest element never has a right subtree, it should be easy to remove the largest and fix the tree.
- Get into groups of three or four.
    global_procedure Remove_Max (
            alters Binary_Tree_Of_Integer t,
            produces Integer max
        );
    /*!
        requires
            |t| > 0 and
            BST_PROPERTY (t)
        ensures
            BST_PROPERTY (t) and
            ELEMENTS (t) = ELEMENTS (#t) - {max} and
            for all Item y in t
                (ARE_IN_ORDER (y, max))
    !*/
71. Remove_Max()
    global_procedure_body Remove_Max (
            alters Binary_Tree_Of_Integer t,
            produces Integer max
        )
    {
        object catalyst Integer root;
        object catalyst Binary_Tree_Of_Integer left, right;
        t.Decompose (root, left, right);
        if (right.Size () == 0) {
            // The root is the largest item; its left subtree replaces t.
            max &= root;
            t &= left;
        } else {
            Remove_Max (right, max);
            t.Compose (root, left, right);
        }
    }
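The same shape carries over to a Python sketch, assuming a tree is None or a tuple (root, left, right) and that the function returns the remaining tree instead of altering its argument. The name remove_max is hypothetical.

```python
def remove_max(t):
    """Return (largest item, remaining tree) for a non-empty BST."""
    if t is None:
        raise ValueError("requires a non-empty tree")
    root, left, right = t
    if right is None:
        # No right subtree: the root is the largest, and left replaces t.
        return root, left
    # Otherwise the largest lives in the right subtree; rebuild around it.
    largest, new_right = remove_max(right)
    return largest, (root, left, new_right)

tree = (5, (3, None, None), (8, (7, None, None), None))
largest, rest = remove_max(tree)
```

Here largest is 8, and rest is the tree with 7 moved up into the place 8 used to occupy.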
72. Now, Two Children
- Fixing a tree after removing a node with two children becomes a non-trivial problem
- One option: remove the subtrees and add them back to the tree one node at a time
  - Boo, creepy foot doctor! Extremely inefficient!
- We need to look for a simpler way
73. Remember the BST Property
- If we remove a node that has two subtrees, what possible candidates could replace that node?
  - The previous item in-order will be the rightmost (maximal) item in the left subtree.
  - The next item in-order will be the leftmost (minimal) item in the right subtree.
- One of these can mess up duplicates
74. Removing, Hard Case
75. Removing, Hard Case
At first glance, it seems like either strategy will work.
76. Removing, Hard Case
77. Removing, Hard Case
78. Removing, Hard Case
79. Removing, Hard Case
80. Removing, Hard Case
81. Removing, Hard Case
We see that using the previous node in-order can lead to a corruption of the BST, due to equivalent items.
82. Removing, Hard Case
This problem can't occur if we take the smallest from the right, because it will keep equivalent items in their right subtree.
83. Removing, Hard Case
84. Removing Two-Child Nodes
- Fortunately, we can remove a two-child node without recursing down to the bottom of the tree.
  - The minimal-order item in a BST can't have a left subtree (otherwise, there would be a smaller item).
  - If the item has no left subtree, it can be repaired without going further down the tree (a constant-time tree swap).
- This means that the Remove operation can patch the tree in O(lg n) time, so it still runs in O(lg n) time overall. Sweet.
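Putting the three removal cases together, a rough Python sketch could look like this. It assumes a tree is None or a tuple (root, left, right), integers ordered by <= (with the root as the first operand), and functions that return the repaired tree; the names remove and remove_min are invented here and are not the lab's local operations.

```python
def remove_min(t):
    """Return (smallest item, remaining tree) for a non-empty BST."""
    root, left, right = t
    if left is None:
        return root, right          # no left subtree: right moves up
    smallest, new_left = remove_min(left)
    return smallest, (root, new_left, right)

def remove(t, item):
    """Remove one occurrence of item, which must be in the tree."""
    if t is None:
        raise KeyError(item)
    root, left, right = t
    if root == item:
        if left is None:
            return right            # leaf or right-child-only case
        if right is None:
            return left             # left-child-only case
        # Two children: the next item in-order takes the root's place.
        successor, new_right = remove_min(right)
        return (successor, left, new_right)
    if root <= item:                # Are_In_Order(root, item)
        return (root, left, remove(right, item))
    return (root, remove(left, item), right)

tree = (5, (3, None, None), (8, (6, None, (7, None, None)), (9, None, None)))
```

Removing 5 from this tree promotes 6, the smallest item of the right subtree, to the root.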
85. Thinking about PMK7 Remove()
- Break the idea down into steps.
- Consider the cache
  - Since Remove requires that some particular item is in the map, that means |map| > 0, so we know the cache is valid.
- Is it not in the cache? It must be in the tree
  - To search recursively, we may need a helper operation.
  - The helper operation might need to patch the hole.
- The local operations Remove_Pair_From_Tree and Remove_Smallest_PFT are your friends!
  - Think carefully about how to build them.
  - You might need one of them to get the other one to work!
86. More Practice
- Can't ever get enough of tree recursion!
    global_procedure Remove_Lowest (
            alters Binary_Tree_Of_Item t,
            produces Item x
        );
    /*!
        requires
            |t| > 0
        ensures
            for all i: Item
                (|path from root to i| <= |path from root to x|)
            and CONTENTS (t) = CONTENTS (#t) - {x}
    !*/
87. Remove_Lowest
    procedure_body Remove_Lowest (
            alters Binary_Tree_Of_Item t,
            produces Item x
        )
    {
        object Item root;
        object Binary_Tree_Of_Item l, r;
        t.Decompose (root, l, r);
        if (l.Height () == 0 and r.Height () == 0) {
            // The root is a leaf, so it is the lowest item.
            x &= root;
        } else if (l.Height () > r.Height ()) {
            Remove_Lowest (l, x);
            t.Compose (root, l, r);
        } else {
            Remove_Lowest (r, x);
            t.Compose (root, l, r);
        }
    }
88. Thinking about Get_Tree
- The spec for Get_Tree() in Closed Lab 6 is interesting! Suppose we have an implementation that works for it:
    /*!
        requires
            there exists t1: binary tree of character
                (PREFIX_DISPLAY (t1) is prefix of tree_as_text)
        ensures
            #tree_as_text = PREFIX_DISPLAY (t) * tree_as_text
    !*/
- Then we change the spec slightly. Is our code still correct for this new spec that lacks the "extra stuff" part? Why not?
    /*!
        requires
            there exists t1: binary tree of character
                (PREFIX_DISPLAY (t1) = tree_as_text)
        ensures
            #tree_as_text = PREFIX_DISPLAY (t)
    !*/
- But won't it work if we merely weaken the postcondition?
    /*!
        requires
            there exists t1: binary tree of character
                (PREFIX_DISPLAY (t1) is prefix of tree_as_text)
        ensures
            PREFIX_DISPLAY (t) is prefix of #tree_as_text
    !*/
This version consumes tree_as_text.
89. Correctness in Get_Tree()
- What does it mean for code to be correct?
- If we spec Get_Tree() without the "extra stuff", then we can't call it recursively, because when we call it on t to get the left subtree, it will clear t, so we lose the right subtree entirely.
  - We could write a second helper operation to do the recursion, but it would have the same issue.
- You might think that the weaker version must work because all we're doing is allowing more results
  - But the code would not be correct. When you make the recursive calls, you could no longer assume that the tree had actually been removed from t. (This is a bizarre sort of example of a concrete-to-concrete dependency between the same code and itself!)
90. Correctness ≠ Good Output
- The underlying idea is simple: when we say that code is correct, we mean two things:
  - The code must do what its ensures says it does, but
  - this "what it does" is determined based on the ensures of the operations our code calls, not on how those operations are written, or what the output happens to be!
- It is therefore possible to write code that always works, but is not correct. Most often this code works because the "right" components are stuck together, but the same code could fail if you chose other components.
- We'll examine this in more detail later on in the course.