Binary Trees: our leafy, annoying friends

Slides: 91

Provided by: awo4

Transcript and Presenter's Notes

1
Binary Trees: our leafy, annoying friends
Annatala Wolf 222 Lecture 6
2
Relations (Order Properties)

A binary relation is a set of pairs. We can
classify relations based on properties that are
always true. Example properties of a relation R:
reflexivity: a R a
irreflexivity: ¬(a R a)
transitivity: a R b ∧ b R c ⇒ a R c
symmetry: a R b ⇒ b R a
antisymmetry: a R b ∧ b R a ⇒ a = b
totality: a R b ∨ b R a

3
Order Relations (Examples)

Equivalence: reflexive, symmetric, and transitive
examples: ceiling and floor functions, true
equality
Preorder (simplest kind of order): reflexive and
transitive
examples: human preferences, logical implication
Total preorder: transitive and total (⇒
reflexive)
examples: like preorder, but each pair is
comparable (e.g., strings ordered by length)
Partial order: reflexive, antisymmetric,
transitive
example: sets ordered by subset (⊆)
Total order: antisymmetric, transitive, and total
(⇒ reflexive)
examples: ≤ on the integers, increasing
lexicographic ASCII order

4
Are_In_Order

Utility class used for comparing Item
Mathematically modeled by a total preorder
preorder: so that it can handle any kind of
ordering situation, even loose orderings, such
as ordering strings by length
total: to ensure that Are_In_Order always has some
correct way to order two items (they can't be
non-ordered in both directions)

5
Note the Difference

Because Are_In_Order is reflexive (this is
implied by transitive + total), it catches
equality as true
Are_In_Order is not necessarily antisymmetric:
(a AIO b) and (b AIO a) does not imply a = b
Consider ordering Text by length: "foo" ≠ "bar"
So we might also need Is_Equal_To.
You must take special care how AIO is used. It
acts somewhat like ≤
Namely, !(x ≤ y) is not the same as y ≤ x

6
Using Are_In_Order Correctly

To ensure Are_In_Order is used properly, you must
always test in the same direction
Just like using ≤, you want to test:
base case: Are_In_Order(x, y)
opposite case: not Are_In_Order(x, y)
bad idea! Are_In_Order(y, x)
We'll see this in more detail when we look at
using binary search trees
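To make the "same direction" rule concrete, here is a minimal Python sketch. The function are_in_order is a hypothetical stand-in for Are_In_Order that orders strings by length; it is not the course's utility class.

```python
def are_in_order(x: str, y: str) -> bool:
    """Hypothetical stand-in for Are_In_Order: orders strings by length."""
    return len(x) <= len(y)

x, y = "foo", "bar"   # order-equivalent (same length) but not equal

base_case = are_in_order(x, y)          # True
opposite_case = not are_in_order(x, y)  # False: the real opposite of base_case
bad_idea = are_in_order(y, x)           # True: NOT the opposite of base_case

print(base_case, opposite_case, bad_idea)
```

Because "foo" and "bar" are order-equivalent, testing in the other direction gives the same answer as the base case instead of its negation.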

7
Seven Bridges of Königsberg

Classic problem from the 1700s: could you cross
each bridge in Königsberg exactly once, in a
single path?
This seems like a question math could answer, but
at the time, math had not yet been applied to
this sort of physical relationship.

Euler (same fellow who named the constant e)
proved in 1735 that it was impossible to do this.
8
Graph Theory

Euler proved this by inventing a new form of math
called graph theory, which studies connections.
Later, it developed into a related field,
topology.

Each region is a vertex; each bridge is an edge.
Euler noticed all 4 vertices have an odd degree
(# of edges)...
But unless a vertex is at the start or end, it
must have an even degree (one edge in, one edge
out). A contradiction!
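Euler's degree argument can be checked mechanically. The sketch below is only an illustration: the labels A to D are made up for the four land regions (A is the central island, B and C the two river banks, D the eastern island), and the edge list encodes the seven bridges.

```python
from collections import Counter

# The seven bridges as edges between four land regions (labels are illustrative).
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]

degree = Counter()
for u, v in bridges:
    degree[u] += 1
    degree[v] += 1

odd_vertices = [v for v in sorted(degree) if degree[v] % 2 == 1]

# A path using every edge exactly once can have at most two odd-degree
# vertices: its start and its end.
walk_possible = len(odd_vertices) <= 2
print(dict(degree), odd_vertices, walk_possible)
```

All four vertices come out odd, so no such path exists.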
9
Graphs

Graph: a set of vertices (or nodes) connected by
edges
Each edge links two vertices, or one vertex to
itself: a self-loop (self-edge)
In some graphs, the edges may be directed (point
one way) or have values, e.g., color, order,
weight
An empty graph has zero nodes and zero edges
Path: a sequence of vertices connected by their
edges

10
Paths and Cycles

A simple path is a path with no repeated
vertices. (In the undirected graph below, there
are infinitely many paths from 5 to 6, but only
three simple paths.)
A cycle is a path that starts and ends at the
same vertex. If that vertex is the only repeated
one, it's a simple cycle. (There are six simple
cycles from 5.)
In directed graphs, you can only travel in the
direction of the arrows. But edges can still go
in both directions.

11
Graphs in Computer Science

Graphs are extremely useful in CS! We usually
use them as data structures to hold information,
or to solve problems.
Graphs can be used to do many things:
to model how data moves through networks
to hold data and relationships together
to find the quickest solution to a problem
involving many separate pieces
to describe how program components talk

12
Trees

Trees are a common, useful graph structure in CS.
Trees have several equivalent definitions.
A tree is any undirected graph where:
it is connected (every vertex can reach
every other vertex) and acyclic (no cycles
means no loops of any size),
or, every two vertices are connected by exactly
one simple path,
or, it is either empty (zero nodes, zero edges)
or else it is a connected graph with n vertices
and n-1 edges (for some number n).
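The third definition is the easiest one to test in code. This is a rough Python sketch, assuming the graph is given as a collection of vertices plus a list of undirected edge pairs; the name is_tree is made up for illustration.

```python
def is_tree(vertices, edges):
    """True if the undirected graph is empty, or connected with n-1 edges."""
    if not vertices:
        return not edges
    if len(edges) != len(vertices) - 1:
        return False
    neighbors = {v: set() for v in vertices}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # Depth-first search from any vertex; a tree must reach every vertex.
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        for w in neighbors[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)

print(is_tree({1, 2, 3}, [(1, 2), (2, 3)]))          # a path: a tree
print(is_tree({1, 2, 3}, [(1, 2), (2, 3), (3, 1)]))  # a triangle: has a cycle
```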

13
Types of Trees

A rooted tree is simply a tree with one vertex
specially marked as the root.
We call this vertex the root, and now think of
edges being directed to or from the root (either
way is fine).
An empty tree (no vertices or edges) can still be
rooted, technically.
An ordered tree is a tree where the edges for
each node have an order associated with them.
Often numbered 0, 1, 2, ... (or left, right for
binary).
A forest is a graph that is a set of trees.
Usually this is just an undirected, acyclic graph!

14
Visualizing Rooted Trees

When we draw rooted trees, we often use the
following common conventions:
Draw the root of the tree at the top
Draw vertices on the same horizontal level based
on how many edges they are from the root
If the tree is ordered, draw the outgoing (child)
edges from left to right in that order
<0, 1, 2, ...> (if ordered by number)
<left, right> (if binary-ordered)

15
Rooted Tree Terminology
Size: total number of nodes. Height: total
number of levels (max distance from the root,
counting the root as level 1).
Leaves are nodes without children. Nodes with
children are called internal nodes.
16
Trees in Computer Science

Trees are a useful model for recursive data
structures.
the directory system for your computer uses a
rooted tree
networks like Ethernet can be trees
trees can be used to store order or relationship
information between items
When used as a data structure, each item is
usually a vertex in the tree.

17
Binary Trees

A binary tree is a rooted tree where every node
has at most two children.
Binary trees are also often ordered. This means
there is a distinct left child and right child.
If we consider the empty tree to be the "child"
for a node missing a left or right outgoing edge,
we can say that every node has exactly two
children (unless it is empty).
This definition is very useful to the recursive
structure of the tree!

18
Recursive Structure

Think of each binary tree as either being empty,
or else consisting of a root node, a left
subtree, and a right subtree.

[Figure: a binary tree is either the empty tree ø
(0 nodes, 0 edges) OR a non-empty tree (whose
subtrees might still be empty).]
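One way to picture the recursive definition in code is sketched below in Python. The representation is an assumption made for illustration only: None stands for the empty tree, and a non-empty tree is a (root, left, right) tuple. The helper names compose and decompose are borrowed from the lecture's vocabulary but are not the Binary_Tree component itself.

```python
EMPTY = None  # the empty tree: 0 nodes, 0 edges

def compose(root, left, right):
    """Build a non-empty tree from a root item and two (possibly empty) subtrees."""
    return (root, left, right)

def decompose(tree):
    """Split a non-empty tree into its root item and its two subtrees."""
    assert tree is not EMPTY, "cannot decompose the empty tree"
    return tree

def height(tree):
    """Number of levels: 0 for the empty tree, else 1 + the taller subtree."""
    if tree is EMPTY:
        return 0
    _, left, right = decompose(tree)
    return 1 + max(height(left), height(right))

leaf = compose("B", EMPTY, EMPTY)
tree = compose("A", leaf, EMPTY)
print(height(EMPTY), height(leaf), height(tree))  # 0 1 2
```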
19
What We Gain

We no longer have special cases for nodes based
on whether they have left, right, both, or
neither child. All nodes now have exactly two
subtrees!

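[Figure: a tree with nodes A through E, drawn
once with missing children omitted and once with
every missing child shown as an empty tree ø.]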
20
Thinking Recursively

By defining binary trees recursively, it's much
easier to work with them. The only special case
we have to handle is the empty tree case, so the
interface can be simpler.

[Figure: the empty tree ø (0 nodes, 0 edges) next
to a non-empty tree. The empty tree is our only
special case. Often (but not always), we can use
it as our recursive base case.]
21
Binary_Tree

Intuitively, Binary_Tree is a container with one
base item (the root), and the rest of the items
partitioned to the left or right of the root.
You can only access the root, or break the tree
into the root and its two subtrees.
Mathematically, Binary_Tree is an ordered binary
tree of Item.
default value: the empty tree

22
Binary_Tree Kernel Operations

Compose (x, left_subtree, right_subtree)
  requires: true; produces self
Decompose (x, left_subtree, right_subtree)
  requires: self ≠ ø; consumes self
Accessor: current
  requires: self ≠ ø
Height ( )
  requires: true
Size ( )
  requires: true

23
Using Binary_Tree

Anything you do using Binary_Tree will be
recursive. Period.
The only way to work with Binary_Tree is to
handle one height-level of the tree at a time.
If you ever decompose a subtree (decompose a
tree, and then do it to its subtree too), you're
doing it wrong! This should never happen.
You must rely upon recursion to do the work for
you at lower levels! Binary tree has a recursive
structure, and only recursion will correctly
exploit it.

24
Preconditions for Binary_Tree

The only preconditions for Binary_Tree are
You cannot Decompose() the empty tree.
You cannot use current on the empty tree.
Fortunately, the empty tree is easy to recognize.
It is the only tree that has a size (and a
height) of zero.
Frequently, the empty tree will be the base case
for recursion, but not always! Some operations
have |tree| > 0 as a precondition, so in this
case the base case would need to be something
different.

25
Working With Trees

Assume there was no Size() operation for
Binary_Tree. Could you write it using the other
Kernel operations?
Hint: size is 0 only when height is 0. Check
height for the base case, and use recursion on
both sides!
global_function Integer Size (
    preserves Binary_Tree_Of_Text t
  )
/*!
  ensures
    Size = |t|
!*/

26
Size( ), as a global function
global_function_body Integer Size (
    preserves Binary_Tree_Of_Text t
  )
{
  object Integer result;
  if (t.Height () > 0)
  {
    object catalyst Binary_Tree_Of_Text l, r;
    object catalyst Text root;
    t.Decompose (root, l, r);
    result = 1 + Size (l) + Size (r);
    t.Compose (root, l, r);
  }
  return result;
}
27
Thinking About Recursion

To handle tree recursion, think only one level at
a time.
For our global Size(), we have no requirement
which says |t| > 0, so we need to handle the
empty tree case.
However, this makes it easier to write! Now we
can call Size() on any tree. We don't have to
check to see if a tree is empty before we call it
recursively. We just need to ensure that any tree
we recurse on is smaller, and subtrees are always
smaller, so this is a cinch.
Notice that all of the actual work in our Size()
is done at the line
result = 1 + Size(l) + Size(r)
The total size is just a bunch of 1 values
added together. Since we recurse down both
sides, this line will run once for every node in
the tree (it's skipped in the empty tree case).
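The same one-level-at-a-time idea can be sketched in Python. This is only an analogy to the Resolve/C++ body, assuming a tree is either None (empty) or a (root, left, right) tuple.

```python
def height(tree):
    """Number of levels; None is the empty tree, otherwise (root, left, right)."""
    if tree is None:
        return 0
    _, left, right = tree
    return 1 + max(height(left), height(right))

def size(tree):
    if height(tree) == 0:      # base case: only the empty tree has height 0
        return 0
    _, left, right = tree      # take apart exactly one level
    return 1 + size(left) + size(right)   # one "1" per node

t = ("A", ("B", None, None), ("C", ("D", None, None), None))
print(size(t))  # 4
```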

28
Recursion Metric

Operations that are written recursively can be
specified with a recursion metric. In Resolve,
we use the word decreases for the recursion
metric, which becomes a part of the contract for
a recursive operation (plus requires and
ensures).
The recursion metric specifies an additional
requirement that must be true before a recursive
call may be made. This guarantees that the
recursion will reach a base case (some point
where it is impossible to make a recursive call).

29
Decreases Example

In Add_Pair_To_Tree, decreases does not mean that
the operation will cause the size of t to shrink
(that wouldn't make any sense). It means you can
only make a recursive call on a smaller t.
local_procedure_body Add_Pair_To_Tree (
    alters Binary_Tree_Of_D_R_Pair t,
    consumes D_R_Pair pair
  )
/*!
  requires
    TREE_CONV (t) and
    pair.d_item is not in D_ELEMENTS (t)
  ensures
    TREE_CONV (t) and
    ELEMENTS (t) = ELEMENTS (#t) union {#pair}
  decreases
    SIZE (t)
!*/

You may only recurse with (x, p) if |x| < |t|.
30
What about iteration?

The idea behind a recursion metric is simple: we
need some way to prove (at least informally) that
the recursion won't run forever.
Similarly, we may want to specify what happens in
a loop in order to ensure that the loop
terminates (among other things). We may put a
decreases clause inside a complicated loop, in
which case the clause specifies what must
decrease at each iteration. We'll discuss this
in more detail near the end of the class (with
the topic of loop invariants).

31
Tree Traversal

A tree traversal is a path that visits each node
once (like iterating over the entire tree container)
Examples of depth-first, left-to-right
traversals: preorder, in-order, and postorder
Note: the way we designed Binary_Tree prevents us
from easily performing a breadth-first traversal.

You should memorize each of these! Since they're
all left-to-right, the only difference is when
the root is done.
32
Traversing A Tree

Recall that an in-order traversal visits the left
subtree, followed by the root, followed by the
right subtree.
global_procedure Print_In_Order (
    preserves Binary_Tree_Of_Integer t,
    alters Character_OStream out
  )
/*!
  requires
    out.is_open = true
  ensures
    out.is_open = true and
    out.ext_name = #out.ext_name and
    out.contents = #out.contents *
      [the items in t, in-order, and
       separated by newlines]
!*/

Get into groups of 3-4!
33
Print_In_Order
global_procedure_body Print_In_Order (
    preserves Binary_Tree_Of_Integer t,
    alters Character_OStream out
  )
{
  if (t.Size () > 0)
  {
    object catalyst Binary_Tree_Of_Integer l, r;
    object catalyst Integer root;
    t.Decompose (root, l, r);
    Print_In_Order (l, out);
    out << root << '\n';
    Print_In_Order (r, out);
    t.Compose (root, l, r);
  }
  // How could we turn this into a preorder or
  // postorder traversal?
}
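As a hint for the question in the comment, here is a Python sketch of all three depth-first, left-to-right traversals. It assumes a tree is None (empty) or a (root, left, right) tuple and collects items into a list instead of printing; the only thing that changes between the three functions is where the root goes.

```python
def preorder(tree):
    if tree is None:
        return []
    root, left, right = tree
    return [root] + preorder(left) + preorder(right)    # root first

def inorder(tree):
    if tree is None:
        return []
    root, left, right = tree
    return inorder(left) + [root] + inorder(right)      # root in the middle

def postorder(tree):
    if tree is None:
        return []
    root, left, right = tree
    return postorder(left) + postorder(right) + [root]  # root last

t = (2, (1, None, None), (3, None, None))
print(preorder(t), inorder(t), postorder(t))
```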
34
Neat Algorithm Binary Search

I'm thinking of a number between 1 and 100. I'll
tell you whether it's too high or too low when
you guess.
You should be able to get the answer in 7
guesses, every time. (Why 7? 2^7 = 128)
A binary search is a method for cutting the
search space roughly in half at every step.
It takes O(lg n) time to run binary search: very
fast.
You could search 15,000 elements in 14 guesses!
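The guess counts above can be checked by brute force. This Python sketch plays the guessing game with a midpoint strategy for every possible secret; the function name guesses_needed is made up for illustration.

```python
def guesses_needed(secret, low=1, high=100):
    """Count the guesses a halving strategy needs to find secret in [low, high]."""
    count = 0
    while True:
        guess = (low + high) // 2
        count += 1
        if guess == secret:
            return count
        if guess < secret:   # "too low"
            low = guess + 1
        else:                # "too high"
            high = guess - 1

worst_100 = max(guesses_needed(n) for n in range(1, 101))
worst_15000 = max(guesses_needed(n, 1, 15000) for n in range(1, 15001))
print(worst_100, worst_15000)  # 7 14
```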

35
Binary Search Trees

A binary search tree (BST) is an ordered binary
tree that maintains a binary search tree
property. Example:
For each node in the right subtree,
Are_In_Order(root, node), and
for each node in the left subtree, not
Are_In_Order(root, node).
This defines the BST property that we will use
with Binary_Tree in Lab 4.

36
What qualifies as a BST?

The BST property we've defined is actually rather
complicated. In order for a tree to be a BST,
these four conditions would need to be true:
IS_VALID_BST(left_subtree)
IS_VALID_BST(right_subtree)
not Are_In_Order(root, MAX_NODE(left_subtree))
Are_In_Order(root, MIN_NODE(right_subtree))
I.e., the BST property must apply at every node.
However, there is one more case: the empty tree.
The empty tree is always a BST because it has no
nodes which are not in order.
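A hedged Python sketch of this check follows. It assumes a tree is None (empty) or a (root, left, right) tuple, and aio is a hypothetical Are_In_Order for integers. Instead of computing MAX_NODE and MIN_NODE it simply tests every node in each subtree against the root, which is less efficient but states the same property.

```python
def aio(x, y):
    """Hypothetical Are_In_Order for integers: x <= y."""
    return x <= y

def items(tree):
    if tree is None:
        return []
    root, left, right = tree
    return items(left) + [root] + items(right)

def is_valid_bst(tree):
    if tree is None:               # the empty tree is always a BST
        return True
    root, left, right = tree
    return (is_valid_bst(left) and is_valid_bst(right)
            and all(not aio(root, x) for x in items(left))   # root always first
            and all(aio(root, x) for x in items(right)))

good = (5, (3, None, None), (5, None, (8, None, None)))
bad = (5, (5, None, None), None)   # an equivalent item on the left side
print(is_valid_bst(good), is_valid_bst(bad))
```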

37
Warning Order Matters

Note that we define the BST property by testing
AIO with the root always in the first position.
This is important because AIO is defined by a total
preorder (like ≤), so equivalence classes return
true.
If you test AIO with the root in the second
position, you will go down the wrong side of the
tree when the item you're testing is
order-equivalent to the one you want.

38
Examples of BST (of Integer)
39
Bad BST! No biccie.
40
Bad BST! No biccie.
41
Why Use a BST?

Remember the linear search we did in Partial_Map?
With a BST holding the items:
accessor, undefine: linear → logarithmic
define, undef. any: constant → logarithmic
destructor, clear: still linear (amortized
constant)
Although the time required to run Define
and Undefine_Any increases, overall this is still
an enormous savings!
For 1,000,000 items: add time increases by a
factor of 20, but access time drops by a factor
of 50,000!
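Where do 20 and 50,000 come from? A quick Python calculation, assuming a balanced tree so that an operation touches about lg n nodes:

```python
import math

n = 1_000_000
levels = round(math.log2(n))   # lg(1,000,000) is about 19.93, so roughly 20 steps
speedup = n // levels          # n steps for linear search vs. about 20 for the BST

print(levels, speedup)  # 20 50000
```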

42
Using Tree Traversals

Remember tree traversals? There is a clever use
for one of these when working with a BST.
Which order might be useful to apply to a BST?
Think about how the BST property is defined: what
would each of the traversals do with a BST?

43
Limitations of BST

A binary search tree is best when it's mostly
balanced, meaning height is near minimum.
A long, skinny tree is called degenerate. Its
performance would be linear for most operations
(even worse than using a queue for all the
pairs).
This might happen if you add all the items to the
tree when they're in sorted order, or it might
happen when you add and remove items over time.
Beyond the scope of 222: we often use
self-balancing binary tree models to fix this
issue.
Ex: AA, AVL, red-black, scapegoat, splay, treap

44
The Downside

There is a constant cost associated with
composing/decomposing (but it's tiny).
The algorithms are harder to write:
Searching is easier than linear search, but
requires careful thought: we have to test both for
equality and relative order.
We must maintain the BST property every time we
add or remove something from the tree; this is
very tricky!

45
Designing Partial_Map with BST

Partial_Map with BST would require only one field
in Rep:
a Binary_Tree_Of_D_R_Pair self[tree_of_pairs]
But there's a problem that appears when we try to
implement Accessor:
Accessor must return a reference to an object,
while leaving it in the Representation (we don't
want to remove it from the container).
The only place we can return a reference from a
Binary_Tree is the root, which is inconvenient.

46
How to implement Accessor?

If we use self[tree_of_pairs][current][r_item] to
implement Accessor, we would have to move the
D_R_Pair we want to the top of the tree, but
somehow keep the BST property
and we would need to rebuild the whole tree.
We could do it better by using a cache: a pair
outside the tree. This adds two fields to Rep:
a D_R_Pair self[cache]
an Integer self[number_of_pairs]
alternately, we could have chosen to use a
Boolean flag to mark whether the cache is active

47
Using a Cache

Keeping a single cached D_R_Pair outside of the
tree allows us to implement Accessor much more
easily.
If the D_Item you need is in the cache, just
return the R_Item from self[cache][r_item].
Otherwise, remove the pair you need from the
tree, and place it in the cache.
Then we can return the r_item of the cache.
Don't forget to put the old cached item back into
the tree!

48
Caching Also an Optimization

Objects are often accessed in a certain way that
lets us predict what's next.
spatial locality: we often access items based on
how closely they are stored in memory
temporal locality: we often access the same items
we just accessed
Caching a single item lets us access it in
constant time after the first access (when
accessing multiple times in a row).

49
The Cache Is Never Empty

The cache is a Record: it's never "empty". If
Rep holds a cached pair that always has something
inside it, how do we represent the empty
Partial_Map?
Solution: the cache is not always part of the
Partial_Map!
Understanding the correspondence for PMK7 is
essential to using the cache correctly. It is
not always a valid pair.

50
PMK7 Correspondence
correspondence
  self = CORR (self.number_of_pairs, self.cache,
               self.tree_of_pairs)

math definition CORR (
    n: integer,
    c: TREE_ITEM,
    t: binary tree of TREE_ITEM
  ): finite set of TREE_ITEM
  satisfies
    if n = 0
    then CORR (n, c, t) = {}
    else CORR (n, c, t) = {c} union ELEMENTS (t)

When self[number_of_pairs] = 0, the cache is not
part of the Partial_Map! (It's just detritus.)
Otherwise, the cache is part of the
Partial_Map. It's never a copy of a pair found
in the tree. (Please read the correspondence
over and over repeatedly until this sinks in.)
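It may help to read the correspondence as an executable function. The following Python sketch is only a model: pairs are plain tuples, a tree is None (empty) or a (root, left, right) tuple, and the names corr and elements mirror CORR and ELEMENTS without being part of the PMK7 code.

```python
def elements(tree):
    """Set of items in a tree given as None or (root, left, right)."""
    if tree is None:
        return set()
    root, left, right = tree
    return {root} | elements(left) | elements(right)

def corr(n, cache, tree):
    """Abstract map value represented by (number_of_pairs, cache, tree_of_pairs)."""
    if n == 0:
        return set()             # the cache is detritus and is ignored
    return {cache} | elements(tree)

print(corr(0, ("junk", 0), None))                     # empty map
print(corr(2, ("a", 1), (("b", 2), None, None)))      # cache plus tree contents
```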
51
Restrictions on D_Item types

Remember that AIO is modeled by a total preorder?
Are_In_Order(x, y) and Are_In_Order(y, x) does
not imply x = y
Example: Text objects ordered by x.Length() ≤
y.Length()
So "foo" is in order with "bar", and vice versa,
but "foo" ≠ "bar".
So D_Item must have two utilities:
D_Item_Are_In_Order::Are_In_Order(x, y)
x.Is_Equal_To(y) (for D_Item)

52
Simple Operations on BST

Locating an item:
Use BST property to choose which side to recurse
down
Base case is when you either find the item at the
root, or you reach an empty tree
Adding an item:
Use BST property to choose which side to recurse
down
Base case is the empty tree: add item there
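Both operations can be sketched in a few lines of Python. The assumptions here are illustrative: a tree is None (empty) or a (root, left, right) tuple, insert returns a new tree rather than altering one in place, and aio is a hypothetical Are_In_Order that orders strings by length, so equality needs its own test.

```python
def aio(x, y):
    """Hypothetical Are_In_Order: strings ordered by length (a total preorder)."""
    return len(x) <= len(y)

def insert(tree, item):
    if tree is None:                       # base case: add the item here
        return (item, None, None)
    root, left, right = tree
    if aio(root, item):                    # root always in the first position
        return (root, left, insert(right, item))
    return (root, insert(left, item), right)

def contains(tree, item):
    if tree is None:                       # base case: reached an empty tree
        return False
    root, left, right = tree
    if root == item:                       # equality test, not order-equivalence
        return True
    if aio(root, item):
        return contains(right, item)
    return contains(left, item)

t = None
for word in ["foo", "a", "bar", "hello"]:
    t = insert(t, word)
print(contains(t, "bar"), contains(t, "baz"))
```

Note that "baz" is order-equivalent to "foo" and "bar" but is not found, because the search also tests equality.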

53
Locating an Item
54
Locating an Item
55
Inserting an Item
56
Inserting an Item
57
Removing an Item

Case 1: removing a leaf
left = ø ∧ right = ø
easy to do: no repairs needed (t becomes ø)
Case 2: removing a node with one child
left = ø ∨ right = ø
there's a trick that makes this one easy
Case 3: removing a node with two children
left ≠ ø ∧ right ≠ ø
this takes some serious thinking

58
Removing a Leaf
59
Removing a Leaf
60
Removing a Left-child Node
61
Removing a Left-child Node
62
Removing a Left-child Node
63
How to accomplish this magic

How can we move up the subtree?
  t.Decompose (root, l, r);
  root &= x;
  if (l.Size () == 0)
    t &= r;
  else
    // other cases here
Notice this also takes care of the leaf case!
(Why?)

Since we just decomposed t, t is the empty tree
here. We're replacing it with its right subtree.
64
Trouble in Paradise

What's (seriously) wrong with this code?
(Trace what happens to t if its only non-empty
subtree is r.)
  t.Decompose (root, l, r);
  root &= x;
  if (l.Size () == 0)
    t &= r;
  else
    // other cases here
  t.Compose (root, l, r);

65
Destroying the Tree

Compose produces t!
(We just destroyed the subtree we swapped in.)
  t.Decompose (root, l, r);
  root &= x;
  if (l.Size () == 0)
    t &= r;
  else
    t.Compose (root, l, r);  // moved into the cases that need it

We can't always rely on Compose to patch the tree
up for us if we want to alter the tree. Here, we
had already fixed the tree by calling t &= r.
We need to restrict the Compose call only to
cases that need that call.
66
Removing a Right-child Node
67
Removing a Right-child Node
68
Removing a Right-child Node
69
An Interesting Property

Where are the largest and smallest numbers in each
of the graphs below?
Could we use this information to make it easier
to remove min or max elements?

70
Removing Largest

Since the largest element never has a right
subtree, it should be easy to remove the largest
and fix the tree.
Get into groups of three or four.
global_procedure Remove_Max (
    alters Binary_Tree_Of_Integer t,
    produces Integer max
  )
/*!
  requires
    |t| > 0 and
    BST_PROPERTY (t)
  ensures
    BST_PROPERTY (t) and
    ELEMENTS (t) = ELEMENTS (#t) - {max} and
    for all Item y in t
      (ARE_IN_ORDER (y, max))
!*/

71
Remove_Max()?
global_procedure_body Remove_Max (
    alters Binary_Tree_Of_Integer t,
    produces Integer max
  )
{
  object catalyst Integer root;
  object catalyst Binary_Tree_Of_Integer left, right;
  t.Decompose (root, left, right);
  if (right.Size () == 0)
  {
    max &= root;
    t &= left;
  }
  else
  {
    Remove_Max (right, max);
    t.Compose (root, left, right);
  }
}
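For comparison, here is the same idea as a Python sketch. It is not a translation of the Resolve/C++ discipline: trees are immutable (root, left, right) tuples with None for the empty tree, so the function returns the repaired tree together with the removed item instead of altering t.

```python
def remove_max(tree):
    """Return (new_tree, max_item) for a non-empty BST given as (root, left, right)."""
    assert tree is not None, "requires a non-empty tree"
    root, left, right = tree
    if right is None:
        # No right subtree: the root is the largest, and the left subtree
        # (possibly empty) moves up to take its place.
        return left, root
    new_right, largest = remove_max(right)
    return (root, left, new_right), largest   # recompose only in this case

t = (5, (3, None, None), (8, (7, None, None), None))
print(remove_max(t))
```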

72
Now, Two Children

Fixing a tree after removing a node with two
children becomes a non-trivial problem
One option: remove the subtrees and add them back
to the tree one node at a time
Boo creepy foot doctor! Extremely inefficient!
We need to look for a simpler way

73
Remember the BST Property

If we remove a node that has two subtrees, what
possible candidates could replace that node?
The previous item in-order will be the rightmost
(maximal) item in the left subtree.
The next item in-order will be the leftmost
(minimal) item in the right subtree.
One of these can mess up duplicates.

74
Removing, Hard Case
75
Removing, Hard Case
At first glance, it seems like either strategy
will work.
76
Removing, Hard Case
77
Removing, Hard Case
78
Removing, Hard Case
79
Removing, Hard Case
80
Removing, Hard Case
81
Removing, Hard Case
We see that using the previous node in-order can
lead to a corruption of the BST, due
to equivalent items.
82
Removing, Hard Case
This problem can't occur if we take the smallest
from right, because it will keep equivalent items
in their right subtree.
83
Removing, Hard Case
84
Removing Two-Child Nodes

Fortunately, we can remove a two-child node
without recursing down to the bottom of the tree.
The minimal-order item in a BST can't have a left
subtree (otherwise, there would be a more minimal
item).
If the item has no left subtree, it can be
repaired without going further down the tree (a
constant-time tree swap).
This means that the Remove operation can patch
the tree in O(lg n) time, so it still runs in
O(lg n) time overall. Sweet.
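Putting the three cases together, a removal might look like the Python sketch below. Everything here is an assumption for illustration: trees are immutable (root, left, right) tuples with None for the empty tree, aio is a hypothetical Are_In_Order on integers, and the two-child case takes the smallest item from the right subtree as the replacement.

```python
def aio(x, y):
    """Hypothetical Are_In_Order for integers: x <= y."""
    return x <= y

def remove_smallest(tree):
    """Return (new_tree, smallest_item); the smallest item has no left subtree."""
    root, left, right = tree
    if left is None:
        return right, root                 # right subtree moves up
    new_left, smallest = remove_smallest(left)
    return (root, new_left, right), smallest

def remove(tree, item):
    """Remove one node equal to item; requires that item is in the tree."""
    root, left, right = tree
    if root == item:
        if left is None:                   # covers the leaf case too
            return right
        if right is None:
            return left
        new_right, successor = remove_smallest(right)
        return (successor, left, new_right)
    if aio(root, item):                    # root always in the first position
        return (root, left, remove(right, item))
    return (root, remove(left, item), right)

t = (5, (3, (2, None, None), (4, None, None)),
        (8, (5, None, None), (9, None, None)))
print(remove(t, 5))
```

In the example the tree holds two order-equivalent 5s; after removing the root, the remaining 5 is promoted from the right subtree, so no equivalent item ends up on the left.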

85
Thinking about PMK7::Remove()

Break the idea down into steps.
Consider the cache
Since Remove requires that some particular item
is in the map, that means |map| > 0, so we know
the cache is valid.
Is it not in the cache? It must be in the tree.
To search recursively, we may need a helper
operation.
The helper operation might need to patch the
hole.
The local operations Remove_Pair_From_Tree and
Remove_Smallest_PFT are your friends!
Think carefully about how to build them.
You might need one of them to get the other one
to work!

86
More Practice

Can't ever get enough of tree recursion!
global_procedure Remove_Lowest (
    alters Binary_Tree_Of_Item t,
    produces Item x
  )
/*!
  requires
    |t| > 0
  ensures
    for all i: Item (|path from root to i| <=
                     |path from root to x|)
    and CONTENTS (t) = CONTENTS (#t) - {x}
!*/

87
Remove_Lowest
procedure_body Remove_Lowest (
    alters Binary_Tree_Of_Item t,
    produces Item x
  )
{
  object Item root;
  object Binary_Tree_Of_Item l, r;
  t.Decompose (root, l, r);
  if (l.Height () == 0 and r.Height () == 0)
    x &= root;
  else if (l.Height () > r.Height ())
  {
    Remove_Lowest (l, x);
    t.Compose (root, l, r);
  }
  else
  {
    Remove_Lowest (r, x);
    t.Compose (root, l, r);
  }
}
88
Thinking about Get_Tree

The spec for Get_Tree() in Closed Lab 6 is
interesting! Suppose we have an implementation
that works for it:
/*!
  requires
    there exists t1: binary tree of character
      (PREFIX_DISPLAY (t1) is prefix of tree_as_text)
  ensures
    #tree_as_text = PREFIX_DISPLAY (t) * tree_as_text
!*/
Then we change the spec slightly. Is our code
still correct for this new spec that lacks the
"extra stuff" part? Why not?
/*!
  requires
    there exists t1: binary tree of character
      (PREFIX_DISPLAY (t1) = tree_as_text)
  ensures
    #tree_as_text = PREFIX_DISPLAY (t)
!*/
(This version consumes tree_as_text.)
But won't it work if we merely weaken the
postcondition?
/*!
  requires
    there exists t1: binary tree of character
      (PREFIX_DISPLAY (t1) is prefix of tree_as_text)
  ensures
    PREFIX_DISPLAY (t) is prefix of #tree_as_text
!*/
89
Correctness in Get_Tree()

What does it mean for code to be correct?
If we spec Get_Tree() without the "extra stuff",
then we can't call it recursively, because
when we call it on t to get the left subtree, it
will clear t, so we lose the right subtree
entirely.
We could write a second helper operation to do
the recursion, but it would have the same issue.
You might think that the weaker version must work
because all we're doing is allowing more results.
But the code would not be correct. When you make
the recursive calls, you could no longer assume
that the tree had actually been removed from t.
(This is a bizarre sort of example of a
concrete-to-concrete dependency between the same
code and itself!)

90
Correctness ≠ Good Output

The underlying idea is simple: when we say that
code is correct, we mean two things:
The code must do what its ensures says it does,
but
this "what it does" is determined based on the
ensures of the operations our code calls, not on
how those operations are written, or what the
output happens to be!
It is therefore possible to write code that
always works, but is not correct. Most often
this code works because the "right" components
are stuck together, but the same code could fail
if you chose other components.
We'll examine this in more detail later on in the
course.