Title: Trees: General Principles and Ways of Thinking
1 Trees: General Principles and Ways of Thinking
- Chapters 17-18 in DSPS
- Chapter 4 in DSAA
2 Applications
- Coding
  - Huffman codes, prefix codes
- Parsing/Compiling
  - a tree is the standard internal representation for code
- Information Storage/Retrieval
  - binary trees, AA-trees, AVL, Red-Black, Splay
- Game-Playing (scenario analysis)
  - virtual trees
  - alpha-beta search
- Decision Trees
  - representation of choices
  - automatically constructed from data
3 General Trees
- Tree Definition
  - distinguished root node
  - all other nodes have a unique, sole parent
- Depth of a node
  - number of edges from root to node
- Height of a node
  - number of edges from node to deepest descendant
- Balanced
  - Goal: O(log n) insert/delete/find
  - heights of the sons of any node differ by at most 1
- K-arity
  - nodes have at most k sons
4 Depth of a Node
[figure: example tree with each node labeled by its depth: root 0, its sons 1, their sons 2]
Often convenient to add another field to the node structure for additional information such as depth, height, visited, cost, father, number of visits, number of nodes below, etc.
5 Height of a Node
[figure: example tree with each node labeled by its height: leaves 0, root 3]
6 Simple Relationships
- Leaf height is 0
- Height of a node is 1 + maximum height of its sons
- Root depth is 0
- Depth of a node is 1 + depth of its father
- These can be computed recursively.
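The recursive rules above can be sketched as follows; this is a minimal sketch, assuming a hypothetical general-tree `Node` with a list of sons (not a class from the slides).

```java
import java.util.ArrayList;
import java.util.List;

public class TreeBasics {
    static class Node {
        List<Node> sons = new ArrayList<>();
        Node addSon() { Node s = new Node(); sons.add(s); return s; }
    }

    // Height of a node is 1 + maximum height of its sons; a leaf has height 0.
    static int height(Node n) {
        int h = -1;                        // so a leaf ends up with 1 + (-1) = 0
        for (Node son : n.sons) h = Math.max(h, height(son));
        return 1 + h;
    }

    public static void main(String[] args) {
        Node root = new Node();            // depth 0
        Node a = root.addSon();            // depth 1
        root.addSon();                     // depth 1
        Node c = a.addSon();               // depth 2
        System.out.println(height(root));  // 2
        System.out.println(height(c));     // leaf: 0
    }
}
```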
7 Three Tree Representations
- List (variable number of children)
  - list-of-sons representation
  - Object value
  - NodeList children
- Sibling (variable number of children)
  - sibling representation
  - Object value
  - Node child    // the leftmost child
  - Node sibling  // each node points to its next sibling
- Array (k is a bound on the number of children)
  - Object value
  - Node[k] children
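The three layouts can be written out as Java classes; this is a sketch with hypothetical class names (`ListNode`, `SiblingNode`, `ArrayNode`) standing in for the slide's node types.

```java
import java.util.ArrayList;
import java.util.List;

public class TreeReps {
    // List representation: a variable number of children per node
    static class ListNode {
        Object value;
        List<ListNode> children = new ArrayList<>();
    }

    // Sibling representation: leftmost child plus next-sibling pointer
    static class SiblingNode {
        Object value;
        SiblingNode child;    // the leftmost child
        SiblingNode sibling;  // points to the next sibling
    }

    // Array representation: k is a bound on the number of children
    static class ArrayNode {
        Object value;
        ArrayNode[] children;
        ArrayNode(int k) { children = new ArrayNode[k]; }
    }

    public static void main(String[] args) {
        // a node with two sons in the sibling representation
        SiblingNode p = new SiblingNode();
        SiblingNode c1 = new SiblingNode();
        SiblingNode c2 = new SiblingNode();
        p.child = c1;
        c1.sibling = c2;
        System.out.println(p.child.sibling == c2); // true
    }
}
```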
8 Sibling Representation
[figure: a tree with nodes a-f drawn twice: once with ordinary parent-child edges, once with leftmost-child and next-sibling pointers]
9 Depth of node (list rep)
- Recall depth(node) is the number of links from node to root.
- Idea
  - depth of a son is 1 + depth of its father
  - call depth(root, 0)
- Define depth(Node n, int d)
  - mark depth at node n as d
  - for each son of n, call depth(son, d+1)  (use iterator)
- Marking can be done in two ways
  - have an additional field (int depth) in each node
  - have an array int depth[numberOfNodes]
10 Depth of node (sibling rep)
- Compute the depth of a node
- Recall depth(node) is the number of links from node to root.
- Idea
  - depth of the left son is 1 + depth of the father
  - depth of a sibling is the same as the depth of the node itself
- Call depth(root, 0)
- Define depth(Node n, int d)
  - mark depth at node n as d
  - call depth(n.leftson, d+1)
  - call depth(n.sibling, d)
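The sibling-representation depth marking above can be made runnable; a minimal sketch, assuming a node with child/sibling pointers and an extra `int depth` field for the marking.

```java
public class SiblingDepth {
    static class Node {
        Node child, sibling;
        int depth;
    }

    // depth of the leftmost child is 1 + depth of the father;
    // a sibling gets the same depth as the node itself
    static void depth(Node n, int d) {
        if (n == null) return;
        n.depth = d;               // mark depth at node n as d
        depth(n.child, d + 1);
        depth(n.sibling, d);
    }

    public static void main(String[] args) {
        Node root = new Node();
        Node a = new Node(); Node b = new Node(); Node c = new Node();
        root.child = a; a.sibling = b;   // a, b are sons of root
        a.child = c;                     // c is a son of a
        depth(root, 0);
        System.out.println(root.depth + " " + a.depth + " " + b.depth + " " + c.depth); // 0 1 1 2
    }
}
```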
11 Height of Node
- List representation
  - if node is a leaf, height = 0
  - else height = 1 + max(heights of sons)
- Sibling representation
  - if node is a leaf, height = 0
  - else height = max(1 + height of leftson, max of heights of siblings)
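The sibling-representation rule above can be sketched as a helper over sibling chains; this is one possible reading of the slide's recursion, assuming the same hypothetical child/sibling node layout.

```java
public class SiblingHeight {
    static class Node { Node child, sibling; }

    // tallest height among node n and its right siblings
    static int chainHeight(Node n) {
        if (n == null) return -1;    // empty chain
        int mine = (n.child == null) ? 0 : 1 + chainHeight(n.child);
        return Math.max(mine, chainHeight(n.sibling));
    }

    // height of the single node n (ignoring its own siblings)
    static int height(Node n) {
        return (n.child == null) ? 0 : 1 + chainHeight(n.child);
    }

    public static void main(String[] args) {
        Node root = new Node();
        Node a = new Node(); Node b = new Node(); Node c = new Node();
        root.child = a; a.sibling = b; a.child = c;
        System.out.println(height(root)); // 2
        System.out.println(height(b));    // leaf: 0
    }
}
```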
12 Virtual Trees
- Trees are often conceptual objects, but take too much room to store. Store only what is needed.
- Representation
  - Node
    - Object value
    - Node nextSon()  // returns null if no more sons, else returns the next son
- In this representation you generate sons on the fly.
- E.g. in game playing, typically only nodes down to a limited depth of the tree are stored.
13 Standard Operations
- Copying
- Traversals
  - preorder, inorder, postorder, level-order
  - illustrated with printing, but any processing is ok
- Find(Object o)
- Insertion(Object o)
- Deletion(Object o)
- Complexity of these operations varies with the constraints / structure of the tree that must be preserved.
14 Binary Trees
- Object representation: node has
  - Object value
  - Node left, right
- Array representation
  - use an Object[] array
  - requires that you know the size of the tree, or use growable arrays
  - no pointer overhead
- Trick: if a node is stored at index i, then
  - left son stored at 2i
  - right son stored at 2i+1
  - root stored at 1
  - father of node i is at i/2
- Generalizes to k-ary trees naturally.
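The index trick above can be checked directly; a small sketch using 1-based indices into an `Object[]` array (index 0 left unused, as the trick assumes).

```java
public class ArrayTree {
    static int left(int i)   { return 2 * i; }       // left son of node i
    static int right(int i)  { return 2 * i + 1; }   // right son of node i
    static int father(int i) { return i / 2; }       // integer division

    public static void main(String[] args) {
        Object[] tree = new Object[8];   // room for 7 nodes; index 0 unused
        tree[1] = "root";
        tree[left(1)] = "L";             // index 2
        tree[right(1)] = "R";            // index 3
        tree[left(2)] = "LL";            // index 4
        System.out.println(tree[4]);     // LL
        System.out.println(father(4));   // 2
    }
}
```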
15 Binary Search Trees
- Left subtree values < node value < right subtree values
  - i.e. any descendant of a node in the left subtree is less than any descendant of a node in the right subtree.
- Operations (let d be the depth of the tree)
  - Object find(Key k)
    - sometimes key and object are the same
  - insert(Object o) or insert(Key k, Object o)
  - Object findMin()
  - removeMin()
  - removeElement(Object o)
- Cost: all O(d) via separate and conquer
16 Removing elements is tricky
- How would you remove the value at the root?
- Plan for remove(Object o)
  - 1. Find o, i.e. let n be the node in the tree with value o
  - 2. Keep a ptr to the father of n
  - 3. If (n.right == null) ptr.son = n.left  // not code
  - 4. Else
    - a. find min in n.right
    - b. remove min from n.right
    - c. ptr.son = new Node(min, n.left, n.right)
- Assumes an appropriate constructor.
- Make pictures of the cases.
17 Support routines
- BinaryNode findMin(BinaryNode n)
  - Recursively
    - if (n.left == null) return n
    - else return findMin(n.left)
  - O(d) time and space
- BinaryNode findMin(BinaryNode n)
  - Iteratively
    - while (n.left != null) n = n.left
    - return n
  - O(d) time, O(1) space
18 Remove Min
- removeMin(BinaryNode n) idea
  - Node m = findMin(n)
  - father(m).left = m.right
  - // idea ok, code not right
  - What if the minimum is the root?
- BinaryNode removeMin(BinaryNode n)
  - if (n.left != null)
    - n.left = removeMin(n.left)
  - else
    - n = n.right
  - return n
19 Remove Min Examples
20 Remove Node Examples
[figure: example trees with nodes a-g illustrating the cases of removing a node]
21 removeNode
- BinaryNode removeNode(BinaryNode x, BinaryNode n)  // remove x from n
  - if (x < n) n.left = removeNode(x, n.left)
  - else if (x > n) n.right = removeNode(x, n.right)
  - // Now x == n
  - else if (n.left != null && n.right != null)
    - n.data = findMin(n.right).data
    - n.right = removeMin(n.right)
  - else  // left or right is empty
    - n = (n.left != null) ? n.left : n.right
  - return n
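The findMin / removeMin / removeNode sketches above can be assembled into a runnable class; a sketch using int keys instead of Objects for brevity (the slides' pseudocode compares nodes directly, here we compare the data fields).

```java
public class BST {
    static class Node {
        int data;
        Node left, right;
        Node(int d) { data = d; }
    }

    static Node insert(Node n, int x) {
        if (n == null) return new Node(x);
        if (x < n.data) n.left = insert(n.left, x);
        else if (x > n.data) n.right = insert(n.right, x);
        return n;
    }

    static Node findMin(Node n) {            // iterative: O(d) time, O(1) space
        while (n.left != null) n = n.left;
        return n;
    }

    static Node removeMin(Node n) {
        if (n.left != null) n.left = removeMin(n.left);
        else n = n.right;                    // the min has no left child
        return n;
    }

    static Node removeNode(int x, Node n) {  // remove x from tree n
        if (n == null) return null;
        if (x < n.data) n.left = removeNode(x, n.left);
        else if (x > n.data) n.right = removeNode(x, n.right);
        else if (n.left != null && n.right != null) {  // two children
            n.data = findMin(n.right).data;
            n.right = removeMin(n.right);
        } else {                             // left or right is empty
            n = (n.left != null) ? n.left : n.right;
        }
        return n;
    }

    public static void main(String[] args) {
        Node root = null;
        for (int x : new int[]{6, 2, 7, 1, 4}) root = insert(root, x);
        root = removeNode(6, root);             // remove the root: 7 replaces it
        System.out.println(root.data);          // 7
        System.out.println(findMin(root).data); // 1
    }
}
```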
22 Find a node (three meanings)
- Search tree
  - given a node id, find id in the tree.
- Search tree
  - find a node with a specific property, e.g.
    - kth largest element (order statistic)
    - separate and conquer answers in log(n) time
- Arbitrary tree
  - find a node with a specific property
    - E.g. node is a position in a game tree: find a win
    - E.g. node is a particular tour: find the node (tour) with least cost
23 Separate and Conquer
- Finding the kth smallest (case analysis)
- Where can it be? Suppose the left subtree has i nodes (so the right subtree has N-i-1 nodes).
  - If the answer is at the root, the left subtree has k-1 nodes.
  - If (i < k-1), search for the (k-i-1)th smallest in the right subtree.
  - If (i >= k), search for the kth smallest in the left subtree.
- Complexity: depth of tree, O(log n) if balanced.
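The case analysis above can be sketched in code; this assumes each node carries a hypothetical `size` field (the number of nodes in its subtree), which is what makes the constant-time split possible.

```java
public class KthSmallest {
    static class Node {
        int data, size;        // size = number of nodes in this subtree
        Node left, right;
        Node(int d, Node l, Node r) {
            data = d; left = l; right = r;
            size = 1 + size(l) + size(r);
        }
    }
    static int size(Node n) { return n == null ? 0 : n.size; }

    // returns the kth smallest value (k = 1 is the minimum)
    static int kth(Node n, int k) {
        int i = size(n.left);               // the left subtree has i nodes
        if (k == i + 1) return n.data;      // root is the kth smallest
        if (k <= i) return kth(n.left, k);  // kth is in the left subtree
        return kth(n.right, k - i - 1);     // (k-i-1)th in the right subtree
    }

    public static void main(String[] args) {
        //        4
        //      /   \
        //     2     6
        //    / \   /
        //   1   3 5
        Node t = new Node(4,
                new Node(2, new Node(1, null, null), new Node(3, null, null)),
                new Node(6, new Node(5, null, null), null));
        System.out.println(kth(t, 1)); // 1
        System.out.println(kth(t, 4)); // 4
        System.out.println(kth(t, 6)); // 6
    }
}
```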
24 Analysis Definitions
- Problem: what is the average time to find or insert an element?
- Definitions follow from the problem
- Internal path length of a binary tree (IPL)
  - sum of depths of all nodes = ipl
  - average cost of successful search = average depth + 1 (cost = number of nodes you look at)
- External path length of a binary tree (EPL)
  - sum of costs of accessing all N+1 null references = epl
  - average cost of insertion or failed search = epl/(N+1)
25 Example of IPL and EPL
[figure: a 5-node tree with node depths 0, 1, 1, 2, 2 and its null references marked]
IPL = 1+1+2+2 = 6
EPL = 2+2+3+3+3+3 = 16 = IPL + 2*5 = IPL + 2N
What happens if you remove a leaf?
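The example above can be verified mechanically; a sketch that computes IPL and EPL for the same 5-node shape and checks EPL = IPL + 2N.

```java
public class PathLengths {
    static class Node { Node left, right; }

    static int ipl(Node n, int depth) {   // sum of depths of all nodes
        if (n == null) return 0;
        return depth + ipl(n.left, depth + 1) + ipl(n.right, depth + 1);
    }

    static int epl(Node n, int depth) {   // sum of depths of all null references
        if (n == null) return depth;
        return epl(n.left, depth + 1) + epl(n.right, depth + 1);
    }

    public static void main(String[] args) {
        // 5-node tree with node depths 0, 1, 1, 2, 2
        Node root = new Node();
        root.left = new Node(); root.right = new Node();
        root.left.left = new Node(); root.left.right = new Node();
        System.out.println(ipl(root, 0)); // 6
        System.out.println(epl(root, 0)); // 16 = 6 + 2*5 = IPL + 2N
    }
}
```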
26 Picture Proof of IPL related to IPL of subtrees
[figure: an N-node tree split into its root, an i-node left subtree, and an (N-i-1)-node right subtree]
IPL(tree) = IPL(left subtree) + IPL(right subtree) + N - 1, since each non-root node (N-1 of them) has its path length reduced by 1 when measured inside its own subtree.
27 Some Theorems
- Average internal path length of a binary search tree is about 1.38 N log N
- Proof that it is O(N log N)
  - Let D(N) = average ipl for a tree with N nodes
  - D(0) = D(1) = 0
  - D(N) = average over all splits of the tree (draw picture)
  - D(N) = (left split) 1/N (D(0) + ... + D(N-1)) + N - 1 + (right split) 1/N (D(0) + ... + D(N-1))
  - same as the quicksort analysis (to be done)
  - O(N log N)
- Why does EPL = IPL + 2N? (induction)
28 Analysis Goal: f(n) in terms of f(n-1), then expand
- D(n) = 2/n (D(0) + ... + D(n-1)) + n - 1
- nD(n) = 2(D(0) + ... + D(n-1)) + n^2 (approximately)
- Goal: compare with the previous term, subtract, and hope
- (n-1)D(n-1) = 2(D(0) + ... + D(n-2)) + (n-1)^2
- Subtracting: nD(n) - (n-1)D(n-1) = 2D(n-1) + 2n - 1
- nD(n) = (n+1)D(n-1) + 2n (approximately)
- D(n)/(n+1) = D(n-1)/n + 2/(n+1)   EUREKA! Expand.
- Hence D(n)/(n+1) = 2/(n+1) + 2/n + ... + 2/1 = 2(harmonic series), which is O(log n)
- Conclusion: D(n) is O(n log n)
29 1/1 + 1/2 + ... + 1/n is O(log n)
- General trick: a sum approximates an integral and vice versa
- The area under the function 1/x is given by log(x).
[figure: plot of 1/x with unit-width rectangles approximating the area from 1 to n]
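The claim above is easy to check numerically; a small sketch comparing the harmonic sum with ln(n).

```java
public class Harmonic {
    static double harmonic(int n) {     // 1/1 + 1/2 + ... + 1/n
        double s = 0;
        for (int i = 1; i <= n; i++) s += 1.0 / i;
        return s;
    }

    public static void main(String[] args) {
        for (int n : new int[]{10, 1000, 100000}) {
            double diff = harmonic(n) - Math.log(n);
            System.out.println(n + ": H(n) - ln(n) = " + diff);
        }
        // the difference converges to Euler's constant, about 0.5772,
        // so the sum is ln(n) + O(1), i.e. O(log n)
    }
}
```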
30 Balanced Trees
- Depth of tree controls the amount of work for many operations, so...
- Goal: keep depth small
  - what does that mean?
  - What can be achieved?
  - What needs to be achieved?
- AVL 1962 - very balanced
- B-trees 1972 (reduce disk accesses)
- Red-Black 1978
- AA 1993, a little faster now
- Splay trees: probabilistically balanced (on finds)
- All use rotations
31 AVL Tree
- Recall: height of the empty tree = -1
- In an AVL tree, for all nodes, the heights of the left and right subtrees differ by at most 1.
- AVL trees have logarithmic height
- Fibonacci numbers: F1 = 1, F2 = 1, F3 = 2, F4 = 3, ...
- Induction strikes. Thm: Sh = F(h+3) - 1
  - Let Si = size of the smallest AVL tree of height i
  - S0 = 1, S1 = 2 (why?)
  - So S1 = F4 - 1
  - Sh = S(h-1) + S(h-2) + 1 = (F(h+2) - 1) + (F(h+1) - 1) + 1 = F(h+3) - 1
- Hence the number of nodes grows exponentially with height.
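The theorem above can be checked numerically; a sketch computing S(h) from its recurrence and comparing with F(h+3) - 1.

```java
public class AvlSize {
    static long fib(int i) {           // F(1) = F(2) = 1
        long a = 1, b = 1;
        for (int k = 3; k <= i; k++) { long t = a + b; a = b; b = t; }
        return b;
    }

    static long s(int h) {             // size of smallest AVL tree of height h
        if (h == 0) return 1;
        if (h == 1) return 2;
        return s(h - 1) + s(h - 2) + 1;
    }

    public static void main(String[] args) {
        for (int h = 0; h <= 10; h++) {
            if (s(h) != fib(h + 3) - 1) throw new AssertionError("h=" + h);
            System.out.println("S(" + h + ") = " + s(h) + " = F(" + (h + 3) + ") - 1");
        }
    }
}
```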
32 On Insertion, what can go wrong?
- Tree balanced before insertion
[figure: a balanced tree with subtree heights H-1 and H that an insertion pushes out of balance]
33 Insertion
- After insertion, there are 4 ways the tree can be unbalanced. Check it out.
- Outside imbalance is handled by single rotations
- Inside imbalance is handled by double rotations.
[figure: an unbalanced tree with nodes p, q, r and subtrees a, b, c illustrating the cases]
34 Maintaining Balance
- Rebalancing: single and double rotations
- Left rotation after insertion
[figure: single left rotation on nodes 1, 2 with subtrees a, b, c]
35 Another View
[figure: left and right rotations on nodes 1, 2 with subtrees a, b, c]
Notice what happens to heights
36 Another View
[figure: the same left and right rotations on nodes 1, 2 with subtrees a, b, c]
Notice what happens to heights: (LEFT) in general a goes up 1, b stays the same, c goes down 1
37 Single (left) rotation
- Switches parent and child
- In the diagram: static Node leftRotate(Node n2)  // n2 is the node labeled 2
  - n1 = n2.left
  - n2.left = n1.right
  - n1.right = n2
  - return n1
- Appropriate test question
  - do it, i.e. given a sequence of inserts such as 6, 2, 7, 1, -1, etc., show the succession of trees after inserts and rotations.
- Similar for right rotation
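The rotation sketch above runs as written once the assignments are filled in; a minimal version, following the slide's naming (here leftRotate promotes the left child of the node it is given).

```java
public class Rotation {
    static class Node {
        String name;
        Node left, right;
        Node(String s, Node l, Node r) { name = s; left = l; right = r; }
    }

    static Node leftRotate(Node n2) {   // switches parent and child
        Node n1 = n2.left;
        n2.left = n1.right;
        n1.right = n2;
        return n1;                      // n1 is the new root of this subtree
    }

    public static void main(String[] args) {
        //     2            1
        //    / \    ->    / \
        //   1   c        a   2
        //  / \              / \
        // a   b            b   c
        Node t = new Node("2",
                new Node("1", new Node("a", null, null), new Node("b", null, null)),
                new Node("c", null, null));
        t = leftRotate(t);
        System.out.println(t.name);            // 1
        System.out.println(t.right.left.name); // b
    }
}
```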
38 Double Rotation (left)
- Out of balance: split
[figure: nodes 1, 2, 3 showing the inside case that a single rotation cannot fix]
39 In Steps
[figure: the double rotation performed in two steps on nodes 1, 2, 3 with subtrees a, b, c, d; node 2 ends up as the root with 1 and 3 as its children]
40 Double Rotation Code (left-right)
- Idea: rotate the left child with its right child
- Then rotate the node with its new left child
- static BinaryNode doubleLeft(BinaryNode n)
  - n.left = rotateRight(n.left)
  - return rotateLeft(n)
- Analogous code for the other middle case
- All rotations are O(1) operations
- Out-of-balance conditions are checked after insertions and after deletions. All O(1).
- For AVL, d is O(log N), so all operations are O(log N).
41 Red-Black Trees
- Every node is red or black
- Root is black
- If a node is red, its children are black
- Every path from a node to null has the same number of black nodes
- Implementation used in the Java collections library (JDK 1.2) for search trees.
- A single top-down pass means faster than AVL
- Depth typically the same as for AVL trees.
- Code has many cases - skipping
- Red-black trees are what you get via new TreeSet()
- And you can set/change the comparator
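The TreeSet point above in two lines: `java.util.TreeSet` gives you a red-black tree, and the constructor accepts a comparator.

```java
import java.util.Comparator;
import java.util.TreeSet;

public class TreeSetDemo {
    public static void main(String[] args) {
        TreeSet<Integer> asc = new TreeSet<>();                          // natural order
        TreeSet<Integer> desc = new TreeSet<>(Comparator.reverseOrder()); // custom order
        for (int x : new int[]{6, 2, 7, 1}) { asc.add(x); desc.add(x); }
        System.out.println(asc.first());   // 1
        System.out.println(desc.first());  // 7
    }
}
```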
42 AA Trees
- Simpler variant of red-black trees
  - simpler = more efficient
- Add two more properties
  - 5. Left children may not be red.
  - 6. Remove colors; use levels
    - Leaves are at level 1
    - If red, the level is the level of the parent
    - If black, the level is the level of the parent minus 1
- Code also has many special cases
43 B-tree of order M
- Goal: reduce the number of disk accesses
- Generalization of binary trees
- Method: keep the top of the tree in memory and have a large branching factor
- Disk access is about 1000 times slower than memory access
- An M-ary tree yields O(log_{M/2} N) accesses
- Data stored only at leaves
- Nonleaves store up to M-1 keys
- Root is a leaf or has 2 to M children
- All internal nodes have ceil(M/2) to M children
- All leaves are at the same depth and have ceil(L/2) to L data items
- Often set L = M
- Practical algorithm, but the code is longish (many cases)
44 B-Tree Picture: internal node
[figure: an internal node holding an array of keys interleaved with child pointers]
Goal: store as many keys as possible. Keys are in order. M-1 keys, M ptrs. Space = M*ptrSize + (M-1)*keySize
45 Representation
- Leaf nodes are arrays of size M (or linked lists)
- Internal nodes are
  - an array of size M-1 of keys
  - an array of size M of pointers to nodes
- The keys are in order
- Choice of M depends on machine architecture and problem.
- M is the largest value for which keySize*(M-1) + ptrSize*M fits in a disk block
46 Example Analysis (all on disk)
- Suppose a disk block holds 8,192 bytes.
- Suppose each key is 32 bytes, each branch is 4 bytes, and each data record is 256 bytes.
- L = 32 (= 8192/256 records per leaf block)
- If the B-tree has order M, then an interior node has M-1 keys and M branches.
- An interior node holds 32(M-1) + 4M = 36M - 32 bytes.
- The largest solution for M is 228.
47 Splay Trees
- Like splay lists, only probabilistically ordered
- Goal: minimize access time
- Method: no reordering on insert
- Reordering on finds only (as in splay lists)
- Rotating an inserted node up one level at a time moves the node to the root but makes the tree unbalanced
- Instead use double rotations: zig-zag and zig-zig
- This rebalances the tree
- Guarantees O(M log N) cost for M operations, i.e. amortized O(log N).
48 Summary
- Depth of tree determines overall costs
- Balancing is achieved by rotations
- AVL trees require 2 passes for insertions/deletions
  - a pass down to find the point
  - a pass up to do the corrections
- Red-Black and AA trees require 1 pass
- B-Trees are used for accessing information that won't fit in memory
- General themes: CASE ANALYSIS, separate and conquer