Title: Heap Sort
1 Heap Sort
Many sorting algorithms (bubble sort, insertion sort,
even quicksort) can take O(N^2) time in the worst case.
We examine an algorithm that guarantees O(N log N) sort
time in the worst case. A minimum binary heap is
used to sort an array of data into descending
order. A maximum binary heap is used to sort data
into ascending order.
2 Facts about binary heaps
- A minimum binary heap is a
  - complete binary tree
  - in which each node is less than or equal to its child nodes
- A binary heap can be conveniently represented using an array.
3 Example
Recall that given the index, j, of a node in the
array, it is a simple matter to determine the
index of the left and right child nodes and of
the parent node:
  left child:  2j + 1
  right child: 2j + 2
  parent:      (j - 1) / 2   (integer division)
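The index arithmetic above can be sketched as small C++ helpers. This is an illustration, not part of the original slides: the names leftChild, rightChild, parent and isMinHeap are assumptions, and the array is assumed to be zero-based.

```cpp
#include <cassert>
#include <vector>

// Index arithmetic for a binary heap stored in a zero-based array.
inline int leftChild( int j )  { return 2 * j + 1; }
inline int rightChild( int j ) { return 2 * j + 2; }

// The root (j == 0) has no parent, so j must be positive.
inline int parent( int j )
{
    assert( j > 0 );
    return ( j - 1 ) / 2;   // integer division
}

// Returns true if every node is less than or equal to its children,
// i.e. the array satisfies the minimum-heap property.
bool isMinHeap( const std::vector<int>& a )
{
    for( int j = 1; j < static_cast<int>( a.size() ); j++ )
        if( a[ parent( j ) ] > a[ j ] )
            return false;
    return true;
}
```

For instance, the node at index 3 has children at indices 7 and 8 and its parent at index 1.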
4 The heapsort algorithm consists of two phases
- build a heap from an arbitrary array
- use the heap to sort the data

Building a heap from an arbitrary array
It is easier to picture this process if we represent
the heap using a binary tree rather than an array.

Algorithm (bottom up)
  let index be the index of the last parent node in the tree
  while index is greater than or equal to zero
    perform a reheap-down operation starting with the node at index
    decrement index
  end while
5 Example
Convert the following array to a heap.
Picture the array as a complete binary tree.
For an array with N elements, the index of the
last parent node in the tree is (N-2)/2, using integer division.
7 Having built the heap, we now sort the array
Note: in this section we will represent the data
in both binary tree and array formats. It is
important to understand that in practice the data
is stored only as an array.

Algorithm
  let swapIndex = N - 1
  while swapIndex is greater than 0
    swap data at position swapIndex with data at position 0
    reheap down between positions 0 and swapIndex - 1
    decrement swapIndex
  end while
9 And so the process continues until the entire
array is sorted.
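For comparison, the two phases can be sketched with the C++ standard library heap functions. This is a substitute technique for illustration only, not the hand-written implementation from these slides: std::make_heap plays the role of the build step, and each std::pop_heap call performs one swap with position 0 followed by a reheap down over the shortened range. The function name heapSortDescending is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Sorts v into descending order using a minimum heap.
void heapSortDescending( std::vector<int>& v )
{
    // Phase 1: build a minimum heap. With std::greater as the
    // comparison, the smallest element ends up at the root.
    std::make_heap( v.begin(), v.end(), std::greater<int>() );

    // Phase 2: each pop_heap moves the current minimum to the end of
    // the range [begin, last) and restores the heap on the rest.
    for( auto last = v.end(); last - v.begin() > 1; --last )
        std::pop_heap( v.begin(), last, std::greater<int>() );

    // Postcondition check: the data is now in descending order.
    assert( std::is_sorted( v.begin(), v.end(), std::greater<int>() ) );
}
```

Because the smallest remaining element is moved to the back on every pass, a minimum heap yields descending order, which matches the statement on the first slide.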
10 Implementation
// Note: buildHeap and reheapDown must be declared before sort.
template< typename type >
void sort( type* data, int size )
// Pre:  the capacity of the array pointed to by data
//       is at least size
// Post: the first size elements of data have been
//       sorted in descending order
{
    int swpIndx;
    buildHeap( data, size );
    for( swpIndx = size - 1; swpIndx > 0; swpIndx-- )
    {
        std::swap( data[ 0 ], data[ swpIndx ] );   // std::swap is in <utility>
        reheapDown( data, 0, swpIndx );
    }
}
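11
// Note: reheapDown must be declared before buildHeap.
template< typename type >
void buildHeap( type* data, int size )
// Pre:  data points to an array of data of capacity at
//       least size elements
// Post: the first size elements of data are a heap
{
    int index;
    // start at the last parent node and work back to the root
    for( index = ( size - 2 ) / 2; index >= 0; index-- )
        reheapDown( data, index, size );
}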
12
template< typename type >
void reheapDown( type* data, int top, int size )
// Pre:  data between index top + 1 and index size - 1
//       is a heap
// Post: data between index top and index size - 1
//       is a heap
{
    int leftChild = 2 * top + 1;
    int rightChild = 2 * top + 2;
    int minChild;
    if( leftChild < size )
    {
        // find index of smallest child
        if( rightChild >= size || data[ leftChild ] < data[ rightChild ] )
            minChild = leftChild;
        else
            minChild = rightChild;
        // if data at top is greater than smallest
        // child then swap and continue
        if( data[ top ] > data[ minChild ] )
        {
            std::swap( data[ top ], data[ minChild ] );   // std::swap is in <utility>
            reheapDown( data, minChild, size );
        }
    }
}

13 Note: this function is tail-recursive and so can easily
be replaced with an iterative version having O(1) space
requirements. This would make an excellent exam
question!
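One possible iterative version is sketched below. It is an assumption about what such a rewrite might look like, not the author's own solution, and the name reheapDownIterative is hypothetical. The recursive call is replaced by updating top and looping, so no extra stack space is used.

```cpp
#include <cassert>
#include <utility>

// Iterative reheap-down for a minimum heap stored in an array.
// Pre:  data between index top + 1 and index size - 1 is a heap
// Post: data between index top and index size - 1 is a heap
template< typename type >
void reheapDownIterative( type* data, int top, int size )
{
    assert( top >= 0 );
    int leftChild = 2 * top + 1;
    while( leftChild < size )
    {
        int rightChild = leftChild + 1;
        int minChild;
        // find index of smallest child
        if( rightChild >= size || data[ leftChild ] < data[ rightChild ] )
            minChild = leftChild;
        else
            minChild = rightChild;
        // stop as soon as the node is no greater than its smallest child
        if( !( data[ top ] > data[ minChild ] ) )
            break;
        std::swap( data[ top ], data[ minChild ] );
        // continue from the child that received the larger value
        top = minChild;
        leftChild = 2 * top + 1;
    }
}
```

Applied at index 0 to the array 9, 1, 2, 3, 4, the value 9 sinks two levels and the array becomes 1, 3, 2, 9, 4.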
14 Time complexity of Heapsort
We determine the time complexity of the build heap step
and the time complexity of the subsequent sorting step.

The time complexity of the sorting operation is easy
to determine. For each element in the heap, we
perform a single swap and a reheapDown. If there
are N elements in the heap, the reheapDown
operation is O( log N ) and hence the sorting
operation is O( N log N ).

Building the heap is an O(N) operation! The reheapDown
function is called O(N) times but we have to realize that
the reheapDown operation does not always start at the
top of the heap and that it is not called at all
on any of the leaf nodes.
15 To determine the time complexity of the buildHeap
function we look at the total number of times the
comparison and swap operations occur while
building the heap. The worst case is when the
last level in the heap is full.

The number of edges on the path from each node to a
leaf node represents the maximum number of comparison
and swap operations that will occur while applying
the reheapDown operation to that node. By summing
the total length of these paths, we will
determine the time complexity of the buildHeap
function. For a particular node, the path
starting at the node has length equal to the
height of the node.
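As a quick numerical check of this argument, the total path length can be computed directly for a heap of n nodes. The sketch below is illustrative only: the names nodeHeight and totalPathLength are assumptions, and height is measured in edges, so a leaf has height 0.

```cpp
#include <cassert>

// Height, in edges, of node j in a complete binary tree of n nodes
// stored in a zero-based array. In a complete tree the longest
// downward path from any node follows its left children.
int nodeHeight( int j, int n )
{
    assert( j >= 0 && j < n );
    int h = 0;
    while( 2 * j + 1 < n )
    {
        j = 2 * j + 1;
        h++;
    }
    return h;
}

// Sum of the heights of all nodes: an upper bound on the number of
// levels that values can sink, in total, while building the heap.
long totalPathLength( int n )
{
    long total = 0;
    for( int j = 0; j < n; j++ )
        total += nodeHeight( j, n );
    return total;
}
```

For full trees of 7 and 15 nodes the totals are 4 and 11, in both cases below the number of nodes, which is consistent with a linear bound.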
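16 There are about n/2 nodes of height 1, n/4 of height 2,
n/8 of height 3, and (working up the tree) 1 node of
height log2(n+1). These counts treat a leaf as having
height 1, so they overestimate each path by one edge and
give an upper bound. Summing gives roughly
n(1/2 + 2/4 + 3/8 + ...) <= 2n, which is O(n).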
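The sum can be written out as follows. This is a sketch under two stated assumptions: the last level is full, so a tree of n nodes has exactly (n+1)/2^h nodes at height h when a leaf is counted as height 1, and the standard series \sum_{h \ge 1} h/2^h = 2 is used. T(n) denotes the total path length.

```latex
\[
T(n) \;\le\; \sum_{h=1}^{\log_2(n+1)} h \cdot \frac{n+1}{2^{h}}
\;\le\; (n+1) \sum_{h=1}^{\infty} \frac{h}{2^{h}}
\;=\; 2(n+1) \;=\; O(n).
\]
```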
17 Hence in the worst case the overall time
complexity of the heapsort algorithm is
max{ O(N), O( N log N ) } = O( N log N ).