Title: Introduction to Data Structure
Chapter 7 Sorting
- 7.1 Searching and List Verification
- 7.2 Definitions
- 7.3 Insertion Sort
- 7.4 Quick Sort
- 7.5 Optimal Sorting Time
- 7.6 Merge Sort
- 7.7 Heap Sort
- 7.8 Radix Sort
Contents
- Chapter 1 Basic Concepts
- Chapter 2 Arrays
- Chapter 3 Stacks and Queues
- Chapter 4 Linked Lists
- Chapter 5 Trees
- Chapter 6 Graph
- Chapter 7 Sorting
- Chapter 8 Hashing
- Chapter 9 Heap Structures
- Chapter 10 Search Structures
7.1 Searching and List Verification
- Motivation of Sorting
- The term list here refers to a collection of records.
- Each record has one or more fields.
- Each record has a key that distinguishes it from the other records.
- For example, a phone directory is a list. The name, phone number, or even the address can serve as the key, depending on the application or need.
- Example record layout: Key (Student_ID), Field (Dept.)
Sequential Searching
- Two ways to store a collection of records
- Sequential
- Non-sequential
- Assume a sequential list f. To retrieve the record with key $f_i$.key from such a list, examine the keys in the following order
- $f_n$.key, $f_{n-1}$.key, ..., $f_1$.key
- This is a sequential search.
Sequential Searching (cont.)
Program 7.1, p. 321
    int SeqSearch(int list[], int searchnum, int n)
    {
        int i;
        list[n] = searchnum;    /* sentinel: guarantees that the loop stops */
        for (i = 0; list[i] != searchnum; i++)
            ;
        return (i < n) ? i : -1;
    }
- The average number of comparisons for a successful search is $\sum_{i=0}^{n-1} (i+1)/n = (n+1)/2$
Binary Search
- Basic concept
- Compare searchnum with list[middle].key
- searchnum < list[middle].key
- search list[0], ..., list[middle-1]
- searchnum = list[middle].key
- return TRUE
- searchnum > list[middle].key
- search list[middle+1], ..., list[n-1]
- Program 7.2, p. 322
- Complexity
- A binary search takes only O(log n) time to search a sorted sequential list with n records.
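Program 7.2 is not reproduced on the slide. The three-way comparison above can be sketched as follows, assuming a sorted array of plain int keys instead of the element records used elsewhere in these slides. The name binsearch and the -1 "not found" value are illustrative choices, not taken from the textbook.

```c
/* Sketch: search the sorted array list[0..n-1] for searchnum.
   Returns the index of a matching entry, or -1 if there is none.
   Assumption: keys are plain ints stored in nondecreasing order. */
int binsearch(const int list[], int searchnum, int n)
{
    int left = 0, right = n - 1;
    while (left <= right) {
        int middle = left + (right - left) / 2;   /* avoids overflow of left + right */
        if (searchnum < list[middle])
            right = middle - 1;                   /* continue in list[left..middle-1] */
        else if (searchnum > list[middle])
            left = middle + 1;                    /* continue in list[middle+1..right] */
        else
            return middle;                        /* found */
    }
    return -1;                                    /* range became empty */
}
```

Each iteration halves the remaining range, which is where the O(log n) bound comes from.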
List Verification
- Definition
- Given two lists, verify whether both are the same.
- list1 = list2?
- Example: tax verification
- The IRS gets salary reports from employers.
- The IRS also gets tax filing reports from employees about their salary.
- The two numbers must be verified to match for each individual employee.
- Two methods
- Method 1: Random verification
- Method 2: Ordered verification
Random Verification: Unsorted Lists
    void verify1(element list1[], element list2[], int n, int m)
    {   /* Compare two unordered lists list1 and list2 */
        int i, j;
        int marked[MAX_SIZE];
        for (i = 0; i < m; i++)
            marked[i] = FALSE;
        for (i = 0; i < n; i++)
            if ((j = seqsearch(list2, m, list1[i].key)) < 0)
                printf("%d is not in list 2\n", list1[i].key);
            else
                /* check each of the other fields from list1[i] and list2[j],
                   and print out any discrepancies */
                marked[j] = TRUE;
        for (i = 0; i < m; i++)
            if (!marked[i])
                printf("%d is not in list 1\n", list2[i].key);
    }
Complexity: O(mn). Why? Each of the n records of list1 triggers a sequential search over up to m records of list2.
Ordered Verification: Sorted Lists
    void verify2(element list1[], element list2[], int n, int m)
    {
        int i, j;
        sort(list1, n);
        sort(list2, m);
        i = j = 0;
        while (i < n && j < m)
            if (list1[i].key < list2[j].key) {
                printf("%d is not in list 2\n", list1[i].key);
                i++;
            }
            else if (list1[i].key == list2[j].key) {
                /* compare list1[i] and list2[j] on each of the other fields
                   and report any discrepancies */
                i++; j++;
            }
            else {
                printf("%d is not in list 1\n", list2[j].key);
                j++;
            }
        for (; i < n; i++)    /* leftover records of list1 */
            printf("%d is not in list 2\n", list1[i].key);
        for (; j < m; j++)    /* leftover records of list2 */
            printf("%d is not in list 1\n", list2[j].key);
    }
Complexity: O(max[n log n, m log m])
7.2 Definition
- Formal definition
- Given a list of records $(R_0, R_1, \ldots, R_{n-1})$, each with a key $K_i$.
- The sorting problem is to find a permutation $\sigma$ such that
- $K_{\sigma(i)} \le K_{\sigma(i+1)}$, $1 \le i \le n-1$.
- The desired ordering is $(R_{\sigma(1)}, R_{\sigma(2)}, \ldots, R_{\sigma(n)})$.
- If a list has several identical key values, the permutation $\sigma$ is not unique.
- Let $\sigma_s$ be the permutation with the following properties
- (1) sorted: $K_{\sigma_s(i)} \le K_{\sigma_s(i+1)}$, $1 \le i \le n-1$
- (2) stable: if $i < j$ and $K_i = K_j$ in the input list, then $R_i$ precedes $R_j$ in the sorted list.
- A sorting method that generates $\sigma_s$ is stable.
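The two properties can be checked mechanically. The sketch below is only an illustration of the definition: it assumes int keys and a 0-based array perm, where perm[p] is the input index of the record placed at output position p (the slides count positions from 1). The function names are hypothetical.

```c
/* Property (1): keys appear in nondecreasing order under the permutation. */
int is_sorted_perm(const int keys[], const int perm[], int n)
{
    for (int p = 0; p + 1 < n; p++)
        if (keys[perm[p]] > keys[perm[p + 1]])
            return 0;
    return 1;
}

/* Property (2), for a permutation that already satisfies (1):
   records with equal keys keep their input order. Equal keys are
   adjacent in a sorted output, so comparing neighbours is enough. */
int is_stable_perm(const int keys[], const int perm[], int n)
{
    for (int p = 0; p + 1 < n; p++)
        if (keys[perm[p]] == keys[perm[p + 1]] && perm[p] > perm[p + 1])
            return 0;
    return 1;
}
```

For the keys 3, 1, 3, 2, the permutation (1, 3, 0, 2) is sorted and stable, while (1, 3, 2, 0) is sorted but not stable, because the two records with key 3 swap their relative order.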
Stable Sorting
- Example
- Stable permutation: 0 1 2 3 4 5 → 1 0 2 5 3 4
- Unstable permutation: 0 1 2 3 4 5 → 1 0 3 5 2 4
Category of Sorting Methods
- Internal methods
- Methods to be used when the list to be sorted is small enough that the entire sort can be carried out in main memory.
- Examples
- Insertion sort
- Quick sort
- Merge sort
- Heap sort
- Radix sort
- External methods
- Methods to be used on larger lists
7.3 Insertion Sort
- Insert 4 into the sorted list 1 3 5 7 9
- Result: 1 3 4 5 7 9
Insertion Sort Program
- Program 7.5, p. 327
    void insertion_sort(element list[], int n)
    {
        int i, j;
        element next;
        for (i = 1; i < n; i++) {
            next = list[i];
            /* shift larger records one place to the right */
            for (j = i - 1; j >= 0 && next.key < list[j].key; j--)
                list[j+1] = list[j];
            list[j+1] = next;
        }
    }
Insertion Sort Example
- Record $R_i$ is left out of order (LOO) iff $R_i < \max_{0 \le j < i} \{R_j\}$
- Example 1
- Assume n = 5 and the input key sequence is 5, 4, 3, 2, 1
    j    1  2  3  4  5
    -    5  4  3  2  1
    2    4  5  3  2  1
    3    3  4  5  2  1
    4    2  3  4  5  1
    5    1  2  3  4  5
Insertion Sort Example
- Example 2
- Assume n = 5 and the input key sequence is 2, 3, 4, 5, 1
    j    1  2  3  4  5    Time
    -    2  3  4  5  1
    2    2  3  4  5  1    O(1)
    3    2  3  4  5  1    O(1)
    4    2  3  4  5  1    O(1)
    5    1  2  3  4  5    O(n)
Insertion Sort
- Analysis
- If there are k LOO records in a list, the computing time for sorting the list via insertion sort is
- O((k+1)n) = O(kn)
- Therefore, if k << n, then insertion sort might be a good sorting choice.
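To make k concrete, the sketch below counts the LOO records of a key sequence. It assumes the usual textbook reading of LOO (a record is left out of order when its key is smaller than the largest key before it) and works on a bare int array. The name count_loo is hypothetical.

```c
/* Sketch: count the records that are left out of order (LOO),
   i.e. keys[i] < max(keys[0..i-1]). Assumes plain int keys. */
int count_loo(const int keys[], int n)
{
    if (n <= 0)
        return 0;
    int count = 0;
    int max_so_far = keys[0];
    for (int i = 1; i < n; i++) {
        if (keys[i] < max_so_far)
            count++;                  /* keys[i] must move left during the sort */
        else
            max_so_far = keys[i];     /* new running maximum */
    }
    return count;
}
```

For the sequence 5, 4, 3, 2, 1 every record after the first is LOO (k = 4), whereas for 2, 3, 4, 5, 1 only the last one is (k = 1), which is why the second input sorts in O(n) time.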
7.4 Quick Sort
- Quick sort was developed by C. A. R. Hoare.
- Quick sort has the best average behavior among the sorting methods covered here.
- Basic concept
- Based on the divide-and-conquer paradigm
- Choose an element (pivot) p, e.g., p = $K_0$
- Place p into its proper position j such that
- $K_0, \ldots, K_{j-1} \le p$
- $K_{j+1}, \ldots, K_{n-1} \ge p$
Quick Sort (cont.)
- Example
- 25 57 48 37 12 92 86 33
- (12) 25 (57 48 37 92 86 33)
- 12 25 (48 37 33) 57 (92 86)
- 12 25 (37 33) 48 57 (92 86)
- 12 25 33 37 48 57 (92 86)
- 12 25 33 37 48 57 86 92
Quick Sort (cont.)
- Key mechanism: a partition method
- Algorithm
    quicksort(list, left, right)
    {
        if (left < right) {
            partition(list, left, right, j);
            quicksort(list, left, j-1);
            quicksort(list, j+1, right);
        }
    }
Quick Sort Partition
- Concept of the partition method
- Let pivot = list[left] be the pivot
- Use two pointers, i and j
- Move i to the right until list[i] ≥ pivot
- Move j to the left until list[j] ≤ pivot
- Swap list[i] and list[j] if i < j
Quick Sort Partition (cont.)
- Example of partition method
- 25 57 48 37 12 92 86 33
- 25 12 48 37 57 92 86 33
- (12) 25 (48 37 57 92 86 33)
Quick Sort Program Code
    void quicksort(element list[], int left, int right)
    {
        int pivot, i, j;
        element temp;
        if (left < right) {
            i = left; j = right + 1;
            pivot = list[left].key;
            do {    /* partition */
                /* the i scan relies on list[right+1].key >= pivot to stop */
                do i++; while (list[i].key < pivot);
                do j--; while (list[j].key > pivot);
                if (i < j) SWAP(list[i], list[j], temp);
            } while (i < j);
            SWAP(list[left], list[j], temp);
            quicksort(list, left, j-1);
            quicksort(list, j+1, right);
        }
    }
Quick Sort Example
- Example
- Input list: 10 records with keys (26, 5, 37, 1, 61, 11, 59, 15, 48, 19).
    K0  K1  K2  K3  K4  K5  K6  K7  K8  K9   Left  Right
    26   5  37   1  61  11  59  15  48  19      1     10
    11   5  19   1  15  26  59  61  48  37      1      5
     1   5  11  19  15  26  59  61  48  37      1      2
     1   5  11  19  15  26  59  61  48  37      4      5
     1   5  11  15  19  26  59  61  48  37      7     10
     1   5  11  15  19  26  48  37  59  61      7      8
     1   5  11  15  19  26  37  48  59  61     10     10
     1   5  11  15  19  26  37  48  59  61
Quick Sort Analysis
- Analysis of quick sort
- Worst case: $O(n^2)$
- Average case
- Assume that each time a record is correctly positioned, |left sublist| = |right sublist|
- Let $T(n)$ be the time taken to sort a list of size $n$
- $T(n) \le cn + 2T(n/2)$, for some constant $c$
- $\le cn + 2(cn/2 + 2T(n/4))$
- $\le 2cn + 4T(n/4)$
- $\vdots$
- $\le cn \log_2 n + nT(1) = O(n \log n)$
Quick Sort Variant
- Quick sort using a median of three
- Pick the median of the first, middle, and last keys in the current sublist as the pivot.
- Thus,
- pivot = median{$K_{left}$, $K_{(left+right)/2}$, $K_{right}$}.
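One way to pick this pivot is sketched below. It assumes plain int keys and returns the index of the median rather than its value, so that the caller can swap that record into position left before partitioning. The name median_of_three is hypothetical.

```c
/* Sketch: return the index (left, middle, or right) that holds the
   median of keys[left], keys[(left+right)/2], and keys[right]. */
int median_of_three(const int keys[], int left, int right)
{
    int mid = left + (right - left) / 2;
    int a = keys[left], b = keys[mid], c = keys[right];
    if ((a <= b && b <= c) || (c <= b && b <= a))
        return mid;       /* b lies between a and c */
    if ((b <= a && a <= c) || (c <= a && a <= b))
        return left;      /* a lies between b and c */
    return right;         /* otherwise c is the median */
}
```

For the keys 26, 5, 37, 1, 61, 11, 59, 15, 48, 19 with left = 0 and right = 9, the candidates are 26, 61, and 19, so the function returns index 0 (key 26).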
7.5 Optimal Sorting Time
- Question
- How quickly can we sort a list of n objects?
- Answer
- If the only operations permitted on keys are comparisons and interchanges, then O(n log n) is the best possible time.
- Method
- This is shown by using a tree, called a decision tree, that describes the sorting process.
- Each vertex of the tree represents a key comparison, and the branches indicate the result.
Decision Tree for Insertion Sort
[Figure: decision tree for insertion sort on three keys K1, K2, K3. Each internal node compares two keys, has a Yes and a No branch, and is labeled with the current permutation, starting from (0, 1, 2). The six leaves, labeled I to VI, are the possible final permutations.]
Decision Tree (cont.)
- Theorem 7.1: Any decision tree that sorts n distinct elements has a height of at least $\log_2(n!) + 1$
- Corollary: Any algorithm that sorts only by comparisons must have a worst-case computing time of $\Omega(n \log n)$
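The step from the theorem to the corollary can be written out in a few lines. This is the standard argument, given here as a sketch; it assumes $n \ge 2$ and counts the root as level 1, in line with the $+1$ in the theorem.

```latex
% A decision tree that sorts n distinct keys needs at least n! leaves,
% and a binary tree of height h has at most 2^{h-1} leaves.
\begin{align*}
2^{h-1} &\ge n! \quad\Longrightarrow\quad h \ge \log_2(n!) + 1, \\
n! &\ge \left(\tfrac{n}{2}\right)^{n/2}
   \quad\text{(at least } n/2 \text{ of the factors are each at least } n/2\text{)}, \\
\log_2(n!) &\ge \tfrac{n}{2}\,\log_2\tfrac{n}{2} = \Omega(n \log n).
\end{align*}
```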
7.6 Merge Sort
- Kernel operation of merge sort: merging
- Given two sorted lists, merge them into a single sorted list
- Example
- 25 37 48 57
- 12 33 86 92
- => 12 25 33 37 48 57 86 92
- Methods for merging
- Simple merge
- O(1) space merge
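Simple Merge
    void merge(element list[], element sorted[], int i, int m, int n)
    {   /* merge list[i],...,list[m] and list[m+1],...,list[n] */
        int j, k, t;
        j = m + 1;
        k = i;
        while (i <= m && j <= n) {
            if (list[i].key <= list[j].key)
                sorted[k++] = list[i++];
            else
                sorted[k++] = list[j++];
        }
        if (i > m)    /* first list exhausted: copy the rest of the second */
            for (t = j; t <= n; t++)
                sorted[k+t-j] = list[t];
        else          /* second list exhausted: copy the rest of the first */
            for (t = i; t <= m; t++)
                sorted[k+t-i] = list[t];
    }
Time and space complexity: O(n - i + 1)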
O(1) Space Merge
- A merge algorithm that requires only O(1) additional space
- Assumptions
- The total number of records n is a perfect square
- The numbers of records in the left sublist and the right sublist are multiples of $\sqrt{n}$
O(1) Space Merge
- Algorithm
- Step 1: Identify the $\sqrt{n}$ records with the largest keys. This is done by following right to left along the two lists to be merged.
- Step 2: Exchange the records of the second list identified in Step 1 with those just to the left of those identified from the first list.
- Step 3: Swap the block of $\sqrt{n}$ largest records with the leftmost block (unless it is already the leftmost block). Sort the rightmost block.
- Step 4: Reorder the blocks, excluding the block of largest records, into nondecreasing order of the last key in the blocks.
- Step 5: Perform as many merge substeps as needed to merge the $\sqrt{n} - 1$ blocks other than the block with the largest keys.
- Step 6: Sort the block with the largest keys.
O(1) Space Merge Example
    0 2 4 6 8 a c e g i j k l m n t w z | 1 3 5 7 9 b d f h o p q r s u v x y
    0 2 4 6 8 a c e g i j k l m n t w z 1 3 5 7 9 b d f h o p q r s u v x y
    0 2 4 6 8 a | c e g i j k | u v x y w z | 1 3 5 7 9 b | d f h o p q | r s l m n t
    u v x y w z | c e g i j k | 0 2 4 6 8 a | 1 3 5 7 9 b | d f h o p q | l m n r s t
    u v x y w z 0 2 4 6 8 a | 1 3 5 7 9 b | c e g i j k | d f h o p q | l m n r s t
    0 v x y w z u 2 4 6 8 a | 1 3 5 7 9 b | c e g i j k | d f h o p q | l m n r s t
    0 1 x y w z u 2 4 6 8 a | v 3 5 7 9 b | c e g i j k | d f h o p q | l m n r s t
    0 1 2 y w z u x 4 6 8 a | v 3 5 7 9 b | c e g i j k | d f h o p q | l m n r s t
O(1) Space Merge Example (cont.)
    0 1 2 3 4 5 u x w 6 8 a | v y z 7 9 b | c e g i j k | d f h o p q | l m n r s t
    0 1 2 3 4 5 6 7 8 u w a | v y z x 9 b | c e g i j k | d f h o p q | l m n r s t
    0 1 2 3 4 5 6 7 8 9 a w | v y z x u b | c e g i j k | d f h o p q | l m n r s t
    0 1 2 3 4 5 6 7 8 9 a w v y z x u b c e g i j k | d f h o p q | l m n r s t
    0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k v z u | y x w o p q | l m n r s t
    0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k v z u y x w o p q | l m n r s t
    0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q y x w | v z u r s t
    0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q r s t | v z u y x w
O(1) Space Merge Analysis
- Steps 1 and 2
- $O(\sqrt{n})$ time and O(1) space
- Step 3
- Swapping: $O(\sqrt{n})$ time and O(1) space
- Sorting: O(n) time and O(1) space (via insertion sort)
- Step 4
- O(n) time and O(1) space (via selection sort)
- Selection sort sorts m records using $O(m^2)$ key comparisons and O(m) record moves.
- $O(n^{1.5})$ time and O(1) space (via insertion sort)
- Insertion sort needs $O(m^2)$ record moves ($\sqrt{n}$ records per block $\times$ $n$ record moves).
O(1) Space Merge Analysis (cont.)
- Step 5
- Merge substeps: the total number of merge substeps is at most $\sqrt{n} - 1$. The total time for Step 5 is O(n).
- Step 6
- The sort of the $\sqrt{n}$ largest records can be done in O(n) by using either a selection sort or an insertion sort.
- In total
- O(n) time and O(1) space
Iterative Merge Sort
- Concept
- Treat the input as n sorted lists, each of length 1.
- Lists are merged by pairs to obtain n/2 lists, each of size 2 (if n is odd, then one list is of length 1).
- The n/2 lists are then merged by pairs, and so on until we are left with only one list.
Iterative Merge Sort Example
- Input:  26 | 5 | 77 | 1 | 61 | 11 | 59 | 15 | 48 | 19
- Pass 1: 5 26 | 1 77 | 11 61 | 15 59 | 19 48
- Pass 2: 1 5 26 77 | 11 15 59 61 | 19 48
- Pass 3: 1 5 11 15 26 59 61 77 | 19 48
- Pass 4: 1 5 11 15 19 26 48 59 61 77
Iterative Merge Sort Analysis
- Program code
- Program 7.9 and 7.10
- Time complexity
- A total of $\lceil \log_2 n \rceil$ passes are made over the data
- Each pass of merge sort takes O(n) time
- The total computing time is O(n log n)
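Programs 7.9 and 7.10 are not shown on the slides. A bottom-up merge sort along the same lines might look like the sketch below. It is not the textbook code: it assumes plain int keys, uses one heap-allocated buffer of n ints, and the names merge_runs and merge_sort_iter are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Merge the sorted runs src[lo..mid-1] and src[mid..hi-1] into dst[lo..hi-1]. */
static void merge_runs(const int src[], int dst[], int lo, int mid, int hi)
{
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        dst[k++] = (src[i] <= src[j]) ? src[i++] : src[j++];
    while (i < mid) dst[k++] = src[i++];
    while (j < hi)  dst[k++] = src[j++];
}

/* Sketch of an iterative (bottom-up) merge sort on a[0..n-1].
   Returns 0 on success, -1 if the work buffer cannot be allocated. */
int merge_sort_iter(int a[], int n)
{
    if (n < 2) return 0;
    int *buf = malloc((size_t)n * sizeof *buf);
    if (buf == NULL) return -1;
    int *src = a, *dst = buf;
    for (int width = 1; width < n; width *= 2) {        /* one pass per run length */
        for (int lo = 0; lo < n; lo += 2 * width) {
            int mid = (lo + width < n) ? lo + width : n;
            int hi  = (lo + 2 * width < n) ? lo + 2 * width : n;
            merge_runs(src, dst, lo, mid, hi);
        }
        int *tmp = src; src = dst; dst = tmp;           /* output becomes next input */
    }
    if (src != a)                                       /* result ended up in buf */
        memcpy(a, src, (size_t)n * sizeof *a);
    free(buf);
    return 0;
}
```

The outer loop doubles the run length on every pass, so it runs about log2 n times, and each pass touches every record once.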
Recursive Merge Sort
- Concept
- Divide the list to be sorted into two roughly equal parts
- left sublist: left, ..., (left+right)/2
- right sublist: (left+right)/2 + 1, ..., right
- Sort each sublist recursively, and merge the sorted sublists
- To avoid copying, it is desirable to represent the sublists as linked lists (with integer links instead of real pointers).
- Program code
- Program 7.11 and 7.12, complexity O(n log n)
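The integer-link idea can be sketched as follows. This is not Program 7.11 or 7.12: it assumes plain int keys, 0-based indices, and -1 as the end-of-chain marker, and the names list_merge and rmerge_sort are hypothetical. Records never move; only the link array is rewritten.

```c
/* Merge two sorted chains that start at a and b (link[] holds the
   successor index, -1 ends a chain). Returns the head of the result. */
static int list_merge(const int keys[], int link[], int a, int b)
{
    int head = -1, tail = -1;
    while (a != -1 && b != -1) {
        int next;
        if (keys[a] <= keys[b]) { next = a; a = link[a]; }   /* <= keeps the sort stable */
        else                    { next = b; b = link[b]; }
        if (tail == -1) head = next; else link[tail] = next;
        tail = next;
    }
    int rest = (a != -1) ? a : b;                            /* append the leftover chain */
    if (tail == -1) head = rest; else link[tail] = rest;
    return head;
}

/* Sort keys[left..right] (left <= right) by building a chain in link[].
   Returns the index of the record with the smallest key. */
int rmerge_sort(const int keys[], int link[], int left, int right)
{
    if (left >= right) {
        link[left] = -1;          /* a single record is a sorted chain */
        return left;
    }
    int mid = (left + right) / 2;
    int first  = rmerge_sort(keys, link, left, mid);
    int second = rmerge_sort(keys, link, mid + 1, right);
    return list_merge(keys, link, first, second);
}
```

After the call, following link[] from the returned index visits the records in nondecreasing key order.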
Recursive Merge Sort Example
- Input:   26 | 5 | 77 | 1 | 61 | 11 | 59 | 15 | 48 | 19
- Level 1: 5 26 | 77 | 1 61 | 11 59 | 15 | 19 48
- Level 2: 5 26 77 | 1 61 | 11 15 59 | 19 48
- Level 3: 1 5 26 61 77 | 11 15 19 48 59
- Level 4: 1 5 11 15 19 26 48 59 61 77
Natural Merge Sort
- Concept
- It takes advantage of the prevailing order within the list before performing merge sort
- It runs an initial pass over the data to determine the sublists of records that are in order
- Then it uses the sublists for the merge sort
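The initial pass can be sketched as a scan that records where each ordered run begins. The code assumes plain int keys and treats a run as a maximal nondecreasing stretch. The name find_runs and the starts output array are hypothetical.

```c
/* Sketch: store the start index of every maximal nondecreasing run of
   keys[0..n-1] in starts[] (which needs room for n entries) and return
   the number of runs found. */
int find_runs(const int keys[], int n, int starts[])
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (i == 0 || keys[i] < keys[i - 1])
            starts[count++] = i;      /* order breaks here: a new run begins */
    return count;
}
```

For the keys 26, 5, 77, 1, 61, 11, 59, 15, 48, 19 the scan finds six runs starting at indices 0, 1, 3, 5, 7, and 9. The merge passes then start from these runs instead of from runs of length 1.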
Natural Merge Sort Example
- Input runs: 26 | 5 77 | 1 61 | 11 59 | 15 48 | 19
- Pass 1: 5 26 77 | 1 11 59 61 | 15 19 48
- Pass 2: 1 5 11 26 59 61 77 | 15 19 48
- Pass 3: 1 5 11 15 19 26 48 59 61 77
7.7 Heap Sort
- Preliminary
- Merge sort needs O(n) additional storage space, even though its computing time is O(n log n)
- Merge sort using the O(1) space merge needs only O(1) additional space, but the sorting algorithm is much slower
- Heap sort
- requires only a fixed amount of additional storage
- achieves worst-case and average computing time O(n log n)
Heap Sort (cont.)
- Concept
- Adopt the max-heap structure
- Consists of two phases
- Phase 1: create the heap
- Insert the n records into an empty heap
- Phase 2: adjust the heap
- Exchange the max element with the current last element and perform adjustment
Heap Sort Program Code
    void heapsort(element list[], int n)
    {   /* sorts list[1], ..., list[n] */
        int i, j;
        element temp;
        for (i = n/2; i > 0; i--)      /* Phase 1 */
            adjust(list, i, n);
        for (i = n-1; i > 0; i--) {    /* Phase 2 */
            SWAP(list[1], list[i+1], temp);
            adjust(list, 1, i);
        }
    }
Complexity: O(n log n)
Heap Sort Program Code (cont.)
    void adjust(element list[], int root, int n)
    {
        int child, rootkey;
        element temp;
        temp = list[root];
        rootkey = list[root].key;
        child = 2 * root;                      /* left child */
        while (child <= n) {
            if ((child < n) && (list[child].key < list[child+1].key))
                child++;                       /* pick the larger child */
            if (rootkey > list[child].key)
                break;
            else {
                list[child/2] = list[child];   /* move the child up to its parent */
                child *= 2;
            }
        }
        list[child/2] = temp;
    }
Heap Sort Example
- (a) Input array, positions 1 to 10: 26 5 77 1 61 11 59 15 48 19
- (b) Initial heap, positions 1 to 10: 77 61 59 48 19 11 26 15 1 5
Heap Sort Example (cont.)
- Heap size 9, sorted: 77. Heap, positions 1 to 9: 61 48 59 15 19 11 26 5 1
- Heap size 8, sorted: 61, 77. Heap, positions 1 to 8: 59 48 26 15 19 11 1 5
7.8 Radix Sort
- Sorting on multiple keys
- A list of records is said to be sorted with respect to the keys $K^0, K^1, \ldots, K^{r-1}$ iff
- for every pair of records i and j with i < j,
- $(K_i^0, K_i^1, \ldots, K_i^{r-1}) \le (K_j^0, K_j^1, \ldots, K_j^{r-1})$.
- $(x_0, x_1, \ldots, x_{r-1}) \le (y_0, y_1, \ldots, y_{r-1})$ iff
- either $x_i = y_i$, $0 \le i \le j$, and $x_{j+1} < y_{j+1}$ for some $j < r-1$
- or $x_i = y_i$, $0 \le i < r$
- Example: sorting a deck of cards by suit and face value.
- $K^0$ (suit): ♣ < ♦ < ♥ < ♠
- $K^1$ (face value): 2 < 3 < 4 < ... < 10 < J < Q < K < A
- A possible ordering
- 2♣, ..., A♣, 2♦, ..., A♦, 2♥, ..., A♥, 2♠, ..., A♠
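The tuple comparison defined above can be sketched as a small function. It assumes each record's r keys are stored as ints in an array, most significant key first. The name tuple_cmp and the encoding of the cards in the usage note are illustrative only.

```c
/* Sketch: compare the key tuples x[0..r-1] and y[0..r-1], most significant
   key first. Returns a negative value if x < y, 0 if the tuples are equal,
   and a positive value if x > y. */
int tuple_cmp(const int x[], const int y[], int r)
{
    for (int i = 0; i < r; i++) {
        if (x[i] < y[i]) return -1;   /* first differing key decides */
        if (x[i] > y[i]) return 1;
    }
    return 0;                         /* all r keys are equal */
}
```

With suits encoded as 0 to 3 (clubs to spades) and face values as 0 to 12 (2 to A), the ace of clubs {0, 12} compares less than the two of diamonds {1, 0}, because the suit is examined first.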
Radix Sort (cont.)
- Two popular ways to sort on multiple keys
- MSD: sort on the most significant key first, into multiple piles
- LSD: sort on the least significant digit first
- LSD and MSD only define the order in which the keys are to be sorted
- LSD and MSD can be used even when there is only one key
- E.g., if the keys are numeric, then each decimal digit may be regarded as a subkey
- => Radix sort
Radix Sort Analysis
- Let
- d = the number of digits
- r = radix size
- n = number of records
- Time complexity
- O(d(n + r))
- Usually r << n, so O(dn)
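The O(d(n + r)) bound can be read off a small LSD implementation. The sketch below is not the textbook program: it assumes non-negative int keys with at most d decimal digits (d at most 9 for 32-bit ints), and it replaces the usual bins or queues with a stable counting pass per digit. The name radix_sort_lsd is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define RADIX 10   /* r: one bin per decimal digit */

/* Sketch of LSD radix sort on a[0..n-1]; keys are non-negative and have
   at most d decimal digits. Returns 0 on success, -1 if allocation fails. */
int radix_sort_lsd(int a[], int n, int d)
{
    if (n < 2) return 0;
    int *out = malloc((size_t)n * sizeof *out);
    if (out == NULL) return -1;
    int divisor = 1;                                   /* selects the current digit */
    for (int pass = 0; pass < d; pass++) {             /* d passes */
        int count[RADIX + 1] = {0};
        for (int i = 0; i < n; i++)                    /* O(n): count each digit */
            count[(a[i] / divisor) % RADIX + 1]++;
        for (int b = 0; b < RADIX; b++)                /* O(r): first slot of each bin */
            count[b + 1] += count[b];
        for (int i = 0; i < n; i++)                    /* O(n): stable distribution */
            out[count[(a[i] / divisor) % RADIX]++] = a[i];
        memcpy(a, out, (size_t)n * sizeof *a);
        if (pass + 1 < d)
            divisor *= RADIX;                          /* skip the last multiply to avoid overflow */
    }
    free(out);
    return 0;
}
```

Each pass costs O(n + r) and there are d passes. Stability of every pass is what lets the later, more significant digits preserve the order established by the earlier ones.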