Title: Adaptive Sorting
1Adaptive Sorting
- A Dynamically Tuned Sorting Library
- Optimizing Sorting with Genetic Algorithms
- By Xiaoming Li, Maria Jesus Garzaran, and David Padua
- Presented by Anton Morozov
2Motivations and Observations
- Success of ATLAS (linear algebra library), FFTW and SPIRAL (signal processing libraries)
What can be done for sorting?
3Why are we interested in the sorting algorithms?
Does this reflect the performance of the sorting algorithms?
4Which additional factors influence the
performance of the sorting algorithm?
5Performance vs. Standard Deviation
6Observation
Quicksort and Merge sort are both comparison-based sorts, thus they are independent of the chosen distribution or standard deviation.
Performance depends on the degree of sortedness, i.e. the number of inversions (at most n(n-1)/2).
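A minimal sketch of the sortedness measure named above, for illustration only. The function names are hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def count_inversions(keys):
    """Count pairs (i, j) with i < j and keys[i] > keys[j]."""
    n = len(keys)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if keys[i] > keys[j]
    )


def max_inversions(n):
    """Upper bound on inversions, reached by a strictly decreasing input."""
    return n * (n - 1) // 2
```

A sorted input has zero inversions; a strictly decreasing input of n keys reaches the bound n(n-1)/2.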
8Architectural Model and Empirical Search
- We saw how programs like BLAS and ATLAS use
search to establish the parameters of the
underlying architecture
9So what Sort Algorithm is better?
- What does the performance of a sorting algorithm depend on?
- How to choose the best sorting algorithm?
10Sorting algorithms
- QuickSort
- Radix Sort
- Merge Sort
- Insertion Sort
- Sorting Networks
- Heap Sort
12Sorting algorithms
- QuickSort
- Radix Sort
- Cache-Conscious Radix sort
- Merge Sort
- Multiway Merge Sort
- Insertion Sort
- Sorting Networks
- Heap Sort
Register sorts
13Quick Sort
- Description: Pick a pivot and move records around it. Records smaller than the pivot go to the front, bigger ones go to the back, and the pivot is inserted between them.
- Improvements
- Proceed iteratively instead of recursively
- Choose the pivot among the first, middle and last keys
- Use fast sorts for the small partitions (insertion sort or sorting networks)
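The improvements listed above can be sketched as follows. This is an illustrative sketch, not the library's implementation: the function names and the default threshold of 16 are assumptions, and insertion sort stands in for the register sorts.

```python
def insertion_sort_range(a, lo, hi):
    """Sort a[lo..hi] in place; used for small partitions."""
    for i in range(lo + 1, hi + 1):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key


def median_of_three(a, lo, hi):
    """Order a[lo], a[mid], a[hi] and return the index of the median."""
    mid = (lo + hi) // 2
    if a[mid] < a[lo]:
        a[mid], a[lo] = a[lo], a[mid]
    if a[hi] < a[lo]:
        a[hi], a[lo] = a[lo], a[hi]
    if a[hi] < a[mid]:
        a[hi], a[mid] = a[mid], a[hi]
    return mid


def quicksort(a, threshold=16):
    """Iterative quicksort with an explicit stack instead of recursion."""
    stack = [(0, len(a) - 1)]
    while stack:
        lo, hi = stack.pop()
        if hi - lo + 1 <= threshold:
            insertion_sort_range(a, lo, hi)  # small partition: fast sort
            continue
        m = median_of_three(a, lo, hi)
        a[m], a[hi] = a[hi], a[m]  # park the pivot at the end
        pivot = a[hi]
        store = lo
        for i in range(lo, hi):
            if a[i] < pivot:  # smaller records go to the front
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]  # pivot goes between the two parts
        stack.append((lo, store - 1))
        stack.append((store + 1, hi))
    return a
```

The threshold is exactly the kind of parameter that the installation-time search would tune per machine.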
14Cache-Conscious Radix Sort
Given b-bit integers and a radix of size $2^r$, the algorithm first sorts by the lower r bits, then by the next r bits, and so on, for a total of b/r phases. r is chosen so that $r \le \log_2 S_{TLB} - 1$, where $S_{TLB}$ is the number of entries in the translation lookaside buffer (TLB).
- Improvements
- Proceed iteratively
- Compute the histogram for each r-bit digit the first time the sort is applied
- Choose r as described above
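A minimal sketch of the phase structure described above: a plain least-significant-digit radix sort over r-bit digits. It does not include the cache-conscious partitioning of CC-radix. The helper `radix_bits_for_tlb` is hypothetical and assumes the bound on r is read as r ≤ log2(S_TLB) - 1.

```python
def radix_bits_for_tlb(tlb_entries):
    """Largest r with r <= log2(tlb_entries) - 1 (assumed reading of the bound)."""
    floor_log2 = tlb_entries.bit_length() - 1
    return max(1, floor_log2 - 1)


def lsd_radix_sort(keys, b=32, r=8):
    """Sort non-negative integers below 2**b, one r-bit digit per phase."""
    mask = (1 << r) - 1
    src = list(keys)
    for shift in range(0, b, r):  # about b/r phases, lowest digit first
        counts = [0] * (1 << r)
        for k in src:  # histogram of the current digit
            counts[(k >> shift) & mask] += 1
        start = 0
        for d in range(1 << r):  # turn counts into start offsets
            counts[d], start = start, start + counts[d]
        dst = [0] * len(src)
        for k in src:  # stable copy into the buckets
            d = (k >> shift) & mask
            dst[counts[d]] = k
            counts[d] += 1
        src = dst
    return src
```

For a TLB with 64 entries, `radix_bits_for_tlb(64)` gives r = 5 under that reading, i.e. 32 buckets per phase.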
15Multiway merge sort.
It partitions the keys into p subsets; each subset is then sorted (in this case with CC-radix sort), and the subsets are merged using a heap. First the smallest/largest element of each subset is promoted to the leaves of the heap; then the leaves are compared and the appropriate leaf is promoted.
- Heap contains 2p-1 leaves.
- Each parent in the heap has A/r children, where A is the cache line size and r is the size of a node.
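The partition, sort and merge steps above can be sketched as follows. This sketch makes two substitutions that should be kept in mind: Python's `heapq` is a binary heap rather than a heap with A/r children per parent, and `sorted` stands in for CC-radix sort on each subset. The function and parameter names are hypothetical.

```python
import heapq


def multiway_merge_sort(keys, p=4, sort_subset=sorted):
    """Split keys into at most p subsets, sort each, merge them with a heap."""
    n = len(keys)
    size = max(1, -(-n // p))  # ceil(n / p) keys per subset
    runs = [sort_subset(keys[i:i + size]) for i in range(0, n, size)]

    # Seed the heap with the smallest element of every sorted subset.
    heap = [(run[0], idx, 0) for idx, run in enumerate(runs)]
    heapq.heapify(heap)

    out = []
    while heap:
        value, idx, pos = heapq.heappop(heap)  # promote the smallest head
        out.append(value)
        if pos + 1 < len(runs[idx]):  # refill from the same subset
            heapq.heappush(heap, (runs[idx][pos + 1], idx, pos + 1))
    return out
```

The number of subsets p corresponds to the heap size and fanout parameters that depend on the cache configuration.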
16Insertion Sort.
Used for small data sizes
Working from left to right, the algorithm scans to the left of each key and places the key in the appropriate place
Sorting Networks
The algorithm compares pairs of inputs in a fixed sequence, and if a pair is out of order it swaps them.
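Both register sorts can be sketched in a few lines. The 4-input network below uses a standard five-comparator layout; the names `NETWORK_4` and `sort4` are chosen for this illustration only.

```python
def insertion_sort(a):
    """Scan left from each key and drop it into its place."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a


# Fixed comparator sequence for 4 inputs; it does not depend on the data.
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def sort4(a):
    """Sort exactly four elements with a sorting network."""
    for i, j in NETWORK_4:
        if a[i] > a[j]:  # compare-and-swap
            a[i], a[j] = a[j], a[i]
    return a
```

Because the comparator sequence is fixed, a network can be fully unrolled and kept in registers, which is why it competes with insertion sort on very small partitions.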
17Input Data Factors
- Number of keys
- Distribution
- Standard deviation
Approximate S.D. with an entropy vector (one entropy value per digit):
$\sum_i -P_i \log_2 P_i$, where $P_i = c_i / N$ and $c_i$ is the number of keys with value i in that digit
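A minimal sketch of the entropy vector, assuming keys are non-negative integers of b bits split into r-bit digits. The defaults b=32 and r=8 and the function name are assumptions made for this example.

```python
import math
from collections import Counter


def entropy_vector(keys, b=32, r=8):
    """Return one entropy value per r-bit digit, lowest digit first."""
    n = len(keys)
    mask = (1 << r) - 1
    vector = []
    for shift in range(0, b, r):
        # c_i: number of keys whose current digit has value i
        counts = Counter((k >> shift) & mask for k in keys)
        # sum over i of -P_i * log2(P_i), with P_i = c_i / N
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        vector.append(entropy)
    return vector
```

A digit whose values are spread evenly over all 2^r possibilities has entropy r; a digit that is constant across the keys has entropy 0.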
18Parameters to search for during installation
- Merge Sort: size of the heap and the fanout. Depends on cache size, cache line size, input size and entropy; at run time needs N and E.
- Quick Sort: insertion sort or sorting networks, and their thresholds. Depends on the number of registers and cache size.
- CC-radix Sort: insertion sort or sorting networks or standard radix sort, depending on the size. Also depends on the number of registers and cache size.
19Learning procedure
Learn the mapping (N, E) → {CC-radix, Multiway Merge(N, E), Quicksort}
Winnow algorithm: $\sum_i w_i E_i > T$
Computes the weight vector and the threshold depending on the entropy vector
20Selection at run time
- Sample the input array (every fourth entry)
- Compute the entropy vector
- Compute $S = \sum_i w_i \cdot entropy_i$
- If $S > T$
- choose CC-radix
- else
- choose others based on size of input
- (either Merge Sort or QuickSort)
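The selection steps above can be sketched as follows. Everything named here is an assumption made for illustration: the function names, the defaults b=32 and r=8, the `>` comparison against a learned threshold, and the `pick_by_size` callback that stands in for the size-based choice between Merge Sort and Quicksort.

```python
import math
from collections import Counter


def entropy_vector(keys, b=32, r=8):
    """One entropy value per r-bit digit, lowest digit first."""
    n = len(keys)
    mask = (1 << r) - 1
    vector = []
    for shift in range(0, b, r):
        counts = Counter((k >> shift) & mask for k in keys)
        vector.append(-sum((c / n) * math.log2(c / n) for c in counts.values()))
    return vector


def choose_algorithm(keys, weights, threshold, pick_by_size, b=32, r=8):
    """Return the name of the sort to run on this input."""
    sample = keys[::4]  # every fourth entry
    entropies = entropy_vector(sample, b, r)
    s = sum(w * e for w, e in zip(weights, entropies))
    if s > threshold:
        return "cc-radix"
    return pick_by_size(len(keys))  # merge sort or quicksort, by input size
```

The weights and the threshold would come from the learning step at installation time; only the sampling and the weighted sum are paid at run time.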
21Summarize
- Architectural Factors
- Cache / TLB size
- Number of Registers
- Cache Line Size
Empirical Search
- Runtime Factors
- Distribution shape of the data
- Amount of data to Sort
- Distribution Width
Any, since it doesn't matter
Learn at installation time
22Performance Results
23Performance Results
24Is it possible to do better?
25Sorting Primitives
To build new sorting algorithms, use sorting and selection primitives
- Sorting primitive: a pure sorting algorithm of the kind looked at before
- Selection primitive: a process to be executed at run time to decide which sorting algorithm to apply
26Sorting Primitives
- Divide-by-Value: corresponds to the first phase of Quicksort; takes the number of pivots as a parameter (np1)
- A step in Quicksort
- Select one or multiple pivots and sort the input array around these pivots
- Divide-by-Position: corresponds to the initial break of Merge Sort; takes the size of each partition and the fan-out of the heap
- Divide the input into same-size sub-partitions
- Use a heap to merge the multiple sorted sub-partitions
27Sorting Primitives
- Divide-by-Radix: corresponds to a step in the radix sort algorithm. Takes a radix as a parameter.
- Parameter: radix (r bits)
- Step 1: Scan the input to get the distribution array, which records how many elements fall in each of the $2^r$ sub-partitions.
- Step 2: Compute the accumulative distribution array, which is used as the indexes when copying the input to the destination array.
- Step 3: Copy the input to the $2^r$ sub-partitions.
Example (partitioning on the last decimal digit, values 0 to 3):
src.: 11 23 30 12
counter: 1 1 1 1
accum.: 0 1 2 3 (1 2 3 4 after the copy)
dest.: 30 11 12 23
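The three steps of Divide-by-Radix can be sketched as follows, using binary digits of r bits. The function name, the `shift` parameter (which digit to partition on) and the returned `starts` list are assumptions made for this example.

```python
def divide_by_radix(src, r, shift=0):
    """Partition src into 2**r sub-partitions by one r-bit digit."""
    buckets = 1 << r
    mask = buckets - 1

    # Step 1: distribution array (how many keys per sub-partition).
    counter = [0] * buckets
    for k in src:
        counter[(k >> shift) & mask] += 1

    # Step 2: accumulative distribution (start index of each sub-partition).
    accum = [0] * buckets
    for d in range(1, buckets):
        accum[d] = accum[d - 1] + counter[d - 1]
    starts = list(accum)

    # Step 3: copy every key to the next free slot of its sub-partition.
    dest = [None] * len(src)
    for k in src:
        d = (k >> shift) & mask
        dest[accum[d]] = k
        accum[d] += 1
    return dest, starts
```

Steps 1 and 2 cost an extra pass over the input, which is the overhead the uniform-distribution variant tries to avoid.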
28Sorting Primitives
- Divide-by-radix-assuming-Uniform-distribution: same as above, but assumes that each bucket contains $n/2^r$ keys
- Step 1 and Step 2 in DR are expensive.
- If the input elements are distributed among the $2^r$ sub-partitions nearly evenly, the input can be copied into the destination array directly, assuming every partition has the same number of elements.
- Overhead: partition overflow
29Sorting Primitives
- Once the partition is small
- Leaf-Divide-by-Value: same as DV but applies recursively to the partitions. When the partition size is < threshold, applies register sorting
- Leaf-Divide-by-Radix: same as DR but is used on all remaining subsets. When the subset size is < threshold, applies register sorting
30Selection Primitives
- Branch-by-Size: used to select different paths based on size
- Branch-by-Entropy: uses entropy to branch on different paths
- Uses Winnow for learning the weight vector
31Genetic Algorithm
- Crossover
- Propagate good sub-trees
- Mutation
- Mutate the structure of the algorithm.
- Change the parameter values of primitives.
32Genetic Algorithm
- Fitness function
- Average performance by S.D.
- Uses Rank instead of fitness.
33Performance Results
34Performance Results
35Is it possible to do better?
It was observed empirically that the Branch-by-Entropy selection primitive was never used
36Classifier Sorting
Based on the idea that the performance of the
algorithm in one region of input space can be
independent of the other.
i is an input characteristic string, c is a condition string over 1, 0 and a don't-care symbol.
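A minimal sketch of matching a condition string against an input characteristic string. The `*` glyph for don't care and the names `DONT_CARE` and `matches` are assumptions made for this example.

```python
DONT_CARE = "*"  # assumed glyph for the don't-care position


def matches(condition, characteristic):
    """True if every condition position is don't care or equals the input bit."""
    if len(condition) != len(characteristic):
        return False
    return all(
        c == DONT_CARE or c == i
        for c, i in zip(condition, characteristic)
    )
```

A classifier whose condition matches the input is a candidate, and its action (a sorting plan) can then be applied to that region of the input space.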
37- Example
- Encode the number of keys into 4 bits.
- 0000: 0-1M, 0001: 1-2M
- Number of keys: 10.5M, encoded into 1100
Classifier table columns: Condition, Action, Fitness, Accuracy
Actions shown: (dr 5 (lq 1 16)), (dp 4 2 (lr 5 16)), (dv 2 (lr 6 16))
Bit strings shown: 1100, 01, 1010, 110
38Experimental Results
39Experimental Results
40Experimental Results
41Summary and Future work
- The work presented shows how sorting can be adapted to underlying platforms
- Potential future work
- Figure out what went wrong or not wrong with those graphs
- Incorporate the notion of sortedness into sort selection
- Simplify the selection algorithm
- See if these notions can be used in a cache-oblivious way