Adaptive Sorting - PowerPoint PPT Presentation

Slides: 41

Provided by: amo58

Learn more at: https://www.cs.cornell.edu

Transcript and Presenter's Notes

Title: Adaptive Sorting

1
Adaptive Sorting

A Dynamically Tuned Sorting Library
Optimizing Sorting with Genetic Algorithms
By Xiaoming Li, Maria Jesus Garzaran, and David
Padua
Presented by Anton Morozov

2
Motivations and Observations

Success of ATLAS (linear algebra) and of FFTW and SPIRAL (signal
processing libraries)

What can be done for sorting?
3
Why are we interested in sorting algorithms?
Does this reflect the performance of the sorting
algorithms?
4
Which additional factors influence the
performance of the sorting algorithm?
5
Performance vs. Standard Deviation
6
Observation
Quicksort and Merge Sort are both comparison-based
sorts, so their performance is largely independent of
the chosen distribution or its standard deviation
Performance depends on the degree of sortedness, i.e.
the number of inversions (maximum n(n-1)/2)
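As an illustration of the sortedness measure mentioned on this slide, here is a minimal sketch that counts inversions with a merge sort. The function name is hypothetical and not taken from the presentation.

```python
def count_inversions(keys):
    """Return (sorted_copy, inversions) for a list of comparable keys."""
    n = len(keys)
    if n <= 1:
        return list(keys), 0
    mid = n // 2
    left, inv_left = count_inversions(keys[:mid])
    right, inv_right = count_inversions(keys[mid:])
    merged = []
    inversions = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every remaining element of left,
            # so it forms one inversion with each of them.
            merged.append(right[j])
            j += 1
            inversions += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions
```

A sorted input has 0 inversions and a reversed input of n keys reaches the maximum of n(n-1)/2.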
7
(No Transcript)
8
Architectural Model and Empirical Search

We saw how programs like BLAS and ATLAS use
search to establish the parameters of the
underlying architecture

9
So which sorting algorithm is better?

What does the performance of a sorting algorithm
depend on?
How do we choose the best sorting algorithm?

10
Sorting algorithms

QuickSort
Radix Sort
Merge Sort
Insertion Sort
Sorting Networks
Heap Sort

11
Sorting algorithms

QuickSort
Radix Sort
Merge Sort
Insertion Sort
Sorting Networks
Heap Sort

12
Sorting algorithms

QuickSort
Radix Sort
Cache-Conscious Radix sort
Merge Sort
Multiway Merge Sort
Insertion Sort
Sorting Networks
Heap Sort

13
Quicksort
Description: Pick a pivot and move the records around
it: records smaller than the pivot go to the front,
bigger ones go to the back, and the pivot is
inserted between them.
Improvements
Proceed iteratively instead of recursively
Choose the pivot among the first, middle and last
keys
Use fast sorts for the small partitions
(insertion sort or sorting networks)
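The three improvements listed above can be sketched together as follows. This is an illustrative version, not the library's code: the explicit stack, the default threshold of 16 and the function names are assumptions.

```python
def insertion_sort(a, lo, hi):
    """Sort a[lo..hi] in place (inclusive bounds)."""
    for i in range(lo + 1, hi + 1):
        key = a[i]
        j = i - 1
        while j >= lo and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key


def quicksort(a, threshold=16):
    """Iterative quicksort with median-of-three pivot and small-partition cutoff."""
    stack = [(0, len(a) - 1)]  # explicit stack replaces recursion
    while stack:
        lo, hi = stack.pop()
        if hi - lo + 1 <= threshold:
            insertion_sort(a, lo, hi)  # fast sort for small partitions
            continue
        mid = (lo + hi) // 2
        # Median of three: order a[lo], a[mid], a[hi], then use a[mid].
        if a[mid] < a[lo]:
            a[lo], a[mid] = a[mid], a[lo]
        if a[hi] < a[lo]:
            a[lo], a[hi] = a[hi], a[lo]
        if a[hi] < a[mid]:
            a[mid], a[hi] = a[hi], a[mid]
        pivot = a[mid]
        i, j = lo, hi
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        stack.append((lo, j))
        stack.append((i, hi))
    return a
```

The threshold is exactly the kind of parameter the library would search for at installation time; the value here is only a placeholder.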

14
Cache-Conscious Radix Sort
Given b-bit integer keys and a radix of size 2^r, the
algorithm first sorts by the lowest r bits, then
by the next r bits, for a total of b/r phases. Here r
is chosen as log2(S_TLB) - 1, where S_TLB is the number
of entries in the translation look-aside buffer (TLB).

Improvements
Proceed iteratively,
Compute the histograms of all r-bit digits the first
time the data is scanned,
Choose r as described above
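A minimal sketch of the radix passes described above, with all digit histograms built in a single initial scan. It is a plain least-significant-digit radix sort and leaves out the cache-conscious partitioning itself. The helper radix_bits assumes the number of TLB entries is a power of two, and both function names are hypothetical.

```python
def radix_bits(tlb_entries):
    """log2(tlb_entries) - 1 for a power-of-two entry count (at least 1)."""
    return max(1, tlb_entries.bit_length() - 2)


def radix_sort(keys, bits=32, r=8):
    """Sort non-negative integers below 2**bits, r bits per pass."""
    passes = -(-bits // r)  # ceil(bits / r)
    mask = (1 << r) - 1
    # One scan over the input builds the histogram of every r-bit digit.
    hist = [[0] * (1 << r) for _ in range(passes)]
    for k in keys:
        for p in range(passes):
            hist[p][(k >> (p * r)) & mask] += 1
    src = list(keys)
    for p in range(passes):
        # Exclusive prefix sum: start offset of each bucket.
        offsets, total = [], 0
        for count in hist[p]:
            offsets.append(total)
            total += count
        dst = [0] * len(src)
        for k in src:
            d = (k >> (p * r)) & mask
            dst[offsets[d]] = k
            offsets[d] += 1
        src = dst
    return src
```

For example, a TLB with 64 entries gives r = 5 under this rule.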

15
Multiway merge sort.
It partitions the keys into p subsets; each
subset is then sorted (in this case with
CC-radix sort) and the sorted subsets are merged using
a heap. First the smallest/largest element of each
subset is placed in a leaf of the heap; then
leaves are compared and the appropriate leaf is
promoted.

The heap contains 2p-1 nodes.
Each parent in the heap has A/r children, where A is
the cache line size and r is the size of a node.
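The partition, sort and merge structure can be sketched as below. Two simplifications are assumed here: Python's built-in sorted stands in for CC-radix sort, and the standard binary heap from heapq stands in for the cache-line-sized heap with fanout A/r.

```python
import heapq


def multiway_merge_sort(keys, p=4):
    """Split keys into about p runs, sort each run, merge them with a heap."""
    n = len(keys)
    size = max(1, -(-n // p))  # ceil(n / p) keys per subset
    runs = [sorted(keys[i:i + size]) for i in range(0, n, size)]
    # The heap starts with the smallest element of every run.
    heap = [(run[0], idx, 0) for idx, run in enumerate(runs)]
    heapq.heapify(heap)
    out = []
    while heap:
        value, idx, pos = heapq.heappop(heap)
        out.append(value)
        # Refill the heap from the run the extracted element came from.
        if pos + 1 < len(runs[idx]):
            heapq.heappush(heap, (runs[idx][pos + 1], idx, pos + 1))
    return out
```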

16
Insertion Sort
Used for small data sizes
Working from left to right, the algorithm scans to
the left of each key and inserts it in the
appropriate place
Sorting Networks
The algorithm applies a fixed sequence of
compare-exchange steps: each compares two inputs and
swaps them if they are out of order.
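Both small-input sorts are short enough to sketch. The five-comparator network below is a standard network for four inputs, chosen here only as an example; the slides do not say which networks the library uses.

```python
def insertion_sort(keys):
    """Return a sorted copy, inserting each key into the sorted prefix to its left."""
    a = list(keys)
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a


# Fixed compare-exchange sequence for 4 inputs; it does not depend on the data.
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def sort4(keys):
    """Sort exactly four keys with the network above."""
    a = list(keys)
    for i, j in NETWORK_4:
        if a[i] > a[j]:
            a[i], a[j] = a[j], a[i]
    return a
```

Because the comparison sequence is fixed, a network can be fully unrolled and kept in registers, which is why it is attractive for tiny partitions.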
17
Input Data Factors

Number of keys
Distribution
Standard deviation

Approximate S.D. with an entropy vector
E = sum_i -P_i log2(P_i), where P_i = c_i / N and c_i
is the number of keys with value i in that digit
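A small sketch of the entropy vector: one entropy value per r-bit digit of the keys. The digit width r = 8 and the 32-bit key size are assumptions made for the example.

```python
import math


def entropy_vector(keys, bits=32, r=8):
    """Return the entropy of each r-bit digit, lowest digit first."""
    n = len(keys)
    digits = -(-bits // r)  # ceil(bits / r)
    mask = (1 << r) - 1
    vector = []
    for d in range(digits):
        counts = {}
        for k in keys:
            v = (k >> (d * r)) & mask
            counts[v] = counts.get(v, 0) + 1
        # E = sum_i -P_i * log2(P_i) with P_i = c_i / N
        vector.append(-sum((c / n) * math.log2(c / n) for c in counts.values()))
    return vector
```

A digit that takes all 2^r values equally often has entropy r, while a digit that is constant across the input has entropy 0.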
18
Parameters to search for during installation
Merge Sort: size of the heap and the fanout; these
depend on cache size, cache line size, input
size and entropy, so at run time it needs N and E
Quicksort: insertion sort or sorting networks
and their thresholds; depends on the number of
registers and cache size
CC-radix Sort: insertion sort, sorting networks
or standard radix sort, depending on the size;
also depends on the number of registers and
cache size
19
Learning procedure
Learn a mapping (N, E) -> CC-radix, Multiway Merge(N, E),
Quicksort
Winnow algorithm: sum_i w_i * E_i > T
Computes the weight vector and threshold depending
on the entropy vector
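A hedged sketch of a Winnow-style learner for the rule sum_i w_i * E_i > T. Classic Winnow is defined for 0/1 features; scaling the multiplicative update by the feature value (factor ** x_i) is an assumption made here so that real-valued entropies can be used, and the threshold is passed in rather than learned. The slides do not give these details.

```python
def predict(weights, x, threshold):
    """True when the weighted sum of the features exceeds the threshold."""
    return sum(w * xi for w, xi in zip(weights, x)) > threshold


def winnow_update(weights, x, label, threshold, alpha=2.0):
    """Return the weights after one example; unchanged if the prediction is right."""
    if predict(weights, x, threshold) == label:
        return list(weights)
    # Promote on a missed positive, demote on a false positive.
    factor = alpha if label else 1.0 / alpha
    return [w * factor ** xi for w, xi in zip(weights, x)]


def train(samples, n_features, threshold, alpha=2.0, epochs=10):
    """samples is a list of (feature_vector, label) pairs with boolean labels."""
    weights = [1.0] * n_features
    for _ in range(epochs):
        for x, label in samples:
            weights = winnow_update(weights, x, label, threshold, alpha)
    return weights
```

With 0/1 features this reduces to the usual Winnow rule: only the weights of active features are multiplied or divided by alpha.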
20
Selection at run time

Sample the input array (every fourth entry)
Compute the entropy vector
Compute S = sum_i w_i * entropy_i
If S exceeds the threshold T
choose CC-radix
else
choose others based on size of input
(either Merge Sort or QuickSort)
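The run-time selection steps can be sketched as one function. The size_cutoff parameter and the direction of the size test (Quicksort for smaller inputs, Multiway Merge for larger ones) are hypothetical; the slides only say that the choice depends on the input size.

```python
import math


def digit_entropy(sample, digit, r):
    """Entropy of one r-bit digit over the sampled keys (0.0 for an empty sample)."""
    n = len(sample)
    counts = {}
    for k in sample:
        v = (k >> (digit * r)) & ((1 << r) - 1)
        counts[v] = counts.get(v, 0) + 1
    return sum(-(c / n) * math.log2(c / n) for c in counts.values()) if n else 0.0


def select_algorithm(keys, weights, threshold, size_cutoff, r=8):
    """Pick a sorting algorithm name from a sample of the input."""
    sample = keys[::4]  # every fourth entry
    entropy = [digit_entropy(sample, d, r) for d in range(len(weights))]
    s = sum(w * e for w, e in zip(weights, entropy))
    if s > threshold:
        return "cc-radix"
    # Assumed rule: small inputs go to Quicksort, large ones to Multiway Merge.
    return "quicksort" if len(keys) < size_cutoff else "multiway-merge"
```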

21
Summary

Architectural Factors
Cache / TLB size
Number of Registers
Cache Line Size

Empirical Search

Runtime Factors
Distribution shape of the data
Amount of data to Sort
Distribution Width

Any, since it doesn't matter
Learn at installation time
22
Performance Results
23
Performance Results
24
Is it possible to do better?
25
Sorting Primitives
New sorting algorithms are built from sorting and
selection primitives

Sorting primitive: a pure sorting algorithm, like
those we looked at before
Selection primitive: a process executed
at run time to decide which sorting algorithm to
apply

26
Sorting Primitives

Divide-by-Value (DV): corresponds to the first phase
of Quicksort; takes the number of pivots np as a
parameter, splitting the input into np + 1 partitions
- A step in Quicksort
- Select one or multiple pivots and partition the input
array around these pivots
Divide-by-Position (DP): corresponds to the initial
split of Merge Sort;
takes the size of each partition and the fan-out of the
heap as parameters
- Divide the input into same-size sub-partitions
- Use a heap to merge the multiple sorted
sub-partitions
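A sketch of Divide-by-Value with several pivots. How the pivots are picked is not stated on the slide; taking evenly spaced elements of a small sorted sample is an assumption, and both function names are hypothetical.

```python
import bisect


def choose_pivots(keys, num_pivots):
    """Pick num_pivots values from a sorted sample of a non-empty key list."""
    sample = sorted(keys[::max(1, len(keys) // 64)])
    step = len(sample) / (num_pivots + 1)
    return [sample[int(step * (i + 1))] for i in range(num_pivots)]


def divide_by_value(keys, num_pivots):
    """Split keys into num_pivots + 1 partitions ordered by value range."""
    parts = [[] for _ in range(num_pivots + 1)]
    if not keys:
        return parts
    pivots = choose_pivots(keys, num_pivots)
    for k in keys:
        # Keys <= pivots[0] go to partition 0, keys in (pivots[0], pivots[1]]
        # go to partition 1, and so on.
        parts[bisect.bisect_left(pivots, k)].append(k)
    return parts
```

Every key in one partition is no larger than any key in the next, so sorting the partitions independently and concatenating them sorts the whole input.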

27
Sorting Primitives

Divide-by-Radix (DR): corresponds to one step of the
radix sort algorithm.
Parameter: radix (r bits)
Step 1: Scan the input to build the distribution
(counter) array, which records how many elements fall
into each of the 2^r sub-partitions.
Step 2: Compute the accumulated distribution
array, which gives the indexes used when copying
the input to the destination array.
Step 3: Copy the input to the 2^r sub-partitions.

Example (partitioning on the last decimal digit):
src.       11 23 30 12
digit       0  1  2  3
counter     1  1  1  1
accum.      0  1  2  3  (start offsets), 1 2 3 4 (end offsets)
dest.      30 11 12 23
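The three Divide-by-Radix steps map directly onto code. This sketch works on binary digits of r bits; the shift parameter, which selects the digit, is an assumption added for the example.

```python
def divide_by_radix(src, r, shift=0):
    """Return (dest, starts): keys grouped by an r-bit digit, and bucket start offsets."""
    buckets = 1 << r
    mask = buckets - 1
    # Step 1: count how many keys fall into each sub-partition.
    counter = [0] * buckets
    for k in src:
        counter[(k >> shift) & mask] += 1
    # Step 2: accumulate the counts into start offsets.
    accum = [0] * buckets
    for d in range(1, buckets):
        accum[d] = accum[d - 1] + counter[d - 1]
    starts = list(accum)
    # Step 3: copy every key to the next free slot of its sub-partition.
    dest = [None] * len(src)
    for k in src:
        d = (k >> shift) & mask
        dest[accum[d]] = k
        accum[d] += 1
    return dest, starts
```

With r = 2 the keys 11, 23, 30, 12 have low digits 3, 3, 2, 0, so the call divide_by_radix([11, 23, 30, 12], 2) places them as 12, 30, 11, 23.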
28
Sorting Primitives

Divide-by-Radix-assuming-Uniform-distribution (DU):
same as above, but assumes that each bucket contains
n/2^r keys
- Step 1 and Step 2 in DR are expensive.
- If the input elements are distributed nearly evenly
among the 2^r sub-partitions, the input can be
copied into the destination array directly,
assuming every partition has the same number of
elements.
- Overhead: handling partition overflow
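A sketch of the uniform-distribution variant: the counting and accumulation steps are skipped and each sub-partition gets a fixed share of the destination. Collecting overflowing keys in a separate list is an assumption; the slide only notes that overflow is the overhead of this primitive.

```python
def divide_by_radix_uniform(src, r, shift=0):
    """Return (dest, fill, overflow) assuming about n / 2**r keys per bucket."""
    buckets = 1 << r
    mask = buckets - 1
    capacity = -(-len(src) // buckets)  # ceil(n / 2**r) slots per bucket
    dest = [None] * (capacity * buckets)
    fill = [0] * buckets  # number of keys already stored in each bucket
    overflow = []
    for k in src:
        d = (k >> shift) & mask
        if fill[d] < capacity:
            dest[d * capacity + fill[d]] = k
            fill[d] += 1
        else:
            # The uniformity assumption failed for this bucket.
            overflow.append(k)
    return dest, fill, overflow
```

On evenly spread keys the overflow list stays empty; on skewed keys it grows, which is the cost that has to be weighed against the two skipped passes.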

29
Sorting Primitives

Once the partition is small
Leaf-Divide-by-Value (LDV): same as DV but applied
recursively to the partitions; below the threshold
it applies register sorting
Leaf-Divide-by-Radix (LDR): same as DR but used on
all remaining subsets; below the threshold it applies
register sorting

30
Selection Primitives

Branch-by-Size used to select different paths
based on size
Branch-by-Entropy uses entropy to branch on
different path.
Uses Winnow for learning the weight vector

31
Genetic Algorithm

Crossover
Propagate good sub-trees

Mutation
Mutate the structure of the algorithm.
Change the parameter values of primitives.
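A toy sketch of the two genetic operators on algorithm trees. The nested-tuple encoding mirrors expressions such as (dv 2 (lr 6 16)) from the classifier example later in the deck; everything else here (swapping only the child of the root, doubling or halving one parameter) is an assumption made to keep the example short.

```python
import random


def subtree(tree):
    """Return the child sub-tree of a node, or None if the node is a leaf."""
    return tree[-1] if isinstance(tree[-1], tuple) else None


def crossover(a, b):
    """Exchange the child sub-trees of two parents; leaves are returned unchanged."""
    sa, sb = subtree(a), subtree(b)
    if sa is None or sb is None:
        return a, b
    return a[:-1] + (sb,), b[:-1] + (sa,)


def mutate(tree, rng, scale=2):
    """Double or halve one randomly chosen integer parameter of the root node."""
    idx = [i for i, v in enumerate(tree) if isinstance(v, int)]
    if not idx:
        return tree
    i = rng.choice(idx)
    if rng.random() < 0.5:
        new = tree[i] * scale
    else:
        new = max(1, tree[i] // scale)
    return tree[:i] + (new,) + tree[i + 1:]
```

For instance, crossing ("dv", 2, ("lr", 6, 16)) with ("dp", 4, 2, ("lr", 5, 16)) swaps the two lr leaves, and mutate(tree, random.Random(0)) changes one parameter of the root.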

32
Genetic Algorithm

Fitness function
Average performance by S.D.
Uses Rank instead of fitness.

33
Performance Results
34
Performance Results
35
Is it possible to do better?
It was observed empirically that the Branch-by-Entropy
selection primitive was never used
36
Classifier Sorting
Based on the idea that the performance of an
algorithm in one region of the input space can be
independent of its performance in the others.
i is an input characteristic string; c is a
condition string over 1, 0 and * (don't
care).
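Matching an input characteristic string against a condition string can be sketched in a few lines. The rule layout (condition, action, fitness) and the use of "*" as the don't-care symbol are assumptions for illustration.

```python
def matches(condition, bits, wildcard="*"):
    """True if every condition symbol equals the input bit or is a don't-care."""
    return len(condition) == len(bits) and all(
        c == wildcard or c == b for c, b in zip(condition, bits)
    )


def matching_rules(rules, bits):
    """Return the rules whose condition matches, best fitness first."""
    hits = [rule for rule in rules if matches(rule[0], bits)]
    return sorted(hits, key=lambda rule: rule[2], reverse=True)
```

Each rule therefore covers a region of the input space, and different regions can be served by different sorting algorithms.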
37

Example
Encode the number of keys into 4 bits.
0000: 0-1M, 0001: 1-2M
Number of keys: 10.5M, encoded as 1100

Classifier table columns: Condition, Action, Fitness, Accuracy
Actions: (dr 5 (lq 1 16)), (dp 4 2 (lr 5 16)), (dv 2 (lr 6 16))
Bit strings: 1100, 01, 1100, 1010, 1100, 110
38
Experimental Results
39
Experimental Results
40
Experimental Results
41
Summary and Future work