Training - PowerPoint PPT Presentation

About This Presentation
Title:

Training

Description:

Data Structures and Algorithms Analysis of Algorithms Richard Newman – PowerPoint PPT presentation

Slides: 55
Provided by: Agam69
Learn more at: https://www.cise.ufl.edu
Transcript and Presenter's Notes

1
Data Structures and Algorithms: Analysis of Algorithms
Richard Newman
2
Players
  • Boss/Manager/Customer
      • Wants a cheap solution
      • Cheap efficient
  • Programmer/developer
      • Wants to solve the problem, deliver system
  • Theoretician
      • Wants to understand
  • Student
      • Might play any or all of these roles some day

3
Why Analyze Algorithms?
  • Predict performance
  • Compare algorithms
  • Provide guarantees
  • Understand theory
  • Practical reason: avoid poor performance!
  • Also avoid logical/design errors

4
Algorithmic Success Stories
  • DFT
      • Discrete Fourier Transform
      • Take N samples of waveform
      • Decompose into periodic components
      • Used in DVD, JPEG, MPEG, MRI, astrophysics, ....
  • Brute force: N^2 steps
  • FFT algorithm: N lg N steps

5
Algorithmic Success Stories
  • N-Body Simulation
      • Simulate gravitational interactions among N
        bodies
  • Brute force: N^2 steps
  • Barnes-Hut algorithm: N lg N steps

6
The Challenge
  • Will my algorithm be able to solve problem with
    large practical input?
  • Time
  • Memory
  • Power
  • Knuth (1970s): use the scientific method to
    understand performance

7
Scientific Method
  • Observe feature of natural world
  • Hypothesize a model consistent with observations
  • Predict events using hypothesis
  • Test predictions experimentally
  • Iterate until hypothesis and observations agree

8
Scientific Method Principles
  • Experiments must be reproducible
  • Hypotheses must be falsifiable

9
Example: 3-Sum
  • Given N distinct integers, how many triples sum
    to exactly zero?

$ cat 8ints.txt
8
30 -40 -20 -10 40 0 10 5
$ ./ThreeSum 8ints.txt
4
10
3-Sum: Brute Force Algorithm
count = 0
For i = 0 to N-1
    For j = i+1 to N-1
        For k = j+1 to N-1
            If a[i] + a[j] + a[k] == 0
                count++
return count
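The pseudocode above can be sketched in Java roughly as follows. The class name ThreeSumBrute and the hard-coded test array are illustrative assumptions, not part of the original slides; the array holds the values shown in 8ints.txt.

```java
// A minimal sketch of brute-force 3-Sum: count triples i < j < k
// whose values sum to exactly zero. Class name is hypothetical.
final class ThreeSumBrute {
    static int count(int[] a) {
        int n = a.length;
        int count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                for (int k = j + 1; k < n; k++)
                    // Sum in long so very large ints cannot overflow.
                    if ((long) a[i] + a[j] + a[k] == 0)
                        count++;
        return count;
    }

    public static void main(String[] args) {
        int[] a = {30, -40, -20, -10, 40, 0, 10, 5};  // values from 8ints.txt
        System.out.println(count(a));  // prints 4
    }
}
```

The four triples are (30, -40, 10), (30, -20, -10), (-40, 40, 0) and (-10, 0, 10).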
11
Measuring Running Time
  • Manually
      • Start stopwatch when starting program
      • Stop it when program finishes
      • Can do this in script (date)
  • Internally
      • Use C library function time()
      • Can insert calls around code of interest
      • Avoid initialization, etc.
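As a rough illustration of internal timing, the sketch below uses Java's System.nanoTime() in place of the C time() call named on the slide (a swapped-in technique, since the other code in this lecture is Java). The class TimingSketch, its method names, and the stand-in workload are assumptions made for the example.

```java
// A minimal sketch of timing only the code of interest.
// Names are hypothetical; System.nanoTime() stands in for C's time().
final class TimingSketch {
    // Times one run of the task and returns elapsed wall-clock seconds.
    static double timeSeconds(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1e9;
    }

    // Stand-in workload: a cubic triple loop over the array.
    static int tripleLoop(int[] a) {
        int count = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = i + 1; j < a.length; j++)
                for (int k = j + 1; k < a.length; k++)
                    if (a[i] + a[j] + a[k] == 0) count++;
        return count;
    }

    public static void main(String[] args) {
        for (int n = 250; n <= 1000; n *= 2) {
            int[] a = new int[n];
            // Initialization stays outside the timed region.
            for (int i = 0; i < n; i++) a[i] = i - n / 2;
            double seconds = timeSeconds(() -> tripleLoop(a));
            System.out.printf("%d %.3f%n", n, seconds);
        }
    }
}
```

Measured times will vary from machine to machine and run to run, so no expected output is shown.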

12
Measuring Running Time
  • Strategy
  • Run program on various input sizes
  • Measure time for each
  • Can do this in script also
  • Plot results
  • Tools
      • http://www.opensourcetesting.org/performance.php

13
Measuring Running Time
N        Time (s)
250      0.0
500      0.0
1000     0.1
2000     0.8
4000     6.4
8000     51.1
16000    ?

What do you think the time will be for input of
size 16,000? Why?
14
Data Analysis
  • Standard Plot
  • Plot running time T(N) vs. input size N
  • Use linear scales for both

15
Data Analysis
  • Log-log plot
  • If it is a straight line, the slope gives the power
      • $\lg y = m \lg x + b$
      • $y = 2^b x^m$

16
Hypothesis, Prediction, Validation
N        Time (s)
250      0.0
500      0.0
1000     0.1
2000     0.8
4000     6.4
8000     51.1
16000    ?

  • Hypothesis: running time is about 10^-10 × N^3 seconds
  • Prediction: T(16,000) = 409.6 s
  • Observation: T(16,000) = 410.8 s
17
Doubling Hypothesis
  • Quick way to estimate slope m in log-log plot
  • Strategy: double the size of the input each run
  • Run program on doubled input sizes
  • Measure time for each
  • Take ratio of successive times
  • If running time is polynomial, lg of the ratio should converge to the power

18
Doubling Hypothesis
N        Time (s)   Ratio   lg ratio
500      0.0        -       -
1000     0.1        6.9     2.8
2000     0.8        7.7     2.9
4000     6.4        8.0     3.0
8000     51.1       8.0     3.0
16000    410.8      8.0     3.0

  • Hypothesis: running time is about 10^-10 × N^3 seconds
  • Prediction: T(16,000) = 409.6 s
  • Observation: T(16,000) = 410.8 s
19
Doubling Hypothesis
  • Hypothesis: running time is about a N^b
      • With b = lg(ratio of running times)
  • Caveat!!!
      • Cannot identify logarithmic factors
  • How to find a?
      • Take large input, equate time to hypothesized
        time with b as estimated, then solve for a
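The two steps above (estimate b from a doubling ratio, then solve for a) might look like the following in Java. The class DoublingEstimate and its method names are hypothetical; the numbers in main are the measured times from the 3-Sum table.

```java
// A minimal sketch of fitting T(N) = a * N^b from doubling experiments.
// Class and method names are assumptions for illustration.
final class DoublingEstimate {
    static double lg(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // Exponent b estimated from the times at N and at 2N.
    static double exponent(double timeAtN, double timeAt2N) {
        return lg(timeAt2N / timeAtN);
    }

    // Coefficient a obtained by solving time = a * n^b for a.
    static double coefficient(double n, double time, double b) {
        return time / Math.pow(n, b);
    }

    public static void main(String[] args) {
        double b = exponent(6.4, 51.1);            // times at N = 4000 and 8000
        double bRounded = Math.rint(b);            // treat the power as an integer
        double a = coefficient(8000, 51.1, bRounded);
        System.out.println("b is about " + b);
        System.out.println("a is about " + a);
        System.out.println("predicted T(16000) is about " + a * Math.pow(16000, bRounded));
    }
}
```

With these inputs b comes out very close to 3 and a close to 10^-10, in line with the hypothesis on the earlier slide. As the caveat says, a logarithmic factor would not show up in this estimate.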

20
Experimental Algorithmics
  • System Independent Effects
  • Algorithm
  • Input data

21
Experimental Algorithmics
  • System Dependent Effects
  • Hardware: CPU, memory, cache, ...
  • Software: compiler, interpreter, garbage
    collection, ...
  • System: OS, network, other processes

22
Experimental Algorithmics
  • Bad news
  • Hard to get precise measurements
  • Good news
  • Easier than other physical sciences!
  • Can run huge number of experiments

23
Mathematical Running Time Models
  • Total running time = sum of (cost × frequency) over all operations
  • Need to analyze program to determine set of
    operations over which weighted sum is computed
  • Cost depends on machine, compiler
  • Frequency depends on algorithm, input data

Donald Knuth 1974 Turing Award
24
How to Estimate Constants?
Operation          Example            Time (ns)
Integer add        a + b              2.1
Integer multiply   a * b              2.4
Integer divide     a / b              5.4
Fp add             a + b              4.6
Fp multiply        a * b              4.2
Fp divide          a / b              13.5
sine               Math.sin(theta)    91.3
arctangent         Math.atan2(y, x)   129.0
...                ...                ...

Measured running OS X on a MacBook Pro, 2.2 GHz, 2 GB RAM
25
Experimental Algorithmics
  • Observation: most primitive functions take
    constant time
  • Warning: non-primitive functions often do not!
  • How many instructions as f(input size)?

int count = 0;
for (int i = 0; i < N; i++)
    if (a[i] == 0) count++;
26
Experimental Algorithmics
int count = 0;
for (int i = 0; i < N; i++)
    if (a[i] == 0) count++;

Operation          Frequency
Var declaration    2
Assignment         2
< compare          N + 1
== compare         N
Array access       N
Increment          N to 2N
27
Counting Frequency - Loops
int count = 0;
for (int i = 0; i < N; i++)
    for (int j = i + 1; j < N; j++)
        if (a[i] + a[j] == 0) count++;

  • How many additions in loop?
    (N-1) + (N-2) + ... + 3 + 2 + 1 = (1/2) N (N-1)
  • Exact number of other operations? Tedious and difficult....
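One way to sanity-check a frequency count like this is to instrument the loop and compare against the closed form. The sketch below does that for the double loop; the class PairLoopCount and its methods are hypothetical names.

```java
// A minimal sketch: count inner-loop executions of the double loop
// and compare with the closed form N(N-1)/2. Names are hypothetical.
final class PairLoopCount {
    // Counts how many times the innermost statement runs.
    static long innerIterations(int n) {
        long iterations = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                iterations++;
        return iterations;
    }

    // Closed form (N-1) + (N-2) + ... + 1 = N(N-1)/2.
    static long closedForm(int n) {
        return (long) n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n = 10; n <= 1000; n *= 10)
            System.out.println(n + " " + innerIterations(n) + " " + closedForm(n));
    }
}
```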
28
Experimental Algorithmics
  • Observation: tedious at best
  • Still may have noise!
  • Approach: Simplify!
  • Use some basic operation as proxy
      • e.g., array accesses

int count = 0;
for (int i = 0; i < N; i++)
    for (int j = i + 1; j < N; j++)
        if (a[i] + a[j] == 0) count++;
29
Experimental Algorithmics
  • Observation: lower order terms become less
    important as input size increases
  • Still may be important for small inputs
  • Approach: Simplify! Use ~ (tilde) notation
  • Ignore lower order terms
      • N large, they are negligible
      • N small, who cares?

30
Leading Term Approximation
Examples
  • Ex 1: (1/6) N^3 + 20 N + 16 ~ (1/6) N^3
  • Ex 2: (1/6) N^3 + 100 N^(4/3) + 56 ~ (1/6) N^3
  • Ex 3: (1/6) N^3 - (1/2) N^2 + (1/3) N ~ (1/6) N^3
  • Discard lower order terms, e.g., for N = 1000:
    166.67 million vs. 166.17 million
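The N = 1000 comparison can be reproduced with a few lines of Java. This is only a sketch; the class LeadingTerm and its method names are made up for the example, and the polynomial is the one from Ex 3.

```java
// A minimal sketch comparing the exact Ex 3 polynomial with its leading term.
// Class and method names are hypothetical.
final class LeadingTerm {
    // Exact value: (1/6) N^3 - (1/2) N^2 + (1/3) N
    static double exact(double n) {
        return n * n * n / 6 - n * n / 2 + n / 3;
    }

    // Leading-term approximation: (1/6) N^3
    static double leading(double n) {
        return n * n * n / 6;
    }

    public static void main(String[] args) {
        for (double n = 10; n <= 1e6; n *= 10)
            System.out.println(n + " ratio exact/leading = " + exact(n) / leading(n));
    }
}
```

The ratio moves toward 1 as N grows, which is what the ~ notation expresses.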
31
Leading Term Approximation
Technical definition: $f(N) \sim g(N)$ means
$\lim_{N \to \infty} \frac{f(N)}{g(N)} = 1$
32
Bottom Line
int count = 0;
for (int i = 0; i < N; i++)
    for (int j = i + 1; j < N; j++)
        if (a[i] + a[j] == 0) count++;

  • How many array accesses in loop? ~ N^2
  • Use cost model and ~ notation!
33
Example - 3-Sum
int count = 0;
for (int i = 0; i < N; i++)
    for (int j = i + 1; j < N; j++)
        for (int k = j + 1; k < N; k++)
            if (a[i] + a[j] + a[k] == 0) count++;

  • How many array accesses in loop?
  • Inner statement executes N(N-1)(N-2)/3! ~ (1/6) N^3 times
  • ~ (1/2) N^3 array accesses (3 per statement)
  • Use cost model and ~ notation!
34
Estimating Discrete Sums
  • Take Discrete Math (remember?)
      • Telescope series, inductive proof
  • Approximate with integral
      • Doesn't always work!
  • Use Maple or Wolfram Alpha
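As a worked illustration of the integral approximation, here are three standard sums that come up in loop analysis. These are textbook results rather than material from the slides, and each holds only in the leading-term (~) sense.

```latex
% Single loop: 1 + 2 + ... + N
\sum_{i=1}^{N} i \;\sim\; \int_{1}^{N} x \, dx \;\sim\; \tfrac{1}{2} N^2

% Harmonic sum: 1 + 1/2 + ... + 1/N
\sum_{i=1}^{N} \frac{1}{i} \;\sim\; \int_{1}^{N} \frac{1}{x} \, dx \;=\; \ln N

% Triple loop with i <= j <= k
\sum_{i=1}^{N} \sum_{j=i}^{N} \sum_{k=j}^{N} 1
  \;\sim\; \int_{1}^{N} \int_{x}^{N} \int_{y}^{N} dz \, dy \, dx
  \;\sim\; \tfrac{1}{6} N^3
```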
35
Takeaway
  • In principle, accurate mathematical models are available
  • In practice
      • Formulas can be complicated
      • Advanced math might be needed
      • Are subject to noise anyway
  • Exact models: leave to experts!
  • We will use approximate models
36
Order-of-Growth Classes
Order of growth   Name           Typical code                                                 Description          Example             T(2N)/T(N)
1                 constant       a = b + c                                                    Statement            Add two numbers     1
log N             logarithmic    while (N > 1) N = N/2                                        Divide in half       Binary search       ~ 1
N                 linear         for (i = 0 to N-1) ...                                       Loop                 Find the maximum    2
N log N           linearithmic   See sorting                                                  Divide and conquer   Mergesort           ~ 2
N^2               quadratic      for (i = 0 to N-1) for (j = 0 to N-1) ...                    Double loop          Check all pairs     4
N^3               cubic          for (i = 0 to N-1) for (j = 0 to N-1) for (k = 0 to N-1) ... Triple loop          Check all triples   8
2^N               exponential    See combinatorial search                                     Exhaustive search    Check all subsets   T(N)
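The last column follows from substituting 2N into each growth function. A short derivation for three representative rows, assuming a running time of the stated form with some constant a > 0:

```latex
% Power law T(N) = a N^b: the ratio is exactly 2^b (2, 4, 8 for b = 1, 2, 3)
\frac{T(2N)}{T(N)} = \frac{a (2N)^b}{a N^b} = 2^b

% Linearithmic T(N) = a N \lg N: the ratio tends to 2
\frac{T(2N)}{T(N)} = \frac{2N (\lg N + 1)}{N \lg N} = 2 \left( 1 + \frac{1}{\lg N} \right) \to 2

% Exponential T(N) = a 2^N: the ratio is 2^N, i.e. proportional to T(N) itself
\frac{T(2N)}{T(N)} = \frac{a \, 2^{2N}}{a \, 2^{N}} = 2^{N}
```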
37
Order-of-Growth
  • Definition: if f(N) ~ c g(N) for some constant c
    > 0, then f(N) is O(g(N))
      • Ignores leading coefficient
      • Ignores lower order terms
  • Brassard notation: O(g(N)) is the set of all
    functions with the same order
  • So 3-Sum algorithm is order N^3
  • Leading coefficient depends on hardware,
    compiler, etc.

38
Order-of-Growth
  • Good News!
  • The following set of functions suffices to
    describe order of growth of most algorithms
  • 1, log N, N, N log N, N^2, N^3, 2^N, N!

39
Order-of-Growth
40
Binary Search
  • Goal: given a sorted array and a key, find the
    index of the key in the array
  • Binary Search: compare key against middle entry
    (of what is left)
      • Too small, go left
      • Too big, go right
      • Equal, found
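A minimal Java sketch of the search described above, for an int array sorted in ascending order. The class BinarySearchSketch and the choice to return -1 for a missing key are assumptions for illustration. The midpoint is written as lo + (hi - lo) / 2 because the more obvious (lo + hi) / 2 can overflow int on very large arrays.

```java
// A minimal sketch of binary search on a sorted int array.
// Class name and the -1 "not found" convention are assumptions.
final class BinarySearchSketch {
    static int indexOf(int[] a, int key) {
        int lo = 0, hi = a.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;          // overflow-safe midpoint
            if (key < a[mid]) hi = mid - 1;        // too small, go left
            else if (key > a[mid]) lo = mid + 1;   // too big, go right
            else return mid;                       // equal, found
        }
        return -1;                                 // key is not in the array
    }

    public static void main(String[] args) {
        int[] sorted = {-40, -20, -10, 0, 5, 10, 30, 40};
        System.out.println(indexOf(sorted, 10));  // prints 5
        System.out.println(indexOf(sorted, 7));   // prints -1
    }
}
```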

41
Binary Search Implementation
  • Trivial to implement?
  • First binary search published in 1946
  • First bug-free version in 1962
  • Bug in Java's Arrays.binarySearch() discovered
    in 2006!
  • http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html

42
Binary Search Math Analysis
  • Proposition: BS uses at most 1 + lg N key compares
    for a sorted array of size N
  • Defn: T(N) = number of key compares on a sorted array of
    size <= N
  • Recurrence
      • for N > 1, T(N) <= T(N/2) + 1
      • for N = 1, T(1) = 1

43
Binary Search Math Analysis
  • Recurrence
      • for N > 1, T(N) <= T(N/2) + 1
      • for N = 1, T(1) = 1
  • Pf Sketch (assume N is a power of 2)
      T(N) <= T(N/2) + 1
           <= T(N/4) + 1 + 1
           <= T(N/8) + 1 + 1 + 1
           ...
           <= T(N/N) + 1 + 1 + ... + 1    (lg N ones)
            = 1 + lg N

44
3-Sum
  • Version 0: N^3 time, N space
  • Version 1: N^2 log N time, N space
  • Version 2: N^2 time, N space

45
3-Sum: N^2 log N Algorithm
  • Algorithm
      • Step 1: Sort the N (distinct) integers
      • Step 2: For each pair of numbers a[i] and a[j],
        binary search for -(a[i] + a[j])
  • Analysis: order of growth is N^2 log N
      • Step 1: N^2 using insertion sort
      • Step 2: N^2 log N with binary search
  • Can achieve N^2 by modifying BS step
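A sketch of this sort-then-search version in Java follows. It departs from the slide in one respect: it calls the library's Arrays.sort instead of insertion sort, which does not change the N^2 log N order of growth. The class name ThreeSumFast is hypothetical, and the code assumes distinct integers small enough that a[i] + a[j] does not overflow.

```java
import java.util.Arrays;

// A minimal sketch of the N^2 log N 3-Sum: sort, then binary search
// for the value that completes each pair. Assumes distinct integers
// and no int overflow in a[i] + a[j]. Class name is hypothetical.
final class ThreeSumFast {
    static int count(int[] input) {
        int[] a = input.clone();   // leave the caller's array untouched
        Arrays.sort(a);            // library sort, swapped in for insertion sort
        int n = a.length;
        int count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++) {
                int k = Arrays.binarySearch(a, -(a[i] + a[j]));
                // Require k > j so each triple i < j < k is counted once;
                // a negative k means the value was not found.
                if (k > j) count++;
            }
        return count;
    }

    public static void main(String[] args) {
        int[] a = {30, -40, -20, -10, 40, 0, 10, 5};  // values from 8ints.txt
        System.out.println(count(a));  // prints 4
    }
}
```

The k > j condition relies on the integers being distinct; with duplicates, Arrays.binarySearch may return any matching index and the count could be off.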

46
Comparing Programs
  • Hypothesis: Version 1 is significantly faster in
    practice than Version 0

Version 0 (N^3)
N        Time (s)
1000     0.1
2000     0.8
4000     6.4
8000     51.1

Version 1 (N^2 log N)
N        Time (s)
1000     0.14
2000     0.18
4000     0.34
8000     0.96
16000    3.67
32000    14.88
64000    59.16

Theory works well in practice!
47
Memory
  • Bit: 0 or 1 (binary digit)
  • Byte: 8 bits (wasn't always that way)
  • Megabyte (MB): 1 million bytes (NIST and networks guys)
    or 2^20 bytes (everybody else)
  • Gigabyte (GB): 1 billion bytes (NIST and networks guys)
    or 2^30 bytes (everybody else)
48
Memory
  • 64-bit machine: assume 8-byte pointers
  • Can address more memory
  • Pointers use more space
  • Some JVMs compress ordinary object pointers to
    4 bytes to avoid this cost

49
Typical Memory Usage
Primitive types
Type       Bytes
boolean    1
byte       1
char       2
int        4
float      4
long       8
double     8

1-D arrays
Type       Bytes
char[]     2N + 24
int[]      4N + 24
double[]   8N + 24

2-D arrays
Type       Bytes
char[][]   ~ 2MN
int[][]    ~ 4MN
double[][] ~ 8MN
50
Typical Java Memory Usage
  • Object overhead: 16 bytes
  • Object reference: 8 bytes
  • Padding: objects use a multiple of 8 bytes
  • Ex: Date object

public class Date {
    private int day;
    private int month;
    private int year;
    ...
}

Field              Bytes
Object overhead    16
day                4 (int)
month              4 (int)
year               4 (int)
padding            4
Total              32
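The arithmetic behind the 32-byte total can be sketched as below, using the slide's figures (16 bytes of object overhead, sizes rounded up to a multiple of 8). The class ObjectSizeSketch and its method are hypothetical; actual sizes depend on the JVM and its settings.

```java
// A minimal sketch of the object-size arithmetic from the slide.
// Constants come from the slide; names are hypothetical, and real
// JVMs may use different overheads.
final class ObjectSizeSketch {
    static final int OBJECT_OVERHEAD = 16;  // bytes of per-object overhead
    static final int ALIGNMENT = 8;         // sizes are padded to a multiple of 8

    // Estimated bytes for an object whose fields take fieldBytes in total.
    static int paddedSize(int fieldBytes) {
        int raw = OBJECT_OVERHEAD + fieldBytes;
        return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;  // round up
    }

    public static void main(String[] args) {
        int dateFields = 3 * 4;  // day, month, year: three 4-byte ints
        System.out.println(paddedSize(dateFields));  // prints 32
    }
}
```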
51
Summary
  • Empirical Analysis
      • Execute program to perform experiments
      • Assume power law, formulate hypothesis for running time
      • Model allows us to make predictions
52
Summary
  • Mathematical Analysis
      • Analyze algorithm to count frequency of operations
      • Use tilde notation to simplify analysis
      • Model allows us to explain behavior
53
Summary
  • Scientific Method
      • Mathematical model is independent of particular
        system, applies to machines not yet built
      • Empirical approach needed to validate theory, and
        to make predictions
54
Next: Lecture 5
  • Read Chapter 3
  • Basic data structures