Title: Chapter 4 Assessing and Understanding Performance
1Chapter 4Assessing and Understanding Performance
2Which One Is Good?
Airplane            Passengers   Range (mi)   Speed (mph)
Boeing 737-100             101          630           598
Boeing 747                 470         4150           610
BAC/Sud Concorde           132         4000          1350
Douglas DC-8-50            146         8720           544
- Depends on measures of performance
- Cruising speed
- Longest range
- Largest capacity
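Which airplane counts as "best" changes with the measure chosen. A minimal Python sketch using the figures from the table above; the dictionary layout and the helper name `best_by` are illustrative choices, not part of the slides.

```python
# Airplane data from the table: (passengers, range in miles, speed in mph).
planes = {
    "Boeing 737-100": (101, 630, 598),
    "Boeing 747": (470, 4150, 610),
    "BAC/Sud Concorde": (132, 4000, 1350),
    "Douglas DC-8-50": (146, 8720, 544),
}

# Column index of each measure inside the tuples above.
MEASURES = {"capacity": 0, "range": 1, "speed": 2}


def best_by(measure):
    """Return the name of the airplane with the largest value for a measure."""
    column = MEASURES[measure]
    return max(planes, key=lambda name: planes[name][column])


for measure in MEASURES:
    print(f"Best by {measure}: {best_by(measure)}")
```

Each measure selects a different airplane, which is the point of the slide: "good" has no meaning until the measure is fixed.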
3Measuring Performance
- Elapsed time, wall-clock time, or response time
- Total time to complete a task
- Includes disk and memory accesses, I/O, etc.
- A useful number, but often not good for comparison purposes
- CPU (execution) time
- Doesn't count I/O or time spent running other programs
- Can be broken up into system CPU time and user CPU time
- CPU time = user CPU time + system CPU time
- Our focus: user CPU time
- Time spent executing the lines of code that are "in" our program
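The difference between elapsed time and CPU time can be observed directly. The following is a minimal Python sketch, assuming the standard-library calls `time.perf_counter` (wall clock) and `os.times` (user and system CPU time of the current process); the helper `measure` and the two sample tasks are hypothetical names introduced for illustration.

```python
import os
import time


def measure(task):
    """Run task() once and return elapsed, user CPU, and system CPU seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    task()
    cpu_end = os.times()
    wall_end = time.perf_counter()
    return {
        "elapsed": wall_end - wall_start,
        "user": cpu_end.user - cpu_start.user,
        "system": cpu_end.system - cpu_start.system,
    }


def busy_loop():
    # CPU-bound work: this counts toward user CPU time.
    total = 0
    for i in range(200_000):
        total += i * i
    return total


def waiting():
    # Sleeping stands in for waiting on I/O: elapsed time grows,
    # while CPU time barely moves.
    time.sleep(0.05)


for task in (busy_loop, waiting):
    t = measure(task)
    print(task.__name__, t, "CPU time =", t["user"] + t["system"])
```

The reported CPU times are only as fine-grained as the operating system's accounting, so very short tasks may show a CPU time of zero.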
4CPU Performance Metrics
- Response time: the time between the start and the completion of a task (in time units)
- Throughput: the total amount of work done in a given time (in number of tasks per unit of time)
5Performance
- Problem
- Machine A runs a program in 10 sec.
- Machine B runs the same program in 15 sec.
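- How much faster is A than B?
Performance is the inverse of execution time, so Performance_A / Performance_B = Time_B / Time_A = 15 / 10 = 1.5. A is 1.5 times faster than B.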
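The comparison can be checked with a short Python sketch that takes performance as the inverse of execution time. The function name `relative_performance` is illustrative, not from the slides.

```python
def relative_performance(time_x, time_y):
    """How many times faster X is than Y, given execution times in seconds.

    Performance is taken as 1 / execution time, so the ratio of the
    performances is the inverse ratio of the execution times.
    """
    return time_y / time_x


n = relative_performance(time_x=10, time_y=15)
print(f"A is {n} times faster than B")
```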
6Clock Rate Measurement
Name          Example         Measurement (sec)
Millisecond   1 msec (ms)     1.E-03
Microsecond   1 usec (us)     1.E-06
Nanosecond    1 nsec (ns)     1.E-09
Picosecond    1 psec (ps)     1.E-12
Femtosecond   1 fsec (fs)     1.E-15
- Clock cycle: the time for one clock period, running at a constant rate
- Clock rate is given in Hz (1/sec)
- clock_cycle_time = 1/clock_rate (in sec)
10 nsec clock cycle => 100 MHz clock rate
1 nsec clock cycle => 1 GHz clock rate
500 psec clock cycle => 2 GHz clock rate
200 psec clock cycle => 5 GHz clock rate
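The reciprocal relation between clock cycle time and clock rate can be reproduced in a few lines of Python. This is a sketch only; the function names are illustrative.

```python
def clock_rate_from_cycle_time(cycle_time_s):
    """Clock rate in Hz for a given clock cycle time in seconds."""
    return 1.0 / cycle_time_s


def cycle_time_from_clock_rate(rate_hz):
    """Clock cycle time in seconds for a given clock rate in Hz."""
    return 1.0 / rate_hz


# The four cycle times listed on the slide, in seconds.
examples = {
    "10 nsec": 10e-9,
    "1 nsec": 1e-9,
    "500 psec": 500e-12,
    "200 psec": 200e-12,
}

for label, seconds in examples.items():
    rate_mhz = clock_rate_from_cycle_time(seconds) / 1e6
    print(f"{label} clock cycle => {rate_mhz:.0f} MHz clock rate")
```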
7MHz
http://www.webopedia.com/TERM/M/MHz.html
- One MHz represents one million cycles per second.
- The speed of microprocessors, called the clock speed, is measured in megahertz.
- For example, a microprocessor that runs at 200 MHz executes 200 million cycles per second.
- One GHz represents 1 billion cycles per second.
8CPU Time or CPU Execution Time
- The actual time the CPU spends computing for a specific task
- This time covers the CPU computing the given program, including operating system routines executed on the program's behalf; it does not include time spent waiting for I/O or running other programs.
- Performance of processor/memory = 1 / CPU_time
9CPU Execution Time Formula
- E = CPU execution time for a program
- N = number of CPU clock cycles for a program
- T = clock cycle time
- R = clock rate
E = N × T = N / R
10Example
R = 8 GHz
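The relation between execution time E, clock cycles N, and clock rate R (E = N / R, with T = 1 / R) can be sketched in Python. The problem statement for this example is not in the text, so every input below is an assumption, chosen so that the result matches the 8 GHz shown above: a program that takes 10 seconds at 4 GHz, and a second machine that needs 1.2 times as many cycles but must finish in 6 seconds. The function names are illustrative.

```python
def cpu_time(n_cycles, clock_rate_hz):
    """E = N / R, in seconds."""
    return n_cycles / clock_rate_hz


def cycles_for(cpu_time_s, clock_rate_hz):
    """N = E * R."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(n_cycles, target_time_s):
    """R = N / E, in Hz."""
    return n_cycles / target_time_s


# Assumed inputs (not from the slides): 10 s at 4 GHz on the first machine.
cycles_a = cycles_for(cpu_time_s=10, clock_rate_hz=4e9)

# Assumed: the second machine needs 1.2 times as many cycles
# and has a 6-second target.
cycles_b = 1.2 * cycles_a
rate_b = required_clock_rate(cycles_b, target_time_s=6)

print(f"Required clock rate: {rate_b / 1e9:.1f} GHz")
```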
11Clock cycles Per Instruction (CPI)
- The average number of clock cycles per
instruction for a program or program fragment
12The Big Picture
- Instruction count depends on the architecture, but not on the exact implementation
- Average CPI depends on design details and on the mix of types of instructions executed in an application
13Understanding Program Performance
                       Instruction Count   CPI        Clock Rate
Algorithm              X                   Possibly
Programming Language   X                   X
Compiler               X                   X
ISA                    X                   X          X
14Using Performance Equation
             Clock Cycle Time   CPI
Computer A   250 ps             2
Computer B   500 ps             1.2
Which computer is faster for this program, and by
how much?
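One way to answer the question is with the relation CPU time = instruction count × CPI × clock cycle time. The Python sketch below assumes both computers execute the same number of instructions for this program (for example, because they implement the same instruction set); that assumption and the function names are not stated on the slide.

```python
# (clock cycle time in picoseconds, CPI) for each computer, from the table.
computers = {"A": (250, 2.0), "B": (500, 1.2)}


def time_per_instruction_ps(name):
    """Average time per instruction in ps: CPI * clock cycle time."""
    cycle_time_ps, cpi = computers[name]
    return cycle_time_ps * cpi


def faster():
    """Return (name of the faster computer, how many times faster it is).

    Assumes equal instruction counts, so the instruction count cancels
    out of the ratio of CPU times.
    """
    times = {name: time_per_instruction_ps(name) for name in computers}
    fast = min(times, key=times.get)
    slow = max(times, key=times.get)
    return fast, times[slow] / times[fast]


name, ratio = faster()
print(f"Computer {name} is {ratio:.1f} times faster")
```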
15Computing CPI
- Done by looking at the different types of
instructions and using their individual cycle
counts
CPU clock cycles = Σ (CPI_i × C_i), summed over i = 1 to n
- C_i: the count of the number of instructions of class i executed
- CPI_i: the average number of cycles per instruction for that instruction class
- n: the number of instruction classes
16Example
CPI for each instruction class
        A   B   C
CPI     1   2   3

Instruction counts for each instruction class
Code Sequence   A   B   C
1               2   1   2
2               4   1   1
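The example can be worked through in Python by summing CPI times instruction count over the classes. The sketch reads the entries of the second table as instruction counts per class for each code sequence (an interpretation, since the values 2/1/2 and 4/1/1 cannot be per-class CPIs given the first table). Names such as `clock_cycles` and `average_cpi` are illustrative.

```python
# Cycles per instruction for each instruction class (first table).
CPI = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for each code sequence (second table).
SEQUENCES = {
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}


def clock_cycles(counts):
    """Total cycles: sum over classes of CPI_i * C_i."""
    return sum(CPI[cls] * n for cls, n in counts.items())


def average_cpi(counts):
    """Average CPI: total cycles divided by total instruction count."""
    return clock_cycles(counts) / sum(counts.values())


for seq, counts in SEQUENCES.items():
    print(
        f"Sequence {seq}: {sum(counts.values())} instructions, "
        f"{clock_cycles(counts)} cycles, CPI = {average_cpi(counts)}"
    )
```

Under this reading, sequence 2 executes more instructions yet needs fewer cycles, so instruction count alone does not decide which sequence is faster.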
17Workload
- A set of programs used for evaluating a computer or a system
- Benchmarks: programs specifically chosen to measure performance.
- SPEC 2000 benchmarks (12 integer, 14 floating-point programs).
- Performance results given by benchmarks may not be correct if the system (or the compiler of the system) is optimized for the benchmarks
18Benchmark
- Programs specifically chosen to measure performance
- Performance is best determined by running a real application
- Use programs typical of the expected workload
- e.g., compilers/editors, scientific applications, graphics...
- Small benchmarks
- Nice for architects and designers
- SPEC (System Performance Evaluation Cooperative)
- Companies have agreed on a set of real programs and inputs
19Simplest Approach
                  Computer A   Computer B
Program 1 (sec)            1           10
Program 2 (sec)         1000          100
Total (sec)             1001          110
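The table can be summarized by total execution time, the simplest way to combine results from several programs. A minimal Python sketch; the data layout and function names are illustrative.

```python
# Execution times in seconds, from the table.
times = {
    "Computer A": {"Program 1": 1, "Program 2": 1000},
    "Computer B": {"Program 1": 10, "Program 2": 100},
}


def total_time(machine):
    """Total execution time of the whole workload on one machine."""
    return sum(times[machine].values())


def speedup(program, fast, slow):
    """How many times faster `fast` is than `slow` on a single program."""
    return times[slow][program] / times[fast][program]


print("A vs B on Program 1:", speedup("Program 1", "Computer A", "Computer B"))
print("B vs A on Program 2:", speedup("Program 2", "Computer B", "Computer A"))

ratio = total_time("Computer A") / total_time("Computer B")
print(f"B is {ratio:.1f} times faster than A on the total")
```

Each computer wins on one program by a factor of 10, but on total time B comes out ahead because Program 2 dominates the workload.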
20Evaluating Performance
Desktop
- CPU performance
- SPEC CPU benchmark to measure CPU performance and response time
- Benchmarks focusing on a specific task: DVD playback or graphics performance of games
Server
- Depends on the nature of the intended application
- Throughput
- Requirements on response time to individual events: database query and web page request
- SPECweb99
Embedded computing
- EEMBC
- Different classes and applications of computers require different types of benchmarks
- Reproducibility: list everything another experimenter needs to duplicate the results
21SPEC CPU2000 Benchmark
22SPEC CINT2000 and CFP2000
23Relative Performance in Three Different Modes
24Relative Energy Efficiency Comparison
25Amdahl's Law
Execution Time After Improvement = (Execution Time Affected / Amount of Improvement) + Execution Time Unaffected
Principle: Make the common case fast
Example: Suppose a program runs in 100 seconds on a machine, with multiply operations responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 5 times faster?
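The example can be explored with a short Python sketch of the formula: execution time after improvement equals the affected time divided by the amount of improvement, plus the unaffected time. The function names are illustrative, and the 4-times target is a hypothetical contrast case that is not part of the slide.

```python
def time_after_improvement(affected, unaffected, improvement):
    """Execution time after speeding up the affected part by `improvement`."""
    return affected / improvement + unaffected


def required_improvement(total, affected, target_speedup):
    """Improvement factor needed on the affected part, or None if impossible.

    The unaffected time is a floor on the new execution time, so a target
    at or below that floor cannot be reached by any finite improvement.
    """
    unaffected = total - affected
    target_time = total / target_speedup
    budget = target_time - unaffected  # time left for the improved part
    if budget <= 0:
        return None
    return affected / budget


# The slide's question: 100 s total, 80 s in multiply, 5 times faster overall.
print("5x overall:", required_improvement(100, 80, 5))

# Hypothetical contrast case (not from the slide): 4 times faster overall.
print("4x overall:", required_improvement(100, 80, 4))
```

For the 5-times target the function returns `None`: the 20 seconds not spent in multiplication already equal the target time of 20 seconds, so no speedup of multiplication alone can meet it.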
26MIPS (million instructions per second)
Instruction class   CPI
A                   1
B                   2
C                   3

Instruction counts (in billions) for each instruction class
Code from    A    B   C
Compiler 1   5    1   1
Compiler 2   10   1   1
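The two compilers can be compared by both MIPS and execution time, using MIPS = instruction count / (execution time × 10^6). The clock rate for this example is not in the text, so the 4 GHz value below is an assumption; the ranking of the compilers under each metric does not depend on it. Function names are illustrative.

```python
CLOCK_RATE_HZ = 4e9  # assumption: the clock rate is not given in the text

# Cycles per instruction for each class (first table).
CPI = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for the code from each compiler (second table).
COUNTS = {
    "Compiler 1": {"A": 5e9, "B": 1e9, "C": 1e9},
    "Compiler 2": {"A": 10e9, "B": 1e9, "C": 1e9},
}


def execution_time(counts):
    """CPU time in seconds: total clock cycles divided by the clock rate."""
    cycles = sum(CPI[cls] * n for cls, n in counts.items())
    return cycles / CLOCK_RATE_HZ


def mips(counts):
    """Million instructions per second for the given instruction mix."""
    return sum(counts.values()) / (execution_time(counts) * 1e6)


for name, counts in COUNTS.items():
    print(f"{name}: {execution_time(counts):.2f} s, {mips(counts):.0f} MIPS")
```

The code from compiler 2 earns the higher MIPS rating but takes longer to run, which is why execution time is the metric to trust.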
27Always trust execution time metric!
http://www.faculty.uaf.edu/ffdr/EE443/Handouts/Set
5_Sp05_3pp.pdf
28A Complete Example (I)
http://www.faculty.uaf.edu/ffdr/EE443/Handouts/Set
5_Sp05_3pp.pdf
29A Complete Example (II)
30A Complete Example (III)
31Three problems with using MIPS
- MIPS specifies the instruction execution rate but does not take into account the capabilities of the instructions. We cannot compare computers with different instruction sets using MIPS, since the instruction counts will certainly differ.
- MIPS varies between programs on the same computer; a computer cannot have a single MIPS rating for all programs.
- MIPS can vary inversely with performance.