Chapter 4: Assessing and Understanding Performance

1
Chapter 4: Assessing and Understanding Performance
  • Bo Cheng

2
Which One Is Good?
Airplane            Passengers   Range (mi)   Speed (mph)
Boeing 737-100      101          630          598
Boeing 747          470          4150         610
BAC/Sud Concorde    132          4000         1350
Douglas DC-8-50     146          8720         544
  • Depends on measures of performance
  • Cruising speed
  • Longest range
  • Largest capacity

3
Measuring Performance
  • Elapsed Time, wall-clock time or response time
  • Total time to complete a task
  • Including disk and memory accesses, I/O, etc.
  • a useful number, but often not good for
    comparison purposes
  • CPU (execution) time
  • Doesn't count I/O or time spent running other
    programs
  • can be broken up into system CPU time, and user
    CPU time
  • CPU time = user CPU time + system CPU time
  • Our focus: user CPU time
  • time spent executing the lines of code that are
    "in" our program
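The distinction between elapsed time, user CPU time and system CPU time can be observed directly. The following is a minimal sketch using Python's standard `time.perf_counter()` and `os.times()`; the `measure` helper and the `busy_loop` workload are hypothetical names introduced only for illustration, not part of the original slides.

```python
import os
import time


def measure(task):
    """Run task() once and return (elapsed, user_cpu, system_cpu) in seconds."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    task()
    wall_after = time.perf_counter()
    cpu_after = os.times()
    # Elapsed (wall-clock) time includes I/O waits and time given to other programs.
    elapsed = wall_after - wall_before
    # User CPU time: time spent executing the code of this process itself.
    user_cpu = cpu_after.user - cpu_before.user
    # System CPU time: time the OS spent working on behalf of this process.
    system_cpu = cpu_after.system - cpu_before.system
    return elapsed, user_cpu, system_cpu


def busy_loop():
    # Hypothetical CPU-bound workload used only for illustration.
    total = 0
    for i in range(200_000):
        total += i * i
    return total


elapsed, user_cpu, system_cpu = measure(busy_loop)
print(f"elapsed time    : {elapsed:.4f} s")
print(f"user CPU time   : {user_cpu:.4f} s")
print(f"system CPU time : {system_cpu:.4f} s")
print(f"CPU time        : {user_cpu + system_cpu:.4f} s")
```

The exact numbers depend on the machine and its load, which is one reason elapsed time alone is often a poor basis for comparison.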

4
CPU Performance Metrics
  • Response time: the time between the start and the
    completion of a task (in time units)
  • Throughput: the total amount of work done in a
    given time (in number of tasks per unit of time)

5
Performance
  • Problem
  • Machine A runs a program in 10 sec.
  • Machine B runs the same program in 15 sec.
  • How much faster is A than B ?

Performance_A / Performance_B = Time_B / Time_A = 15 / 10 = 1.5, so A is 1.5 times faster than B
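Because performance is the reciprocal of execution time, the comparison reduces to a ratio of times. A small sketch of that arithmetic, where `relative_performance` is a hypothetical helper name and the two times are the ones given on the slide:

```python
def relative_performance(time_x, time_y):
    """Return how many times faster X is than Y, given execution times in seconds."""
    # Performance is the reciprocal of execution time, so
    # perf_x / perf_y = (1 / time_x) / (1 / time_y) = time_y / time_x.
    return time_y / time_x


time_a = 10.0  # Machine A, seconds (from the slide)
time_b = 15.0  # Machine B, seconds (from the slide)
print(relative_performance(time_a, time_b))  # 1.5
```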
6
Clock Rate Measurement
Name          Example        Measurement (sec)
Millisecond   1 msec (ms)    1.E-03
Microsecond   1 usec (us)    1.E-06
Nanosecond    1 nsec (ns)    1.E-09
Picosecond    1 psec (ps)    1.E-12
Femtosecond   1 fsec (fs)    1.E-15
  • Clock cycle: the time for one clock period
    running at a constant rate
  • Clock rate is given in Hz (1/sec)
  • clock_cycle_time = 1 / clock_rate (in sec)

10 nsec clock cycle => 100 MHz clock rate
1 nsec clock cycle => 1 GHz clock rate
500 psec clock cycle => 2 GHz clock rate
200 psec clock cycle => 5 GHz clock rate
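The four conversions above all follow from clock_cycle_time = 1 / clock_rate. A short sketch that reproduces them; the function names are hypothetical and chosen only for this illustration:

```python
def rate_from_cycle_time(cycle_time_s):
    """Clock rate in Hz for a given clock cycle time in seconds."""
    return 1.0 / cycle_time_s


def cycle_time_from_rate(clock_rate_hz):
    """Clock cycle time in seconds for a given clock rate in Hz."""
    return 1.0 / clock_rate_hz


# Cycle times listed on the slide, expressed in seconds.
examples = [
    ("10 nsec", 10e-9),
    ("1 nsec", 1e-9),
    ("500 psec", 500e-12),
    ("200 psec", 200e-12),
]

for label, cycle_time in examples:
    rate_mhz = rate_from_cycle_time(cycle_time) / 1e6
    print(f"{label} clock cycle => {rate_mhz:.0f} MHz clock rate")
```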
7
MHz
http://www.webopedia.com/TERM/M/MHz.html
  • One MHz represents one million cycles per second.
  • The speed of microprocessors, called the clock
    speed, is measured in megahertz.
  • For example, a microprocessor that runs at 200
    MHz executes 200 million cycles per second.
  • One GHz represents 1 billion cycles per second.

8
CPU Time or CPU Execution Time
  • The actual time the CPU spends computing for a
    specific task
  • This time accounts for the time the CPU spends computing
    the given program, including operating system
    routines executed on the program's behalf; it
    does not include the time spent waiting for I/O or
    running other programs.
  • Performance of processor/memory = 1 / CPU_time

9
CPU Execution Time Formula
  • E = CPU execution time for a program
  • N = number of CPU clock cycles for a program
  • T = clock cycle time
  • R = clock rate
  • E = N × T = N / R
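With these symbols, execution time is the cycle count multiplied by the cycle time, or equivalently divided by the clock rate (E = N × T = N / R, since T = 1 / R). The sketch below checks that both forms agree; the cycle count and clock rate are hypothetical values chosen for illustration and are not taken from the slides.

```python
def cpu_time_from_cycle_time(cycles, cycle_time_s):
    """E = N * T: execution time from cycle count and clock cycle time."""
    return cycles * cycle_time_s


def cpu_time_from_rate(cycles, clock_rate_hz):
    """E = N / R: execution time from cycle count and clock rate."""
    return cycles / clock_rate_hz


# Hypothetical example values (assumptions, not from the slides).
n_cycles = 20e9      # N: 20 billion clock cycles
clock_rate = 4e9     # R: 4 GHz
cycle_time = 1.0 / clock_rate  # T: 250 ps

print(f"E = N / R = {cpu_time_from_rate(n_cycles, clock_rate):.2f} s")
print(f"E = N * T = {cpu_time_from_cycle_time(n_cycles, cycle_time):.2f} s")
```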

10
Example
R = 8 GHz
11
Clock cycles Per Instruction (CPI)
  • The average number of clock cycles per
    instruction for a program or program fragment

12
The Big Picture
  • Instruction count depends on the architecture,
    but not on the exact implementation
  • Average CPI depends on design details and on the
    mix of types of instructions executed in an
    application

13
Understanding Program Performance
Component              Instruction Count   CPI        Clock Rate
Algorithm              X                   Possibly
Programming Language   X                   X
Compiler               X                   X
ISA                    X                   X          X
14
Using Performance Equation
             Clock Cycle Time   CPI
Computer A   250 ps             2
Computer B   500 ps             1.2
Which computer is faster for this program, and by
how much?
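One way to answer this is with the performance equation CPU time = instruction count × CPI × clock cycle time. The sketch below assumes both computers execute the same number of instructions for this program (the slide transcript does not state this, so it is an assumption); the instruction count itself is a hypothetical value that cancels out of the ratio.

```python
def cpu_time_ps(instruction_count, cpi, cycle_time_ps):
    """CPU time in picoseconds: instruction count * CPI * clock cycle time."""
    return instruction_count * cpi * cycle_time_ps


# Assumption: same instruction count on both computers (hypothetical value).
instruction_count = 1_000_000

time_a = cpu_time_ps(instruction_count, cpi=2.0, cycle_time_ps=250)
time_b = cpu_time_ps(instruction_count, cpi=1.2, cycle_time_ps=500)

faster, ratio = ("A", time_b / time_a) if time_a < time_b else ("B", time_a / time_b)
print(f"Computer {faster} is {ratio:.2f} times faster")
```

Under that assumption, A needs 500 ps per instruction on average and B needs 600 ps, so A comes out ahead despite its higher CPI.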
15
Computing CPI
  • Done by looking at the different types of
    instructions and using their individual cycle
    counts

$\text{CPU clock cycles} = \sum_{i=1}^{n} (CPI_i \times C_i)$
C_i: the count of the number of instructions of class i executed
CPI_i: the average number of cycles per instruction for that instruction class
n: the number of instruction classes
16
Example
Instruction class   A   B   C
CPI                 1   2   3

                Instruction counts for each instruction class
Code Sequence   A   B   C
1               2   1   2
2               4   1   1
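The two code sequences can be compared by summing, per instruction class, the class CPI times the number of instructions of that class, then dividing by the total instruction count. A minimal sketch using the values from the tables above; the dictionary and function names are hypothetical:

```python
# CPI for each instruction class (from the slide).
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for each code sequence (from the slide).
SEQUENCES = {
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}


def total_cycles(counts):
    """CPU clock cycles = sum over classes of CPI_i * C_i."""
    return sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())


def average_cpi(counts):
    """Average CPI = total clock cycles / total instruction count."""
    return total_cycles(counts) / sum(counts.values())


for seq, counts in SEQUENCES.items():
    print(
        f"Sequence {seq}: {sum(counts.values())} instructions, "
        f"{total_cycles(counts)} cycles, CPI = {average_cpi(counts):.2f}"
    )
```

Sequence 2 executes more instructions but needs fewer cycles, so instruction count alone does not decide which sequence is faster.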
17
Workload
  • A set of programs used for evaluating a computer
    or a system
  • Benchmarks: programs specifically chosen to
    measure performance.
  • SPEC 2000 benchmarks (12 integer, 14
    floating-point programs).
  • Performance results given by benchmarks may be
    misleading if the system (or the compiler of the
    system) is optimized for the benchmarks

18
Benchmark
  • Programs specifically chosen to measure
    performance
  • Best determined by running a real application
  • use programs typical of expected workload
  • e.g., compilers/editors, scientific applications,
    graphics...
  • Small benchmarks
  • nice for architects and designers
  • SPEC (System Performance Evaluation Cooperative)
  • companies have agreed on a set of real programs
    and inputs

19
Simplest Approach
                  Computer A   Computer B
Program 1 (sec)   1            10
Program 2 (sec)   1000         100
Total (sec)       1001         110
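The simplest summary is total execution time over the whole workload. The sketch below recomputes the totals from the per-program times in the table and shows how the answer changes depending on which program is considered; the variable names are hypothetical.

```python
# Execution times in seconds (from the slide).
TIMES = {
    "Computer A": {"Program 1": 1, "Program 2": 1000},
    "Computer B": {"Program 1": 10, "Program 2": 100},
}

totals = {computer: sum(times.values()) for computer, times in TIMES.items()}
print(totals)

# Per-program view: which computer is faster depends on the program.
for program in ("Program 1", "Program 2"):
    a = TIMES["Computer A"][program]
    b = TIMES["Computer B"][program]
    winner, factor = ("A", b / a) if a < b else ("B", a / b)
    print(f"{program}: Computer {winner} is {factor:.1f} times faster")

# Whole-workload view: compare total execution times.
overall = totals["Computer A"] / totals["Computer B"]
print(f"Total: Computer B is {overall:.1f} times faster")
```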
20
Evaluating Performance
Desktop
  • CPU performance
  • SPEC CPU benchmark to measure CPU performance and response time
  • Focusing on a specific task: DVD playback or graphic performance of games
Server
  • Depends on the nature of the intended application
  • Throughput
  • Requirements on response time to individual events: database query and web page request
  • SPECweb99
Embedded computing
  • EEMBC
  • Different classes and applications of computer
    require different types of benchmarks

Reproducibility: list everything another
experimenter needs to duplicate the results
21
SPEC CPU2000 Benchmark
22
SPEC CINT2000 and CFP2000
23
Relative Performance in Three Different Modes
24
Relative Energy Efficiency Comparison
25
Amdahl's Law
Execution Time After Improvement = (Execution
Time Affected / Amount of Improvement) +
Execution Time Unaffected
Principle: make the common case fast
Example: Suppose a program runs in 100 seconds on
a machine, with multiply operations responsible
for 80 seconds of this time. How much do we have
to improve the speed of multiplication if we want
the program to run 5 times faster?
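The example can be worked through by solving Amdahl's Law for the amount of improvement. The sketch below is one way to do that; the function names are hypothetical, and the 4-times-faster case is an extra illustration added for contrast rather than part of the slide.

```python
def time_after_improvement(affected, unaffected, improvement):
    """Amdahl's Law: affected / improvement + unaffected (all times in seconds)."""
    return affected / improvement + unaffected


def required_improvement(affected, unaffected, target_time):
    """Improvement factor needed to reach target_time, or None if unreachable."""
    # Time left for the affected part once the unaffected part is paid for.
    remaining = target_time - unaffected
    if remaining <= 0:
        # Even an infinitely fast affected part cannot reach the target.
        return None
    return affected / remaining


total = 100.0      # seconds (from the slide)
multiply = 80.0    # seconds spent in multiply (from the slide)
other = total - multiply

# 5 times faster means a target of 20 s, which equals the unaffected time.
print(required_improvement(multiply, other, total / 5))  # None

# Contrast (not on the slide): 4 times faster means a target of 25 s.
print(required_improvement(multiply, other, total / 4))  # 16.0
```

The first result shows that no speedup of multiplication alone can make the program 5 times faster, because the remaining 20 seconds are untouched by the improvement.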
26
MIPS (million instructions per second)
Instruction class   CPI
A                   1
B                   2
C                   3

              Instruction counts (in billions)
Code from     A    B    C
Compiler 1    5    1    1
Compiler 2    10   1    1
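These tables can be used to compare the two compilers by MIPS rating and by execution time. MIPS is instruction count divided by (execution time × 10^6). The clock rate is not given in this transcript, so the sketch below assumes 4 GHz purely for illustration; the constant and function names are also hypothetical.

```python
CLOCK_RATE_HZ = 4e9  # Assumption: the transcript does not state the clock rate.

# CPI per instruction class (from the slide).
CPI = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class (from the slide, given in billions).
COUNTS = {
    "Compiler 1": {"A": 5e9, "B": 1e9, "C": 1e9},
    "Compiler 2": {"A": 10e9, "B": 1e9, "C": 1e9},
}


def execution_time_s(counts):
    """Execution time = total clock cycles / clock rate."""
    cycles = sum(CPI[cls] * n for cls, n in counts.items())
    return cycles / CLOCK_RATE_HZ


def mips(counts):
    """MIPS = instruction count / (execution time * 10^6)."""
    return sum(counts.values()) / (execution_time_s(counts) * 1e6)


for name, counts in COUNTS.items():
    print(f"{name}: {execution_time_s(counts):.2f} s, {mips(counts):.0f} MIPS")
```

Under the assumed clock rate, the code from Compiler 2 has the higher MIPS rating but the longer execution time, so the two metrics rank the compilers in opposite order.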
27
Always trust execution time metric!
http://www.faculty.uaf.edu/ffdr/EE443/Handouts/Set
5_Sp05_3pp.pdf
28
A Complete Example (I)
http://www.faculty.uaf.edu/ffdr/EE443/Handouts/Set
5_Sp05_3pp.pdf
29
A Complete Example (II)
30
A Complete Example (III)
31
Three problems with using MIPS
  • MIPS specifies the instruction execution rate but
    does not take into account the capabilities of
    the instructions.
  • We cannot compare computers with different
    instruction sets using MIPS, since the
    instruction counts will certainly differ.
  • MIPS varies between programs on the same
    computer
  • a computer cannot have a single MIPS rating for
    all programs.
  • MIPS can vary inversely with performance.