Title: Performance II
1. Performance II
- Last Time
  - Computer Architecture definition and drivers
  - Basic notions of Performance and Relative Performance
- Today
  - Quiz
  - Time bases and Performance Metrics
  - Amdahl's Law
- Reminders/Announcements
  - Read PH Chapter 2, Performance
2. Comparing Computers using Metrics
- Run programs, record execution times
- How can we describe the relative performance of
machines with such a metric?
3. Relative Performance
- Can be confusing
  - A runs in 12 seconds
  - B runs in 20 seconds
  - A/B = 0.6, so A is 40% faster, or 1.4X faster, or B is 40% slower
  - B/A = 1.67, so A is 67% faster, or 1.67X faster, or B is 67% slower
- Needs a precise definition
4. Relative Performance Statements
- Performance Ratio (A/B), from execution times:
  $$\frac{\text{ExecTime}_B}{\text{ExecTime}_A} = \frac{75}{50} = 1.5$$
  - A is 1.5 times faster than B
- Performance Ratio (A/B), from performance:
  $$\frac{\text{Perf}_A}{\text{Perf}_B} = \frac{1/\text{ExecTime}_A}{1/\text{ExecTime}_B} = \frac{75}{50} = 1.5$$
- Performance Ratio (B/A):
  $$\frac{\text{ExecTime}_A}{\text{ExecTime}_B} = \frac{50}{75} \approx 0.67$$
  - B is 0.67 times the performance of A
5. Performance
$$\text{Performance}_X = \frac{1}{\text{Execution Time}_X}, \quad \text{for program } X$$
- Only has meaning in the context of a program or workload
- Not very intuitive as an absolute measure
6. Defining Relative Performance
$$\text{Relative Performance} = \frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution Time}_Y}{\text{Execution Time}_X} = n$$
- We can remove all ambiguity by always constraining n to be > 1 => machine X is n times faster than Y.
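A minimal Python sketch of the n > 1 convention, applied to the numbers from the earlier slides. The function name `relative_performance` and the tie-breaking choice (equal times are reported as X with n = 1) are our own assumptions, not part of the slides.

```python
def relative_performance(time_x: float, time_y: float) -> tuple[str, float]:
    """Return (faster machine, n) with n >= 1, using
    Perf_X / Perf_Y = ExecTime_Y / ExecTime_X = n."""
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    if time_x <= time_y:
        return "X", time_y / time_x   # X is faster (or tied): n = T_Y / T_X
    return "Y", time_x / time_y       # Y is faster: swap roles so n stays > 1


# Slide 3: A (as X) runs in 12 s, B (as Y) runs in 20 s.
faster, n = relative_performance(12.0, 20.0)
print(f"{faster} is {n:.2f} times faster")  # X is 1.67 times faster

# Slide 4: A (as X) runs in 50 s, B (as Y) runs in 75 s.
print(relative_performance(50.0, 75.0))     # ('X', 1.5)
```

Always dividing the larger time by the smaller one is what removes the "40% faster or 67% faster?" ambiguity from slide 3.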
7. Performance Measurements
- => Of metrics, not computer architectures!
- Other Metrics
  - Millions of Instructions per Second (MIPS), for a particular program:
    $$\text{MIPS} = \frac{\text{\# of Instructions}}{\text{ExecTime} \times 10^6}$$
  - Cycles per Instruction (CPI), for a particular program; characterizes the complexity of instructions:
    $$\text{CPI} = \frac{\text{ExecTime} \times \text{Clock Rate}}{\text{\# of Instructions}}$$
  - Clock Rate (Megahertz); characterizes the hardware implementation and the complexity of a cycle
8. Performance Summary
- Many metrics, basis for comparison
  - Relative comparison
  - Quantitative comparison
- Execution time is the preferred metric.
  - Cannot hide anything; includes everything by default
  - Easiest to avoid errors in comparison
  - Corresponds to user waiting time, resource usage
- What metrics?
9. What is Time?
$$\text{CPU Execution Time} = \text{CPU clock cycles} \times \text{Clock cycle time} = \frac{\text{CPU clock cycles}}{\text{Clock rate}}$$
- Every conventional processor has a clock with an associated clock cycle time or clock rate.
- Every program runs in an integral number of clock cycles.
- x MHz = x millions of cycles/second (clock rate)
- 1/(x MHz) = cycle time; e.g., 1/(500 MHz) = 2 ns
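The two forms of the CPU time equation, and the 500 MHz example, can be checked with a small Python sketch. The function names and the one-billion-cycle program are hypothetical, chosen only for illustration.

```python
def cycle_time_s(clock_rate_hz: float) -> float:
    # Clock cycle time = 1 / clock rate
    return 1.0 / clock_rate_hz


def cpu_time_from_cycles_s(cycles: int, clock_rate_hz: float) -> float:
    # CPU Execution Time = CPU clock cycles / Clock rate
    return cycles / clock_rate_hz


rate_hz = 500e6                                 # 500 MHz
print(cycle_time_s(rate_hz))                    # 2e-09, i.e. 2 ns
print(cpu_time_from_cycles_s(10**9, rate_hz))   # 2.0 (seconds for 1e9 cycles)
```

Multiplying the cycle count by `cycle_time_s(rate_hz)` gives the same 2 seconds, up to floating-point rounding.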
10. How many clock cycles?
$$\text{Number of CPU cycles} = \text{Instructions executed} \times \text{Average Clock Cycles per Instruction (CPI)}$$
or, equivalently,
$$\text{CPI} = \frac{\text{CPU clock cycles}}{\text{Instruction count}}$$
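Because CPI is an average, it is usually computed from an instruction mix: sum the cycles spent by each instruction class, then divide by the total instruction count. A minimal Python sketch; the mix below (classes, counts, and per-class cycle costs) is entirely hypothetical.

```python
# Hypothetical instruction mix: class -> (instructions executed, cycles per instruction)
MIX = {
    "alu": (50_000_000, 1),
    "load": (20_000_000, 2),
    "store": (10_000_000, 2),
    "branch": (20_000_000, 3),
}


def total_cycles(mix: dict[str, tuple[int, int]]) -> int:
    # Number of CPU cycles = sum over classes of (instructions x cycles per instruction)
    return sum(count * cycles for count, cycles in mix.values())


def average_cpi(mix: dict[str, tuple[int, int]]) -> float:
    # CPI = CPU clock cycles / Instruction count
    instruction_count = sum(count for count, _ in mix.values())
    return total_cycles(mix) / instruction_count


print(total_cycles(MIX))  # 170000000
print(average_cpi(MIX))   # 1.7
```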
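11. All Together Now
$$\frac{\text{seconds}}{\text{program}} = \frac{\text{instructions}}{\text{program}} \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{cycle}}$$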
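A short Python sketch of this equation, comparing two hypothetical designs on the same program. The instruction count, CPIs, and clock rates are invented for illustration; the point is that the higher clock rate alone does not decide which design is faster.

```python
def cpu_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    # seconds/program = instructions/program * cycles/instruction * seconds/cycle
    seconds_per_cycle = 1.0 / clock_rate_hz
    return instruction_count * cpi * seconds_per_cycle


# Two hypothetical designs running the same 1e8-instruction program.
design_a = cpu_time_s(1e8, 1.7, 500e6)  # lower clock rate, lower CPI
design_b = cpu_time_s(1e8, 2.2, 600e6)  # higher clock rate, higher CPI
print(f"A: {design_a:.3f} s, B: {design_b:.3f} s")  # A: 0.340 s, B: 0.367 s
```

Design B has the faster clock but the slower execution time, because its higher CPI more than cancels the shorter cycle.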
12. Who Affects Performance?
$$\text{CPU Execution Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}$$
- programmer
- compiler
- instruction-set architect
- machine architect
- hardware designer
- materials scientist/physicist/silicon engineer
13. Performance Variation
$$\text{CPU Execution Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}$$
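14. Speedup
- Speedup is just relative performance on the same machine with something changed.
$$\text{Speedup} = \text{Relative Performance} = \frac{\text{Execution Time}_{\text{before}}}{\text{Execution Time}_{\text{after}}}$$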
15. Amdahl's Law
- The impact of a performance improvement is limited by the percent of execution time affected by the improvement.
- Make the common case fast!!
$$\text{Execution time after improvement} = \frac{\text{Execution Time Affected}}{\text{Amount of Improvement}} + \text{Execution Time Unaffected}$$
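Amdahl's Law translates directly into code. This Python sketch assumes a hypothetical program that runs for 100 s, of which 80 s are spent in the part being improved; the function names are our own.

```python
def time_after_improvement(affected_s: float, unaffected_s: float, improvement: float) -> float:
    """Amdahl's Law: only the affected time shrinks by `improvement`."""
    if improvement <= 0:
        raise ValueError("improvement factor must be positive")
    return affected_s / improvement + unaffected_s


def overall_speedup(affected_s: float, unaffected_s: float, improvement: float) -> float:
    """Speedup = execution time before / execution time after."""
    before_s = affected_s + unaffected_s
    return before_s / time_after_improvement(affected_s, unaffected_s, improvement)


# Hypothetical program: 100 s total, 80 s in the part we speed up by 4x.
print(time_after_improvement(80.0, 20.0, 4.0))  # 40.0
print(overall_speedup(80.0, 20.0, 4.0))         # 2.5
# Even an enormous improvement cannot beat 100 / 20 = 5x overall.
print(overall_speedup(80.0, 20.0, 1e9) < 5.0)   # True
```

The unaffected 20 s puts a hard ceiling of 5x on the overall speedup, which is why the common case is the one worth making fast.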
16. MIPS and MFLOPS
- MIPS: million instructions per second
$$\text{MIPS} = \frac{\text{number of instructions executed in program}}{\text{execution time in seconds} \times 10^6} = \frac{\text{Clock rate}}{\text{CPI} \times 10^6}$$
- MFLOPS: million floating point operations per second
$$\text{MFLOPS} = \frac{\text{number of floating point operations executed in program}}{\text{execution time in seconds} \times 10^6}$$
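The two MIPS formulas agree because execution time = instructions × CPI / clock rate; substituting that into the first form cancels the instruction count. A Python sketch checking this on a hypothetical run (1e8 instructions, CPI 1.7, 500 MHz); the function names and numbers are assumptions for illustration.

```python
def mips_from_counts(instructions: float, exec_time_s: float) -> float:
    # MIPS = instructions / (execution time in seconds * 10^6)
    return instructions / (exec_time_s * 1e6)


def mips_from_clock(clock_rate_hz: float, cpi: float) -> float:
    # MIPS = clock rate / (CPI * 10^6)
    return clock_rate_hz / (cpi * 1e6)


def mflops(fp_ops: float, exec_time_s: float) -> float:
    # MFLOPS = floating point operations / (execution time in seconds * 10^6)
    return fp_ops / (exec_time_s * 1e6)


# Hypothetical run: 1e8 instructions, CPI 1.7, 500 MHz clock.
instructions, cpi, clock_hz = 1e8, 1.7, 500e6
exec_time_s = instructions * cpi / clock_hz                   # about 0.34 s
print(round(mips_from_counts(instructions, exec_time_s), 1))  # 294.1
print(round(mips_from_clock(clock_hz, cpi), 1))               # 294.1
```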
17. Millions of Instructions per Second (MIPS)
$$\text{MIPS} = \frac{\text{\# of insts}}{\text{Time} \times 10^6} \quad (\text{insts/sec})$$
- All rate measures of performance have the form
$$\frac{\text{Units of work}}{\text{Time unit}}, \quad \text{e.g. } \frac{\text{Xs}}{\text{Sec}}$$
- Problem: to make these measures representative, units of work must be conserved.
  - They must correspond to real work that is irreducible!
  - (i.e., work that is conserved over ANY implementation)
18. Units of Work
- Instructions, Floating Point Operations, Window updates, answers, etc.
- Are these things conserved in a computation?
  - Instructions: depend on compiler, architecture
  - Floating Point operations: depend on compiler, algorithm
  - Window updates: depend on algorithm
  - Answers to real problems ??
    - Depend on compiler, architecture, algorithm, implementation, etc. => all of this is part of the benchmark
19. Example: Instructions are not always conserved
```
addi R1, R1, 1
load R2, 16(R1)
addi R1, R1, 1
load R3, 16(R1)
addi R1, R1, 1
load R4, 16(R1)
```
```
load R1, 16(R2)
add  R3, R4, R1
load R1, 16(R2)
add  R5, R6, R1
```
- Loads and stores may be redundant
  - Data motion, not computation or real work
- Arithmetic operations may be redundant
  - 3 adds can be reduced to one
  - Add 3, and fix all of the other offsets
- => Many things which seem like work can be optimized away...
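The "add 3, and fix all of the other offsets" rewrite of the first listing can be checked with a small Python model. This is not an ISA simulator: it assumes R1 holds a plain integer address, that each load only needs its effective address, and that nothing else reads R1 between the instructions. The optimized sequence (`addi R1,R1,3` followed by loads at offsets 14, 15, 16) is our own rewrite following the slide's hint.

```python
def original_sequence(r1: int) -> tuple[list[int], int, int]:
    """Model of: addi R1,R1,1 / load Rk,16(R1), repeated three times.

    Returns (load addresses, final R1, instructions executed).
    """
    addresses = []
    executed = 0
    for _ in range(3):
        r1 += 1                      # addi R1, R1, 1
        executed += 1
        addresses.append(r1 + 16)    # load Rk, 16(R1)
        executed += 1
    return addresses, r1, executed


def optimized_sequence(r1: int) -> tuple[list[int], int, int]:
    """Model of: addi R1,R1,3 / load R2,14(R1) / load R3,15(R1) / load R4,16(R1)."""
    r1 += 3                                  # one add instead of three
    addresses = [r1 + 14, r1 + 15, r1 + 16]  # offsets fixed up to match
    return addresses, r1, 4


print(original_sequence(100))   # ([117, 118, 119], 103, 6)
print(optimized_sequence(100))  # ([117, 118, 119], 103, 4)
```

Both versions load the same addresses and leave R1 with the same value, yet one executes 6 instructions and the other 4: the "work" measured in instructions was not conserved.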
20. Example: Floating point operations are not always conserved
- Matrix multiplication: Strassen's algorithm
  - Algorithmic improvements
- Iterative algorithms that converge: different precision or subtle arithmetic differences can have a major effect
  - Precision, arithmetic details
- Errors (flaws) can require workarounds
  - Intel Pentium bug: numerous operations to replace each FDIV (multiple floating point operations)
  - Others?
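Strassen's algorithm shows floating point work changing with the algorithm: for a 2x2 product it uses 7 multiplications instead of 8, at the cost of extra additions and subtractions. A Python sketch of one level of Strassen next to the textbook triple loop; the multiplication counts returned are for these 2x2 versions only (Strassen's is counted by hand), and the test matrices are arbitrary.

```python
Matrix = list[list[float]]


def naive_2x2(a: Matrix, b: Matrix) -> tuple[Matrix, int]:
    """Textbook product: one multiplication per (i, j, k) triple."""
    mults = 0
    c = [[0, 0], [0, 0]]
    for i in range(2):
        for j in range(2):
            for k in range(2):
                c[i][j] += a[i][k] * b[k][j]
                mults += 1
    return c, mults


def strassen_2x2(a: Matrix, b: Matrix) -> tuple[Matrix, int]:
    """Strassen's seven products for a 2x2 block."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    c = [[m1 + m4 - m5 + m7, m3 + m5],
         [m2 + m4, m1 - m2 + m3 + m6]]
    return c, 7  # seven multiplications, counted by hand above


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(naive_2x2(a, b))     # ([[19, 22], [43, 50]], 8)
print(strassen_2x2(a, b))  # ([[19, 22], [43, 50]], 7)
```

Same answer, different operation counts, so an MFLOPS rating depends on which algorithm produced it.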
21. A Benchmarking Example
- Pentium II 450 MHz system, Microsoft C compiler
- Compile the program, execute, count instructions
- Measure at 400 MIPS
- What does this tell you about performance?
- Compile again, this time with optimization ON!
- Compile takes a lot longer; execute, count instructions
- Measure performance at 350 MIPS
- What happened?
22. Benchmarking Example (cont.)
- # of Insts_A vs. # of Insts_B
- ExecTime_A vs. ExecTime_B
- There are fewer instructions executed in the optimized program!
- MIPS rating depends on the compiler
  - Quality of code generated
  - Optimized for instruction execution time, not MIPS rating
- Compilers are always benchmarked with the machine
- How could you cheat to get a high MIPS rating?
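Inverting the MIPS definition gives execution time = instructions / (MIPS × 10^6), which shows how the optimized build can have a lower MIPS rating and still finish sooner. The 400 and 350 MIPS figures come from the example; the instruction counts (4.0e9 unoptimized, 2.8e9 optimized) are purely hypothetical.

```python
def exec_time_from_mips(instructions: float, mips: float) -> float:
    # MIPS = instructions / (ExecTime * 10^6)  =>  ExecTime = instructions / (MIPS * 10^6)
    return instructions / (mips * 1e6)


unoptimized_s = exec_time_from_mips(4.0e9, 400)  # hypothetical 4.0e9 instructions
optimized_s = exec_time_from_mips(2.8e9, 350)    # hypothetical 2.8e9 instructions
print(unoptimized_s, optimized_s)                # 10.0 8.0
```

With these counts the optimized build is 1.25 times faster despite its 12.5% lower MIPS rating; only the execution times support a conclusion.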
23. Benchmarking Example II
- Power Macintosh, 500 MHz, PowerPC 603
- Compile same program, optimized
- Execute, assuming no obvious cheating
- Experiment produces 450 MIPS rating
- Is this faster than the Pentium II?
- => There's no easy way to tell from this information! Why?
  - The unit of work has changed.
  - Pentium instruction ≠ PowerPC instruction
- Hard to compare MIPS across architectures; of little use for comparing architectures. Resort to execution time.
24. Other Measures of Work
- Floating Point Operations
- Window Updates
- Frames/Polygons (rendering)
- Megabytes (communication)
- Limitations of each of these?
- How can you cheat/reduce each of these?
25. Performance Metrics Summary
- Many possible measures of work / performance metrics
  - Choosing is rife with potential errors
- Because it includes everything, execution time is the safest choice.
  - Still need to analyze the other influences carefully before you can draw any conclusions about the causes.
- Amdahl's Law