Title: CPSC 318 Computer Structures, Lecture 3: Performance 2
1 CPSC 318 Computer Structures, Lecture 3
Performance 2
- Dr. Son Vuong
- (vuong_at_cs.ubc.ca)
- January 15, 2004
2 Course Overview (topics)
- Introduction (Lecture 1)
- Performance (Lectures 2 and 3) (Today)
- Assembly programming (MIPS) (Next lecture)
- Instruction set architecture
- Processor design (pipelining, branch prediction)
- Caches, virtual memory, I/O
- Compare and contrast current processor designs
3 Overview
- We've looked at
  - How do we measure performance?
  - Metrics
  - Benchmarking
- Now
  - Review
  - Fallacies
  - Amdahl's Law
  - MIPS
  - Arithmetic and geometric means
  - A few examples
Readings: Chapter 2 (Sections 2.7 to end)
4 Benchmarking games
"Benchmark v. trans. To subject (a system) to a series of tests in order to obtain prearranged results not available on competitive systems."
-- S. Kelly-Bootle, The Devil's DP Dictionary
- Differing configurations may be used to run the same workload on the two systems.
- The compilers may be wired to optimize the workload.
- Test specifications may be written so that they are biased toward one machine.
- A synchronized job sequence may be used.
- The workload may be arbitrarily picked.
- Very small benchmarks may be used.
- Benchmarks may be manually translated to optimize the performance.
-- R. Jain, The Art of Computer Systems Performance Analysis
5 Basis of Evaluation
- Actual target workload
  - Cons: very specific; non-portable; difficult to run or measure; hard to identify cause
- Full application benchmarks
  - Pros: portable; widely used; improvements useful in reality
- Small kernel benchmarks
  - Pros: easy to run, early in design cycle
- Microbenchmarks
  - Pros: identify peak capability and potential bottlenecks
  - Cons: peak may be a long way from application performance
6 Aspects of CPU Performance

               Instr. count   CPI   Clock rate
Program             X
Compiler            X          X
Instr. Set          X          X
Organization                   X        X
Technology                              X
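The table's columns are exactly the three factors in the CPU performance equation (used again in the MIPS example later in this lecture):

    CPU time = Instruction count x CPI x Clock cycle time
             = Instruction count x CPI / Clock rate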
7 Machine Organization
- Capabilities and performance characteristics of the principal functional units (FUs)
  - (e.g., registers, ALU, shifters, logic units, ...)
- Ways in which these components are interconnected
- Information flows between components
- Logic and means by which such information flow is controlled
- Choreography of the FUs to realize the ISA
- Register Transfer Level (RTL) description
This is the logic designer's view.
8 UltraSPARC chip
9 Example Organization
- TI SuperSPARC (TMS390Z50) in a Sun SPARCstation 20
[Block diagram: SuperSPARC MBus module with integer unit, floating-point unit, instruction cache, data cache, reference MMU, store buffer, L2 cache controller (CC), and DRAM controller on the MBus; an MBus-to-SBus adapter (L64852) connects to the SBus with SBus DMA, SBus cards, Ethernet, SCSI, audio, STDIO (serial, keyboard, mouse), floppy, boot PROM, RTC, and bus interface.]
10 Example (RISC processor)

Base machine (reg/reg), typical mix:

Op       Freq   Cycles   CPI(i)   % Time
ALU      50%    1        0.5      23%
Load     20%    5        1.0      45%
Store    10%    3        0.3      14%
Branch   20%    2        0.4      18%
                Total CPI = 2.2

How much faster would the machine be if a better data cache reduced the average load time to 2 cycles? How does this compare with using branch prediction to save a cycle off the branch time? What if two ALU instructions could be executed at once?
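A minimal worked sketch of the three questions, in illustrative Python, using only the frequency/cycle mix from the table above (the dual-issue case is modeled as halving the ALU cycle cost):

    # Instruction mix from the table: (frequency, cycles per instruction)
    mix = {"ALU": (0.50, 1), "Load": (0.20, 5), "Store": (0.10, 3), "Branch": (0.20, 2)}

    def cpi(m):
        return sum(freq * cycles for freq, cycles in m.values())

    base = cpi(mix)                              # 2.2

    # (a) better data cache: average load time drops to 2 cycles
    faster_load = dict(mix, Load=(0.20, 2))      # CPI = 1.6, speedup = 2.2/1.6 = 1.375

    # (b) branch prediction saves one cycle per branch
    faster_branch = dict(mix, Branch=(0.20, 1))  # CPI = 2.0, speedup = 1.1

    # (c) two ALU instructions issued at once: ALU contribution halves
    dual_alu = dict(mix, ALU=(0.50, 0.5))        # CPI = 1.95, speedup ~ 1.13

    for name, m in [("load=2", faster_load), ("branch-1", faster_branch), ("dual ALU", dual_alu)]:
        print(name, "CPI =", round(cpi(m), 2), "speedup =", round(base / cpi(m), 3))

So the better data cache helps the most here, because loads account for the largest share of the time.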
11 Fallacies and Pitfalls
Common misconceptions about performance
"Wise men learn by other men's mistakes, fools by their own."
-- H. G. Wells
12 Fallacies and Pitfalls
- Expecting the improvement in one aspect of a machine to increase performance by an amount proportional to the size of the improvement (Amdahl's Law) (pitfall)
- Using MIPS to predict performance (fallacy)
- Using the arithmetic mean of normalized execution times to predict performance (pitfall)
- Believing that the geometric mean of execution time ratios is proportional to total execution time (fallacy)
13 Amdahl's Law

Consider an enhancement to a system that accelerates a fraction f of the task by a speedup factor s. Suppose the remainder of the task is unaffected by the change.

Without enhancement:  Time_old
With enhancement:     Time_new = Time_old x ((1 - f) + f / s)

Speedup = Time_old / Time_new = 1 / ((1 - f) + f / s)
14 Amdahl's Law
You can only go as fast as the slowest part.
15 Example of Amdahl's Law

Suppose we have a program that takes 100 seconds to execute, with the multiply taking 80 seconds of this time. How much do I have to improve the speed of multiplication if I want my program to run 5 times faster?

If the multiply is made n times faster, Time_new = 80/n + 20 seconds, which is always greater than 20 seconds.
BUT Time_new must also be 100/5 = 20 seconds (to be 5 times faster), so no finite n will do: the program cannot be made 5 times faster by improving multiplication alone.
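A quick numeric check that the 5x goal is out of reach (illustrative Python, using the 80 s / 20 s split above):

    unaffected = 20.0       # seconds not spent in multiply
    multiply   = 80.0       # seconds spent in multiply

    for n in (2, 4, 10, 100, 1_000_000):          # candidate multiply speedups
        t_new = unaffected + multiply / n
        print(n, t_new, 100.0 / t_new)            # overall speedup never reaches 5

    # Even as n grows without bound, t_new approaches but never reaches 20 s,
    # so the overall speedup stays below 1 / (1 - f) = 5 with f = 0.8.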
16 Misuse of MIPS - Example

Assume the clock rate is 500 MHz. Compare the two compilers, first using execution time and then MIPS.

Execution time = Cycle time x CPI x Instruction count

MIPS = Instruction count / (Execution time x 10^6)
     = 1 / (Cycle time x CPI x 10^6)
     = Clock rate / (CPI x 10^6)
17 MIPS example (cont.)
[Execution-time calculation for each compiler, in ns.]
Compiler 1 is 1.5 times faster than Compiler 2.
18 MIPS example (cont.)
Compiler 2 has a higher MIPS rating than Compiler 1.
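A sketch of how the two rankings can disagree. The instruction mix below is assumed (it is the classic textbook illustration, with instruction classes A/B/C of CPI 1/2/3), chosen because it reproduces the conclusions on these two slides: Compiler 1 is 1.5 times faster, yet Compiler 2 posts the higher MIPS rating.

    clock_rate = 500e6                      # 500 MHz, as on the slide
    cpi = {"A": 1, "B": 2, "C": 3}          # CPI per instruction class (assumed)

    # Assumed instruction counts produced by each compiler
    counts = {
        "Compiler 1": {"A": 5e9,  "B": 1e9, "C": 1e9},
        "Compiler 2": {"A": 10e9, "B": 1e9, "C": 1e9},
    }

    for name, c in counts.items():
        cycles = sum(c[k] * cpi[k] for k in c)   # total clock cycles
        instrs = sum(c.values())                 # total instructions
        time   = cycles / clock_rate             # execution time in seconds
        mips   = instrs / (time * 1e6)           # millions of instructions per second
        print(f"{name}: time = {time:.0f} s, MIPS = {mips:.0f}")

    # Compiler 1: 20 s, 350 MIPS; Compiler 2: 30 s, 400 MIPS.
    # Compiler 1 is 30/20 = 1.5x faster, but Compiler 2 "wins" on MIPS.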
19 Arithmetic Mean

Normalized to machine A or to machine B?
Does this make any sense?

The problem is that the arithmetic mean of normalized execution times is not a quantity that makes sense: the ranking it produces depends on which machine you normalize to (see the sketch below).
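An illustration with hypothetical times (not the numbers from the slide's table) showing the inconsistency:

    # Hypothetical execution times in seconds for two programs on two machines
    times = {"A": {"P1": 1.0,  "P2": 1000.0},
             "B": {"P1": 10.0, "P2": 100.0}}

    def arith_mean_normalized(machine, reference):
        ratios = [times[machine][p] / times[reference][p] for p in ("P1", "P2")]
        return sum(ratios) / len(ratios)

    print(arith_mean_normalized("B", "A"))   # (10/1 + 100/1000) / 2 = 5.05
    print(arith_mean_normalized("A", "B"))   # (1/10 + 1000/100) / 2 = 5.05
    # Each machine looks 5x "slower" when normalized to the other one,
    # so the arithmetic mean of normalized times cannot rank the machines.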
20 Geometric Mean

Whether the times are normalized to machine A or to machine B, the geometric mean gives the same relative result.

SPECmarks use the geometric mean.
The product is meaningful: it's the product of the speedups!
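Continuing the hypothetical times from the previous sketch, the geometric mean is unchanged by the choice of reference machine:

    import math

    times = {"A": {"P1": 1.0,  "P2": 1000.0},
             "B": {"P1": 10.0, "P2": 100.0}}

    def geo_mean_normalized(machine, reference):
        ratios = [times[machine][p] / times[reference][p] for p in ("P1", "P2")]
        return math.prod(ratios) ** (1 / len(ratios))

    print(geo_mean_normalized("B", "A"))   # sqrt(10 * 0.1) = 1.0
    print(geo_mean_normalized("A", "B"))   # sqrt(0.1 * 10) = 1.0
    # The two machines tie under either reference: the geometric mean is
    # consistent regardless of the normalizing machine, unlike the arithmetic mean.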
21 Example of Geometric Mean

Performance improvements in the latest versions of seven layers of a new networking protocol were measured separately for each layer.
What is the average improvement per layer? (A sketch follows below.)
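With hypothetical per-layer improvement factors (not the slide's numbers), the average improvement per layer is their geometric mean, i.e. the 7th root of their product, because the layer improvements compound multiplicatively:

    import math

    # Hypothetical per-layer improvement factors (new speed / old speed)
    layer_improvements = [1.18, 1.13, 1.11, 1.23, 1.25, 1.19, 1.41]

    avg = math.prod(layer_improvements) ** (1 / len(layer_improvements))
    print(f"average improvement per layer: {avg:.3f}x")
    # The product of the seven factors is the end-to-end improvement,
    # so the geometric mean is the right "per layer" average.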
22 Disadvantages of Geometric Mean

It does not track execution time!
By the geometric mean these two machines are equal, but this is true only for a workload that runs program 1 100 times more often than program 2:
100 x 1 = 100        100 x 10 = 1000
Improving program 1 by 50%, according to the geometric mean, is equivalent to improving program 2 by 50%.
The only true measure is EXECUTION TIME!!!
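A sketch of the effect, using the same hypothetical times as the earlier sketches (program 1: 1 s on A, 10 s on B; program 2: 1000 s on A, 100 s on B), which is consistent with the 100 x 1 and 100 x 10 figures quoted above:

    import math

    times = {"A": {"P1": 1.0,  "P2": 1000.0},
             "B": {"P1": 10.0, "P2": 100.0}}

    for m in ("A", "B"):
        gm = math.prod(times[m].values()) ** 0.5          # geometric mean of the two times
        weighted = 100 * times[m]["P1"] + times[m]["P2"]  # run P1 100 times, P2 once
        print(m, "geometric mean =", round(gm, 1), " weighted total =", weighted)

    # Both machines have geometric mean sqrt(1000) ~ 31.6 s, and their total
    # times agree (1100 s each) only for this particular 100:1 workload mix.
    # Halving P1's time and halving P2's time move the geometric mean equally,
    # even though they change total execution time very differently.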
23 SPEC

Standard Performance Evaluation Corporation (www.specbench.org)

"The System Performance Evaluation Cooperative (SPEC) was founded in 1988 by a small number of workstation vendors who realized that the marketplace was in desperate need of realistic, standardized performance tests. Their key realization was that an ounce of honest data was worth more than a pound of marketing hype. SPEC has grown to become one of the more successful performance standardization bodies with more than 40 member companies. SPEC publishes several hundred different performance results each quarter spanning across a variety of system performance disciplines."
-- www.specbench.org
24 SPEC95 Benchmarks

Name      Application
go        Artificial intelligence; plays the game of "Go"
m88ksim   Motorola 88K chip simulator; runs test program
gcc       New version of GCC; builds SPARC code
compress  Compresses and decompresses file in memory
li        LISP interpreter
ijpeg     Graphic compression and decompression
perl      Manipulates strings (anagrams) and prime numbers in Perl
vortex    A database program
tomcatv   A mesh-generation program
swim      Shallow water model with 513 x 513 grid
su2cor    Quantum physics Monte Carlo simulation
hydro2d   Astrophysics; hydrodynamical Navier-Stokes equations
mgrid     Multi-grid solver in 3D potential field
applu     Parabolic/elliptic partial differential equations
turb3d    Simulates isotropic, homogeneous turbulence in a cube
apsi      Solves problems regarding temperature, wind velocity, and distribution of pollutants
fpppp     Quantum chemistry
wave5     Plasma physics; electromagnetic particle simulation
25 SPEC CPU2000
- 12 integer benchmarks (gzip, gcc, crafty, perl, bzip, ...)
- 14 floating-point benchmarks (swim, mesa, art, apsi, ...)
- Separate averages for integer (CINT2000) and FP (CFP2000), relative to a base machine (Sun Ultra5_10, 300 MHz, 256 MB RAM), which gets a score of 100
- www.spec.org/osg/cpu2000/
- They measure
  - System speed (SPECint2000)
  - System throughput (SPECint_rate2000)
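A hedged sketch of how a CINT2000-style speed score is formed, assuming SPEC's published scheme (per-benchmark ratio = reference time / measured time, scaled by 100; overall score = geometric mean of the ratios, so the Ultra5_10 reference scores 100 by construction). The times below are made up for illustration:

    import math

    # Hypothetical reference-machine and measured times, in seconds
    ref_times = {"gzip": 1400.0, "gcc": 1100.0, "crafty": 1000.0}
    run_times = {"gzip": 350.0,  "gcc": 300.0,  "crafty": 240.0}

    ratios = [100.0 * ref_times[b] / run_times[b] for b in ref_times]  # per-benchmark ratios
    score = math.prod(ratios) ** (1 / len(ratios))                     # geometric mean
    print(f"CINT2000-style score: {score:.0f}")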
26 And in Conclusion
- Benchmarking is essential
- Fallacies
  - Amdahl's Law: you can't do better than the slowest part
  - MIPS can be misleading
- Arithmetic and geometric means
  - Use the geometric mean for ratios (normalized performance)
- Execution time is the true measure
27 Course Overview (topics)
- Introduction (Lecture 1)
- Performance (Lectures 2 and 3)
- Assembly programming (MIPS) (Next lecture)
- Instruction set architecture
- Processor design (pipelining, branch prediction)
- Caches, virtual memory, I/O
- Compare and contrast current processor designs