Title: CPSC 318 Computer Structures, Lecture 3: Performance 2
1 CPSC 318 Computer Structures, Lecture 3
Performance 2
- Dr. Son Vuong
- (vuong_at_cs.ubc.ca)
- January 15, 2004
2 Course Overview (topics)
- Introduction (Lecture 1)
- Performance (Lectures 2 and 3) (Today)
- Assembly programming (MIPS) (Next lecture)
- Instruction set architecture
- Processor design (pipelining, branch prediction)
- Caches, virtual memory, I/O
- Compare and contrast current processor designs
3 Overview
- We've looked at
  - How do we measure performance?
  - Metrics
  - Benchmarking
- Now
  - Review
  - Fallacies
  - Amdahl's Law
  - MIPS
  - Arithmetic and geometric means
  - A few examples
Readings: Chapter 2 (Sections 2.7 to end)
4 Benchmarking games
"Benchmark v. trans. To subject (a system) to a series of tests in order to obtain prearranged results not available on competitive systems."
-- S. Kelly-Bootle, The Devil's DP Dictionary
- Differing configurations may be used to run the same workload on the two systems.
- The compilers may be wired to optimize the workload.
- Test specifications may be written so that they are biased toward one machine.
- A synchronized job sequence may be used.
- The workload may be arbitrarily picked.
- Very small benchmarks may be used.
- Benchmarks may be manually translated to optimize the performance.
-- R. Jain, The Art of Computer Systems Performance Analysis
5 Basis of Evaluation
- Actual target workload
  - Cons: very specific; non-portable; difficult to run or measure; hard to identify cause
- Full application benchmarks
  - Pros: portable; widely used; improvements useful in reality
- Small kernel benchmarks
  - Pros: easy to run, early in design cycle
- Microbenchmarks
  - Pros: identify peak capability and potential bottlenecks
  - Cons: peak may be a long way from application performance
6 Aspects of CPU Performance

               Instr. count   CPI   Clock rate
Program             X
Compiler            X          X
Instr. Set          X          X
Organization                   X        X
Technology                              X
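The table's columns are exactly the three factors in the CPU performance equation (used again in the MIPS example later in this lecture):

    CPU time = Instruction count x CPI x Clock cycle time
             = Instruction count x CPI / Clock rate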
7 Machine Organization
- Capabilities and performance characteristics of the principal functional units (FUs)
  - (e.g., registers, ALU, shifters, logic units, ...)
- Ways in which these components are interconnected
- Information flows between components
- Logic and means by which such information flow is controlled
- Choreography of the FUs to realize the ISA
- Register Transfer Level (RTL) description
This is the logic designer's view.
8 UltraSPARC chip
9 Example Organization
- TI SuperSPARC (TMS390Z50) in a Sun SPARCstation 20
[Block diagram: SuperSPARC MBus module with integer unit, floating-point unit, instruction cache, data cache, reference MMU, store buffer, L2 cache controller (CC), and DRAM controller on the MBus; an MBus-to-SBus adapter (L64852) connects to the SBus with SBus DMA, SBus cards, Ethernet, SCSI, audio, STDIO (serial, keyboard, mouse), floppy, boot PROM, RTC, and bus interface.]
10 Example (RISC processor)

Base machine (reg/reg), typical mix:

Op       Freq   Cycles   CPI(i)   % Time
ALU      50%    1        0.5      23%
Load     20%    5        1.0      45%
Store    10%    3        0.3      14%
Branch   20%    2        0.4      18%
                Total CPI = 2.2

How much faster would the machine be if a better data cache reduced the average load time to 2 cycles? How does this compare with using branch prediction to save a cycle off the branch time? What if two ALU instructions could be executed at once?
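A minimal worked sketch of the three questions, in illustrative Python, using only the frequency/cycle mix from the table above (the dual-issue case is modeled as halving the ALU cycle cost):

    # Instruction mix from the table: (frequency, cycles per instruction)
    mix = {"ALU": (0.50, 1), "Load": (0.20, 5), "Store": (0.10, 3), "Branch": (0.20, 2)}

    def cpi(m):
        return sum(freq * cycles for freq, cycles in m.values())

    base = cpi(mix)                              # 2.2

    # (a) better data cache: average load time drops to 2 cycles
    faster_load = dict(mix, Load=(0.20, 2))      # CPI = 1.6, speedup = 2.2/1.6 = 1.375

    # (b) branch prediction saves one cycle per branch
    faster_branch = dict(mix, Branch=(0.20, 1))  # CPI = 2.0, speedup = 1.1

    # (c) two ALU instructions issued at once: ALU contribution halves
    dual_alu = dict(mix, ALU=(0.50, 0.5))        # CPI = 1.95, speedup ~ 1.13

    for name, m in [("load=2", faster_load), ("branch-1", faster_branch), ("dual ALU", dual_alu)]:
        print(name, "CPI =", round(cpi(m), 2), "speedup =", round(base / cpi(m), 3))

So the better data cache helps the most here, because loads account for the largest share of the time.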
11 Fallacies and Pitfalls
Common misconceptions about performance
"Wise men learn by other men's mistakes, fools by their own."
-- H. G. Wells
12 Fallacies and Pitfalls
- Expecting the improvement in one aspect of a machine to increase performance by an amount proportional to the size of the improvement (Amdahl's Law) (pitfall)
- Using MIPS to predict performance (fallacy)
- Using the arithmetic mean of normalized execution times to predict performance (pitfall)
- Believing that the geometric mean of execution time ratios is proportional to total execution time (fallacy)
13 Amdahl's Law

Consider an enhancement to a system that accelerates a fraction f of the task by a speedup factor s. Suppose the remainder of the task is unaffected by the change.

Without enhancement:  Time_old
With enhancement:     Time_new = Time_old x ((1 - f) + f / s)

Speedup = Time_old / Time_new = 1 / ((1 - f) + f / s)
14 Amdahl's Law
You can only go as fast as the slowest part.
15 Example of Amdahl's Law

Suppose we have a program that takes 100 seconds to execute, with the multiply taking 80 seconds of this time. How much do I have to improve the speed of multiplication if I want my program to run 5 times faster?

If the multiply is made n times faster, Time_new = 80/n + 20 seconds, which is always greater than 20 seconds.
BUT Time_new must also be 100/5 = 20 seconds (to be 5 times faster), so no finite n will do: the program cannot be made 5 times faster by improving multiplication alone.
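A quick numeric check that the 5x goal is out of reach (illustrative Python, using the 80 s / 20 s split above):

    unaffected = 20.0       # seconds not spent in multiply
    multiply   = 80.0       # seconds spent in multiply

    for n in (2, 4, 10, 100, 1_000_000):          # candidate multiply speedups
        t_new = unaffected + multiply / n
        print(n, t_new, 100.0 / t_new)            # overall speedup never reaches 5

    # Even as n grows without bound, t_new approaches but never reaches 20 s,
    # so the overall speedup stays below 1 / (1 - f) = 5 with f = 0.8.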
16 Misuse of MIPS - Example

Assume the clock rate is 500 MHz. Compare the two compilers, first using execution time and then MIPS.

Execution time = Cycle time x CPI x Instruction count

MIPS = Instruction count / (Execution time x 10^6)
     = 1 / (Cycle time x CPI x 10^6)
     = Clock rate / (CPI x 10^6)
17 MIPS example (cont.)
[Execution-time calculation for each compiler, in ns.]
Compiler 1 is 1.5 times faster than Compiler 2.
18 MIPS example (cont.)
Compiler 2 has a higher MIPS rating than Compiler 1.
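A sketch of how the two rankings can disagree. The instruction mix below is assumed (it is the classic textbook illustration, with instruction classes A/B/C of CPI 1/2/3), chosen because it reproduces the conclusions on these two slides: Compiler 1 is 1.5 times faster, yet Compiler 2 posts the higher MIPS rating.

    clock_rate = 500e6                      # 500 MHz, as on the slide
    cpi = {"A": 1, "B": 2, "C": 3}          # CPI per instruction class (assumed)

    # Assumed instruction counts produced by each compiler
    counts = {
        "Compiler 1": {"A": 5e9,  "B": 1e9, "C": 1e9},
        "Compiler 2": {"A": 10e9, "B": 1e9, "C": 1e9},
    }

    for name, c in counts.items():
        cycles = sum(c[k] * cpi[k] for k in c)   # total clock cycles
        instrs = sum(c.values())                 # total instructions
        time   = cycles / clock_rate             # execution time in seconds
        mips   = instrs / (time * 1e6)           # millions of instructions per second
        print(f"{name}: time = {time:.0f} s, MIPS = {mips:.0f}")

    # Compiler 1: 20 s, 350 MIPS; Compiler 2: 30 s, 400 MIPS.
    # Compiler 1 is 30/20 = 1.5x faster, but Compiler 2 "wins" on MIPS.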
19 Arithmetic Mean

Normalized to machine A or to machine B?
Does this make any sense?

The problem is that the arithmetic mean of normalized execution times is not a quantity that makes sense: the ranking it produces depends on which machine you normalize to (see the sketch below).
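An illustration with hypothetical times (not the numbers from the slide's table) showing the inconsistency:

    # Hypothetical execution times in seconds for two programs on two machines
    times = {"A": {"P1": 1.0,  "P2": 1000.0},
             "B": {"P1": 10.0, "P2": 100.0}}

    def arith_mean_normalized(machine, reference):
        ratios = [times[machine][p] / times[reference][p] for p in ("P1", "P2")]
        return sum(ratios) / len(ratios)

    print(arith_mean_normalized("B", "A"))   # (10/1 + 100/1000) / 2 = 5.05
    print(arith_mean_normalized("A", "B"))   # (1/10 + 1000/100) / 2 = 5.05
    # Each machine looks 5x "slower" when normalized to the other one,
    # so the arithmetic mean of normalized times cannot rank the machines.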
20 Geometric Mean

Whether the times are normalized to machine A or to machine B, the geometric mean gives the same relative result.

SPECmarks use the geometric mean.
The product is meaningful: it's the product of the speedups!
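Continuing the hypothetical times from the previous sketch, the geometric mean is unchanged by the choice of reference machine:

    import math

    times = {"A": {"P1": 1.0,  "P2": 1000.0},
             "B": {"P1": 10.0, "P2": 100.0}}

    def geo_mean_normalized(machine, reference):
        ratios = [times[machine][p] / times[reference][p] for p in ("P1", "P2")]
        return math.prod(ratios) ** (1 / len(ratios))

    print(geo_mean_normalized("B", "A"))   # sqrt(10 * 0.1) = 1.0
    print(geo_mean_normalized("A", "B"))   # sqrt(0.1 * 10) = 1.0
    # The two machines tie under either reference: the geometric mean is
    # consistent regardless of the normalizing machine, unlike the arithmetic mean.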
21 Example of Geometric Mean

Performance improvements in the latest versions of seven layers of a new networking protocol were measured separately for each layer.
What is the average improvement per layer? (A sketch follows below.)
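With hypothetical per-layer improvement factors (not the slide's numbers), the average improvement per layer is their geometric mean, i.e. the 7th root of their product, because the layer improvements compound multiplicatively:

    import math

    # Hypothetical per-layer improvement factors (new speed / old speed)
    layer_improvements = [1.18, 1.13, 1.11, 1.23, 1.25, 1.19, 1.41]

    avg = math.prod(layer_improvements) ** (1 / len(layer_improvements))
    print(f"average improvement per layer: {avg:.3f}x")
    # The product of the seven factors is the end-to-end improvement,
    # so the geometric mean is the right "per layer" average.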
22 Disadvantages of Geometric Mean

It does not track execution time!
By the geometric mean these two machines are equal, but this is true only for a workload that runs program 1 100 times more often than program 2:
100 x 1 = 100        100 x 10 = 1000
Improving program 1 by 50%, according to the geometric mean, is equivalent to improving program 2 by 50%.
The only true measure is EXECUTION TIME!!!
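A sketch of the effect, using the same hypothetical times as the earlier sketches (program 1: 1 s on A, 10 s on B; program 2: 1000 s on A, 100 s on B), which is consistent with the 100 x 1 and 100 x 10 figures quoted above:

    import math

    times = {"A": {"P1": 1.0,  "P2": 1000.0},
             "B": {"P1": 10.0, "P2": 100.0}}

    for m in ("A", "B"):
        gm = math.prod(times[m].values()) ** 0.5          # geometric mean of the two times
        weighted = 100 * times[m]["P1"] + times[m]["P2"]  # run P1 100 times, P2 once
        print(m, "geometric mean =", round(gm, 1), " weighted total =", weighted)

    # Both machines have geometric mean sqrt(1000) ~ 31.6 s, and their total
    # times agree (1100 s each) only for this particular 100:1 workload mix.
    # Halving P1's time and halving P2's time move the geometric mean equally,
    # even though they change total execution time very differently.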
23 SPEC

Standard Performance Evaluation Corporation (www.specbench.org)

"The System Performance Evaluation Cooperative (SPEC) was founded in 1988 by a small number of workstation vendors who realized that the marketplace was in desperate need of realistic, standardized performance tests. Their key realization was that an ounce of honest data was worth more than a pound of marketing hype. SPEC has grown to become one of the more successful performance standardization bodies with more than 40 member companies. SPEC publishes several hundred different performance results each quarter spanning across a variety of system performance disciplines."
-- www.specbench.org
24 SPEC95 Benchmarks

Name      Application
go        Artificial intelligence; plays the game of "Go"
m88ksim   Motorola 88K chip simulator; runs test program
gcc       New version of GCC; builds SPARC code
compress  Compresses and decompresses file in memory
li        LISP interpreter
ijpeg     Graphic compression and decompression
perl      Manipulates strings (anagrams) and prime numbers in Perl
vortex    A database program
tomcatv   A mesh-generation program
swim      Shallow water model with 513 x 513 grid
su2cor    Quantum physics Monte Carlo simulation
hydro2d   Astrophysics; hydrodynamical Navier-Stokes equations
mgrid     Multi-grid solver in 3D potential field
applu     Parabolic/elliptic partial differential equations
turb3d    Simulates isotropic, homogeneous turbulence in a cube
apsi      Solves problems regarding temperature, wind velocity, and distribution of pollutants
fpppp     Quantum chemistry
wave5     Plasma physics; electromagnetic particle simulation
25 SPEC CPU2000
- 12 integer benchmarks (gzip, gcc, crafty, perl, bzip, ...)
- 14 floating-point benchmarks (swim, mesa, art, apsi, ...)
- Separate averages for integer (CINT2000) and FP (CFP2000), relative to a base machine (Sun Ultra5_10, 300 MHz, 256 MB RAM), which gets a score of 100
- www.spec.org/osg/cpu2000/
- They measure
  - System speed (SPECint2000)
  - System throughput (SPECint_rate2000)
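A hedged sketch of how a CINT2000-style speed score is formed, assuming SPEC's published scheme (per-benchmark ratio = reference time / measured time, scaled by 100; overall score = geometric mean of the ratios, so the Ultra5_10 reference scores 100 by construction). The times below are made up for illustration:

    import math

    # Hypothetical reference-machine and measured times, in seconds
    ref_times = {"gzip": 1400.0, "gcc": 1100.0, "crafty": 1000.0}
    run_times = {"gzip": 350.0,  "gcc": 300.0,  "crafty": 240.0}

    ratios = [100.0 * ref_times[b] / run_times[b] for b in ref_times]  # per-benchmark ratios
    score = math.prod(ratios) ** (1 / len(ratios))                     # geometric mean
    print(f"CINT2000-style score: {score:.0f}")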
26 And in Conclusion
- Benchmarking is essential
- Fallacies
  - Amdahl's Law: you can't do better than the slowest part
  - MIPS can be misleading
- Arithmetic and geometric means
  - Use the geometric mean for ratios (normalized performance)
- Execution time is the true measure
27 Course Overview (topics)
- Introduction (Lecture 1)
- Performance (Lectures 2 and 3)
- Assembly programming (MIPS) (Next lecture)
- Instruction set architecture
- Processor design (pipelining, branch prediction)
- Caches, virtual memory, I/O
- Compare and contrast current processor designs