Title: COMP 206: Computer Architecture and Implementation
Slide 1: COMP 206: Computer Architecture and Implementation
- Montek Singh
- Mon., Sep 5, 2005
- Lecture 2
Slide 2: Outline
- Quantitative Principles of Computer Design
- Amdahl's Law (make the common case fast)
- Performance Metrics
- MIPS, FLOPS, and all that
- Examples
Slide 3: Quantitative Principles of Computer Design
- Performance: the rate of producing results; also called throughput or bandwidth
- Execution time: also called response time or latency
Slide 4: Comparison
- "Y is n times larger than X" means n = Y / X
- "Y is n% larger than X" means n = 100 * (Y - X) / X
- For example, if X = 100 and Y = 300, then Y is 3 times larger than X, or 200% larger
Slide 5: Amdahl's Law (1967)
"Validity of the single processor approach to achieving large scale computing capabilities", G. M. Amdahl, AFIPS Conference Proceedings, pp. 483-485, April 1967
- Historical context
  - Amdahl was demonstrating the continued validity of the single processor approach and the weaknesses of the multiple processor approach
  - The paper contains no mathematical formulation, just arguments and simulation
    - "The nature of this overhead appears to be sequential so that it is unlikely to be amenable to parallel processing techniques."
    - "A fairly obvious conclusion which can be drawn at this point is that the effort expended on achieving high parallel performance rates is wasted unless it is accompanied by achievements in sequential processing rates of very nearly the same magnitude."
- Nevertheless, it is of widespread applicability in all kinds of situations
Slide 6: Amdahl's Law
"Bottleneckology: Evaluating Supercomputers", Jack Worlton, COMPCON '85, pp. 405-406
- Let f_i be the fraction of results generated at execution rate (performance) R_i
- The average execution rate is then the weighted harmonic mean of the individual rates:
  R_avg = 1 / (f_1/R_1 + f_2/R_2 + ... + f_n/R_n)
Slide 7: Example of Amdahl's Law
30% of the results are generated at the rate of 1 MFLOPS, 20% at 10 MFLOPS, and 50% at 100 MFLOPS. What is the average performance? What is the bottleneck?
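A quick numerical check of this example, using the weighted harmonic mean above (a minimal C sketch; the numbers come from the slide, everything else is mine):

```c
#include <stdio.h>

int main(void) {
    /* 30% of results at 1 MFLOPS, 20% at 10 MFLOPS, 50% at 100 MFLOPS */
    double f[] = {0.30, 0.20, 0.50};   /* fraction of results   */
    double r[] = {1.0, 10.0, 100.0};   /* rate in MFLOPS        */
    double total = 0.0;

    for (int i = 0; i < 3; i++)
        total += f[i] / r[i];          /* relative time spent at each rate */

    printf("average rate     = %.2f MFLOPS\n", 1.0 / total);               /* ~3.08 */
    printf("time at 1 MFLOPS = %.0f%%\n", 100.0 * (f[0] / r[0]) / total);  /* ~92%  */
    return 0;
}
```

The work done at 1 MFLOPS accounts for roughly 92% of the execution time, so that slow rate is the bottleneck even though it produces only 30% of the results.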
Slide 8: Amdahl's Law (HP3 book, pp. 40-41)
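The body of this slide is not in the text above; as a reference point, the formulation on those pages of HP3 expresses overall speedup in terms of the fraction of execution time an enhancement applies to. A minimal C sketch (function name and sample numbers are mine):

```c
#include <stdio.h>

/* Amdahl's Law, HP3 form: an enhancement usable for fraction_enhanced of the
   original execution time speeds that part up by speedup_enhanced; the rest
   of the execution time is unchanged. */
double overall_speedup(double fraction_enhanced, double speedup_enhanced) {
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced);
}

int main(void) {
    /* e.g., 40% of the time enhanced 10x yields only ~1.56x overall */
    printf("%.3f\n", overall_speedup(0.40, 10.0));
    return 0;
}
```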
Slide 9: Implications of Amdahl's Law
- The performance improvement provided by a feature is limited by how often that feature is used
- As stated, Amdahl's Law is valid only if the system always works at exactly one of the rates
  - If a non-blocking cache is used, or there is overlap between CPU and I/O operations, Amdahl's Law as given here is not applicable
- The bottleneck is the most promising target for improvements
  - Make the common case fast
  - Infrequent events, even if each one takes a lot of time, will make little difference to performance
- Typical use: change only one parameter of the system, and compute the effect of this change
  - The same program, with the same input data, should run on the machine in both cases
Slide 10: Make The Common Case Fast
- All instructions require an instruction fetch; only a fraction require a data fetch/store
  - Optimize instruction access over data access
- Programs exhibit locality
  - Spatial locality: items with addresses near one another tend to be referenced close together in time
  - Temporal locality: recently accessed items are likely to be accessed again in the near future
- Access to small memories is faster
  - Provide a storage hierarchy such that the most frequent accesses are to the smallest (closest) memories (see the sketch after this list)
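A small illustration of the locality argument above (my sketch, not from the slides). Compiled and run, the first loop nest is markedly faster because it walks memory in the order the cache fetches it:

```c
#include <stdio.h>

#define N 1024
static double a[N][N];

int main(void) {
    double sum = 0.0;

    /* Row-major traversal: successive j values touch adjacent addresses, so
       each cache line brought in is fully used (spatial locality); 'sum' is
       reused on every iteration (temporal locality). */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];

    /* Column-major traversal: successive accesses are N*8 bytes apart, so
       most of each fetched cache line goes unused and the loop runs slower. */
    double sum_col = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum_col += a[i][j];

    printf("%f %f\n", sum, sum_col);
    return 0;
}
```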
Slide 11: Make The Common Case Fast (2)
- What is the common case?
  - The rate at which the system spends most of its time
  - The bottleneck
- What does this statement mean, precisely?
  - Make the common case faster, rather than making some other case faster
  - Make the common case faster by a certain amount, rather than making some other case faster by the same amount
    - An absolute amount?
    - A relative amount?
- This principle is merely an informal statement of a frequently correct consequence of Amdahl's Law
Slide 12: Make The Common Case Fast (3a)
A machine produces 20% and 80% of its results at the rates of 1 and 3 MFLOPS, respectively. What is more advantageous: to improve the 1 MFLOPS rate, or to improve the 3 MFLOPS rate, by the same absolute amount?
Generalize the problem: assume the rates are x and y MFLOPS. Differentiating the weighted harmonic mean shows that adding a fixed amount to a rate helps more where (fraction of results) / (rate squared) is larger, i.e., compare 0.2/x^2 against 0.8/y^2.
At (x, y) = (1, 3), 0.2/1 > 0.8/9, so this indicates that it is better to improve x, the 1 MFLOPS rate, which is not the common case (a numerical check follows below).
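A check of the absolute-change case (a sketch; the 20%/80% split and the rates (1, 3) are from the slide, the helper name and the increment are mine):

```c
#include <stdio.h>

/* Average rate when 20% of results come at x MFLOPS and 80% at y MFLOPS */
double avg_rate(double x, double y) {
    return 1.0 / (0.2 / x + 0.8 / y);
}

int main(void) {
    double d = 0.5;   /* the same absolute improvement, applied to either rate */
    printf("baseline       : %.3f MFLOPS\n", avg_rate(1.0, 3.0));      /* ~2.143 */
    printf("improve x by d : %.3f MFLOPS\n", avg_rate(1.0 + d, 3.0));  /*  2.500 */
    printf("improve y by d : %.3f MFLOPS\n", avg_rate(1.0, 3.0 + d));  /* ~2.333 */
    return 0;
}
```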
Slide 13: Make The Common Case Fast (3b)
Let's say that we want to make the same relative change to one or the other rate, rather than the same absolute change. Scaling a rate by a factor k shrinks that rate's share of the total time by the same factor, so the larger gain comes from scaling whichever rate carries the larger time share (fraction of results divided by rate).
At (x, y) = (1, 3), the time shares are 0.2/1 = 0.20 and 0.8/3 ≈ 0.27, so this indicates that it is better to improve y, the 3 MFLOPS rate, which is the common case (see the check below).
If there are two different execution rates, making the common case faster by the same relative amount is always more advantageous than the alternative. However, this does not necessarily hold if we make absolute changes of the same magnitude. For three or more rates, further analysis is needed.
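And the corresponding check for the relative-change case (same assumed helper as in the previous sketch):

```c
#include <stdio.h>

double avg_rate(double x, double y) {   /* 20% of results at x MFLOPS, 80% at y */
    return 1.0 / (0.2 / x + 0.8 / y);
}

int main(void) {
    double k = 1.5;   /* the same relative improvement, applied to either rate */
    printf("improve x by factor k: %.3f MFLOPS\n", avg_rate(1.0 * k, 3.0));  /*  2.500 */
    printf("improve y by factor k: %.3f MFLOPS\n", avg_rate(1.0, 3.0 * k));  /* ~2.647 */
    /* y carries the larger time share (0.8/3 > 0.2/1), so the same relative
       speedup applied to y, the common case, buys more. */
    return 0;
}
```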
Slide 14: Basics of Performance
Slide 15: Details of CPI
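The bodies of slides 14 and 15 are not in the text above. For reference, the standard relations these titles point to (HP3, Chapter 1) are CPU time = instruction count x CPI x clock cycle time, with overall CPI the frequency-weighted average over instruction classes. A minimal sketch with a hypothetical instruction mix (the mix and machine parameters below are assumptions, not from the slides):

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical mix: dynamic frequency and CPI of each instruction class */
    double freq[] = {0.45, 0.25, 0.20, 0.10};   /* ALU, load, store, branch */
    double cpi[]  = {1.0,  2.0,  2.0,  3.0};

    double avg_cpi = 0.0;
    for (int i = 0; i < 4; i++)
        avg_cpi += freq[i] * cpi[i];            /* CPI = sum(freq_i * CPI_i) */

    double instr_count = 1e9;                   /* dynamic instruction count */
    double clock_rate  = 2e9;                   /* 2 GHz                     */
    double cpu_time    = instr_count * avg_cpi / clock_rate;

    printf("CPI = %.2f, CPU time = %.3f s\n", avg_cpi, cpu_time);  /* 1.65, 0.825 s */
    return 0;
}
```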
Slide 16: MIPS
- Machines with different instruction sets?
- Programs with different instruction mixes?
- Dynamic frequency of instructions
- Uncorrelated with performance
- Marketing metric
- Meaningless Indicator of Processor Speed
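For context on why these criticisms hold, recall the definition (not in the slide text as transcribed): MIPS = instruction count / (execution time x 10^6) = clock rate / (CPI x 10^6), so it says nothing about how much work each instruction accomplishes. A tiny check using the assumed numbers from the CPI sketch above:

```c
#include <stdio.h>

int main(void) {
    double clock_rate = 2e9;    /* 2 GHz */
    double cpi        = 1.65;   /* average cycles per instruction */
    /* native MIPS = clock rate / (CPI * 10^6) */
    printf("native MIPS = %.0f\n", clock_rate / (cpi * 1e6));   /* ~1212 */
    return 0;
}
```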
Slide 17: MFLOP/s
- Popular in supercomputing community
- Often not where time is spent
- Not all FP operations are equal
- Normalized MFLOP/s
- Can magnify performance differences
- A better algorithm (e.g., with better data reuse) can run faster even with a higher FLOP count
  - DGEQRF vs. DGEQR2 in LAPACK
Slide 18: Aspects of CPU Performance