Title: Chapter 1 Fundamentals of Computer Design
Slide 1: Chapter 1 Fundamentals of Computer Design
EEF011 Computer Architecture
- September 2004
Slide 2: Chapter 1. Fundamentals of Computer Design
- 1.1 Introduction
- 1.2 The Changing Face of Computing and the Task of the Computer Designer
- 1.3 Technology Trends
- 1.4 Cost, Price, and Their Trends
- 1.5 Measuring and Reporting Performance
- 1.6 Quantitative Principles of Computer Design
- 1.7 Putting It All Together
- 1.8 Another View
- 1.9 Fallacies and Pitfalls
- 1.10 Concluding Remarks
Slide 3: 1.1 Introduction
- Rapid rate of improvement over the roughly 55 years since the first general-purpose electronic computer, driven by
- the technology used to build computers
- innovation in computer design
- Beginning in about 1970, computer designers became largely dependent upon IC technology
- Emergence of the microprocessor in the late 1970s
- C: virtual elimination of assembly language programming (portability)
- UNIX: a standardized, vendor-independent OS
- Lowered the cost and risk of bringing out a new architecture
Slide 4: 1.1 Introduction (cont.)
- Development of RISC (Reduced Instruction Set Computer) in the early 1980s
- Focused the attention of designers on
- The exploitation of instruction-level parallelism
- initially through pipelining, and
- later through multiple instruction issue
- The use of caches
- initially in simple forms and
- later using more sophisticated organizations and optimizations
Slide 5: Figure 1.1 - 20 years of sustained growth in performance at an annual rate of over 50% (measured with two different versions of SPEC, i.e., SPEC92 and SPEC95)
- Effects
- Significantly enhanced the capability available to users
- Dominance of microprocessor-based computers across the entire range of computer design
- Minicomputers and mainframes replaced
- Supercomputers built with collections of microprocessors
Slide 6: 1.2 The Changing Face of Computing
- 1960s: dominated by mainframes
- business data processing, large-scale scientific computing
- 1970s: the birth of minicomputers
- scientific laboratories, time-sharing multiple-user environments
- 1980s: the rise of the desktop computer (personal computer and workstation)
- 1990s: the emergence of the Internet and the World Wide Web
- handheld computing devices
- emergence of high-performance digital consumer electronics
Slide 7: Segmented Markets
- Desktop computing: optimize price-performance
- the largest market in dollar terms
- Web-centric, interactive applications
- Servers
- Web-based services, replacing traditional mainframes
- Availability, scalability, throughput
- Embedded computers
- Features
- The presence of the computers is not immediately obvious
- The application can usually be carefully tuned for the processor and system
- Widest range of processing power and cost
- price is a key factor in the design of these computers
- Real-time performance requirements
- Key characteristics
- the need to minimize memory and power
- the use of processor cores together with application-specific circuitry
Slide 9: Task of the Computer Designer
- Tasks
- Determine what attributes are important for a new machine, then
- design a machine to maximize performance while staying within cost and power constraints
- Efforts
- instruction set design, functional organization, logic design, and implementation (IC design, packaging, power, and cooling)
- Design a computer to meet functional requirements as well as price, power, and performance goals (see Figure 1.4)
Slide 10: Task of a Computer Designer (cont.)
- Effect of trends: the designer often needs to design for the next technology, because during a single product cycle (typically 4-5 years) key technologies, such as memory, change sufficiently that the designer must plan for these changes.
Slide 12: 1.3 Technology Trends
- The designer must be especially aware of rapidly occurring changes in implementation technology
- Integrated circuit logic technology
- transistor density increases by about 35% per year
- growth in die size is less predictable and slower
- combined effect: growth rate in transistor count of about 55% per year
- Semiconductor DRAM
- density increases by between 40% and 60% per year
- cycle time has decreased by only about one-third in 10 years
- bandwidth per chip increases about twice as fast as latency decreases
- Magnetic disk technology
- Network technology
Slide 13: Moore's Law
- Gordon Moore (co-founder of Intel) observed in 1965 that the number of transistors that could be crammed onto a chip doubles every year.
- http://www.intel.com/research/silicon/mooreslaw.htm
Slide 14: Technology Trends
- In terms of SPEED, the CPU has improved dramatically, but memory and disk have improved only a little. This has led to dramatic changes in architecture, operating systems, and programming practices.
Technology   Capacity         Speed (latency)
Logic        2x in 3 years    2x in 3 years
DRAM         4x in 3 years    2x in 10 years
Disk         4x in 3 years    2x in 10 years
Slide 15: 1.4 Cost, Price, and Their Trends
- Price: what you sell a finished good for
- in 1999, more than half the PCs sold were priced at less than $1000
- Cost: the amount spent to produce it, including overhead
Slide 16: Cost and Its Trend
- Time
- Learning curve: manufacturing cost decreases over time
- Yield: the percentage of the manufactured devices that survive the testing procedure
- As the technology matures over time, the yield improves and hence things get cheaper
- Volume
- Increasing volume decreases cost (less time needed for the learning curve)
- Increases purchasing and manufacturing efficiency
- Commodities
- Products are sold by multiple vendors in large volumes and are essentially identical
- the low-end business, such as DRAMs, disks, monitors
- Improved competition → reduced cost
Slide 17: Prices of Six Generations of DRAMs
- Between the start of a project and the shipping of a product, say, two years, the cost of a new DRAM drops by a factor of between 5 and 10 in constant dollars.
Slide 18: Price of an Intel Pentium III at a Given Frequency
- It decreases over time as yield enhancements decrease the cost of a good die and competition forces price reductions.
Slide 20: Wafer and Dies
- Exponential cost decrease while the technology stays basically the same: a wafer is tested and chopped into dies that are packaged.
- Figure 1.8: this wafer contains 564 MIPS64 R20K processors in a 0.18-micron process.
- Figure 1.7: Intel Pentium 4 microprocessor die.
Slide 21: Cost of an Integrated Circuit (IC)
- Cost of IC = (Cost of die + Cost of testing die + Cost of packaging and final test) / Final test yield
- The die cost is the greater portion of the cost that varies between machines, and it is sensitive to die size
- Cost of die = Cost of wafer / (Dies per wafer x Die yield)
- Dies per wafer = π x (Wafer diameter / 2)² / Die area - π x Wafer diameter / sqrt(2 x Die area), where the second term accounts for the dies along the edge
- Die yield = Wafer yield x (1 + Defects per unit area x Die area / α)^(-α)
- Today's technology: α ≈ 4, defect density between 0.4 and 0.8 per cm²
Slide 22: Cost of an IC (cont.)
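To make the die-cost model above concrete, here is a minimal Python sketch of those formulas; the wafer diameter, wafer cost, die area, and defect density used in the example are illustrative assumptions, not figures from the text.

```python
import math

def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    # Gross dies: wafer area over die area, minus the dies lost along the edge.
    return (math.pi * (wafer_diameter_cm / 2) ** 2 / die_area_cm2
            - math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2))

def die_yield(defects_per_cm2, die_area_cm2, alpha=4.0, wafer_yield=1.0):
    # Standard yield model with alpha of about 4 for today's technology.
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)

def die_cost(wafer_cost, wafer_diameter_cm, die_area_cm2, defects_per_cm2):
    good_dies = (dies_per_wafer(wafer_diameter_cm, die_area_cm2)
                 * die_yield(defects_per_cm2, die_area_cm2))
    return wafer_cost / good_dies

# Hypothetical example: a 20 cm (8-inch) wafer costing $3500,
# a 1.0 cm^2 die, and 0.6 defects per cm^2.
print(round(die_cost(3500, 20.0, 1.0, 0.6), 2))  # cost of one good die in dollars
```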
Slide 23: Cost Example in a $1000 PC in 2001
Slide 24: 1.5 Measuring and Reporting Performance
- Designing high-performance computers is one of the major goals of any computer architect.
- As a result, assessing the performance of computer hardware is at the heart of computer design and greatly affects the demand for and market value of the computer.
- However, measuring the performance of a computer system is not a straightforward task
- Metrics: how do we describe the performance of a computer numerically?
- What tools do we use to find those metrics?
- How do we summarize the performance?
Slide 25: What Is the Computer User Interested In?
- Reducing the time to run a certain task
- Execution time (response time)
- The time between the start and the completion of an event
- Increasing the number of tasks completed per week, day, hour, second, ns
- Throughput
- The total amount of work done in a given time
Slide 26: Example
- Do the following changes to a computer system increase throughput, reduce response time, or both?
- 1. Replacing the processor in a computer with a faster version
- 2. Adding additional processors to a system that uses multiple processors for separate tasks, for example, handling an airline reservation system
- Answer
- 1. Both response time and throughput are improved
- 2. Only throughput increases
Slide 27: Execution Time and Performance Comparison
- In this subject we will be primarily interested in execution time as a measure of performance
- The relationship between performance and execution time on a computer X is reciprocal: Performance_X = 1 / Execution time_X
- To compare design alternatives, we use the following equation: n = Execution time_Y / Execution time_X = Performance_X / Performance_Y (a tiny numeric example follows this slide)
- "X is n times faster than Y" or "the throughput of X is n times higher than Y" means that the execution time is n times less on X than on Y
- To maximize the performance of an application, we need to minimize its execution time
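A tiny Python sketch of the reciprocal relationship and the "n times faster" comparison above; the two execution times are made-up numbers.

```python
def performance(execution_time):
    # Performance is the reciprocal of execution time.
    return 1.0 / execution_time

def times_faster(time_x, time_y):
    # "X is n times faster than Y": n = Time_Y / Time_X = Perf_X / Perf_Y.
    return time_y / time_x

# Hypothetical example: X runs a program in 12 s, Y runs it in 18 s.
print(times_faster(12.0, 18.0))  # 1.5, i.e., X is 1.5 times faster than Y
```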
Slide 28: Measuring Performance - User CPU Time
- Response time may include disk accesses, memory accesses, input/output activities, CPU events, and operating system overhead - everything
- In order to get an accurate measure of performance, we use CPU time instead of response time
- CPU time is the time the CPU spends computing a program and does not include time spent waiting for I/O or running other programs
- CPU time can be further divided into user CPU time (the program) and system CPU time (the OS)
- Typing the UNIX command time gives, for example,
- 90.7u 12.9s 2:39 65% (user CPU time, system CPU time, total elapsed time, and the percentage of elapsed time that is CPU time)
- In our performance measures, we use user CPU time because of its independence from the OS and other factors
Slide 29: Programs for Measuring Performance
- Real applications: text-processing software (Word), compilers (C), and other applications such as Photoshop; they have inputs, outputs, and options that the user selects
- One major downside: real applications often encounter portability problems arising from dependences on the OS or compiler
- Modified (or scripted) applications: modified to enhance portability or to focus on one particular aspect of system performance; scripts are used to simulate interaction behavior
- Kernels: small, key pieces taken from real programs, typically used to evaluate individual features of the machine
- Toy benchmarks: typically between 10 and 100 lines of code, producing a result the user already knows
- Synthetic benchmarks: artificially created code that matches an average execution profile
Slide 30: Benchmark Suites
- A collection of programs (a workload) that tries to explore and capture the strengths and weaknesses of a computer system (real programs, kernels)
- A key advantage of such suites is that the weakness of any one benchmark is lessened by the presence of the other benchmarks
- Good vs. bad benchmarks
- Improving the product for real programs vs. improving the product for benchmarks to get more sales
- If benchmarks are inadequate, then sales wins!
Slide 31: SPEC Benchmarks
- SPEC: Standard Performance Evaluation Corporation
- The most successful attempt, and the most widely adopted
- First generation, 1989
- 10 programs yielding a single number (SPECmarks)
- Second generation, 1992
- SPECint92 (6 integer programs) and SPECfp92 (14 floating-point programs)
- Unlimited compiler flags
- Third generation, 1995
- New set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating-point programs)
- Single flag setting for all programs: SPECint_base95, SPECfp_base95
- benchmarks intended to be useful for 3 years
- SPEC CPU 2000
- CINT2000 (12 integer programs) and CFP2000 (14 floating-point programs)
Slide 32: SPEC Benchmarks
CINT2000 (Integer Component of SPEC CPU2000)
- Program / Language / What It Is
- 164.gzip / C / Compression
- 175.vpr / C / FPGA Circuit Placement and Routing
- 176.gcc / C / C Programming Language Compiler
- 181.mcf / C / Combinatorial Optimization
- 186.crafty / C / Game Playing: Chess
- 197.parser / C / Word Processing
- 252.eon / C++ / Computer Visualization
- 253.perlbmk / C / PERL Programming Language
- 254.gap / C / Group Theory, Interpreter
- 255.vortex / C / Object-oriented Database
- 256.bzip2 / C / Compression
- 300.twolf / C / Place and Route Simulator
http://www.spec.org/osg/cpu2000/CINT2000/
Slide 33: SPEC Benchmarks
CFP2000 (Floating-Point Component of SPEC CPU2000)
- Program / Language / What It Is
- 168.wupwise / Fortran 77 / Physics / Quantum Chromodynamics
- 171.swim / Fortran 77 / Shallow Water Modeling
- 172.mgrid / Fortran 77 / Multi-grid Solver: 3D Potential Field
- 173.applu / Fortran 77 / Parabolic / Elliptic Differential Equations
- 177.mesa / C / 3-D Graphics Library
- 178.galgel / Fortran 90 / Computational Fluid Dynamics
- 179.art / C / Image Recognition / Neural Networks
- 183.equake / C / Seismic Wave Propagation Simulation
- 187.facerec / Fortran 90 / Image Processing: Face Recognition
- 188.ammp / C / Computational Chemistry
- 189.lucas / Fortran 90 / Number Theory / Primality Testing
- 191.fma3d / Fortran 90 / Finite-element Crash Simulation
- 200.sixtrack / Fortran 77 / High Energy Physics Accelerator Design
- 301.apsi / Fortran 77 / Meteorology: Pollutant Distribution
http://www.spec.org/osg/cpu2000/CFP2000/
Slide 34: More Benchmarks
TPC: Transaction Processing Council
- Measures the ability of a system to handle transactions, which consist of database accesses and updates
- Many variants depending on transaction complexity
- TPC-A: simple bank teller transaction style
- TPC-C: complex database query
EEMBC: EDN Embedded Microprocessor Benchmark Consortium (pronounced "embassy")
- 34 kernels in 5 classes
- 16 automotive/industrial, 5 consumer, 3 networking, 4 office automation, 6 telecommunications
Slide 35: How to Summarize Performance
- Management would like to have one number
- Technical people want more
- They want evidence of reproducibility: there should be enough information so that you or someone else can repeat the experiment
- There should be consistency when doing the measurements multiple times
- How would you report these results?
Slide 36: Comparing and Summarizing Performance
- Comparing performance by looking at individual programs is not fair
- Total execution time: a consistent summary measure
- Arithmetic Mean provides a simple average: (1/n) x Σ Time_i
- Time_i: execution time for program i in the workload
- Doesn't account for weight: all programs are treated as equal
Slide 37: Weighted Variants
- What is the proper mixture of programs for the workload?
- A weight is a weighting factor attached to each program to indicate the relative frequency of the program in the workload of use
- Weighted Arithmetic Mean: Σ Weight_i x Time_i
- Weight_i: frequency of program i in the workload
- May be better, but beware the dominant program time (see the small sketch after this slide)
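A minimal Python sketch of the plain and weighted arithmetic means defined above; the three program times and the weights are hypothetical.

```python
def arithmetic_mean(times):
    # Simple average of execution times: (1/n) * sum(Time_i).
    return sum(times) / len(times)

def weighted_arithmetic_mean(times, weights):
    # Sum of Weight_i * Time_i, where the weights give each program's
    # relative frequency in the workload.
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * t for w, t in zip(weights, times))

# Hypothetical workload: three programs with these execution times (seconds).
times = [10.0, 50.0, 200.0]
print(arithmetic_mean(times))                             # about 86.7
print(weighted_arithmetic_mean(times, [0.7, 0.2, 0.1]))   # 37.0
```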
38Example Figure 1.16
Weighted Arithmetic Mean
Slide 39: Normalized Time Metrics
- Normalized execution time metrics
- Measure performance by normalizing it to a reference machine: Execution time ratio_i
- Geometric Mean: (Π Execution time ratio_i)^(1/n)
- The geometric mean is consistent no matter which machine is the reference (a short demonstration follows this slide)
- The arithmetic mean should not be used to average normalized execution times
- However, the geometric mean still doesn't form an accurate prediction model (it doesn't predict execution time), and it encourages designers to improve the easiest benchmarks rather than the slowest benchmarks (due to no weighting)
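The following Python sketch illustrates why the geometric mean of normalized execution times is independent of the reference machine; the execution times for machines A and B and for the two candidate reference machines are hypothetical.

```python
import math

def geometric_mean(ratios):
    # n-th root of the product of the normalized execution time ratios.
    return math.prod(ratios) ** (1.0 / len(ratios))

def normalized_ratios(machine_times, reference_times):
    # Execution time ratio_i = time on this machine / time on the reference.
    return [m / r for m, r in zip(machine_times, reference_times)]

# Hypothetical times (seconds) on two programs, P1 and P2.
a = [2.0, 40.0]
b = [4.0, 20.0]
ref1 = [1.0, 10.0]
ref2 = [8.0, 5.0]

# The ratio of the two machines' geometric means is the same for either reference.
for ref in (ref1, ref2):
    ga = geometric_mean(normalized_ratios(a, ref))
    gb = geometric_mean(normalized_ratios(b, ref))
    print(round(ga / gb, 3))  # identical result for ref1 and ref2
```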
Slide 40: Example
- Machines A and B have the same performance according to the geometric mean measure, yet this would only be true for a workload in which P1 runs 100 times more often than P2, according to the weighted arithmetic mean measure.
Slide 41: 1.6 Quantitative Principles of Computer Design
- Now that we know how to define, measure, and summarize performance, we can explore some of the principles and guidelines for the design and analysis of computers.
- Make the common case fast
- In making a design trade-off, favor the frequent case over the infrequent case
- Improving the frequent event, rather than the rare event, will obviously help performance
- The frequent case is often simpler and can be done faster than the infrequent case
- We have to decide what the frequent case is and how much performance can be improved by making that case faster
Slide 42: Two Equations to Evaluate Design Alternatives
- Amdahl's Law states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used
- Amdahl's Law defines the speedup that can be gained by using a particular feature
- The performance gain that can be obtained by improving some portion of a computer can be calculated using Amdahl's Law
- The CPU Performance Equation
- Essentially all computers are constructed using a clock running at a constant rate
- CPU time can then be expressed by the number of clock cycles
Slide 43: Amdahl's Law
- Speedup due to enhancement E: Speedup(E) = Execution time without E / Execution time with E = Performance with E / Performance without E
- Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected
- Fraction_enhanced: the fraction of the execution time in the original machine that can be converted to take advantage of the enhancement
- Speedup_enhanced: the improvement gained by the enhanced execution mode
Slide 44: Amdahl's Law
- ExTime_new = ExTime_old x [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]
- Speedup_overall = ExTime_old / ExTime_new = 1 / [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]
Slide 45: Amdahl's Law
- Example: Floating-point (FP) instructions are improved to run faster by a factor of 2, but only 10% of the executed instructions are FP. What's the overall speedup gained by this improvement?
- Answer: with Fraction_enhanced = 0.1 and Speedup_enhanced = 2, Speedup_overall = 1 / ((1 - 0.1) + 0.1/2) = 1 / 0.95 ≈ 1.053 (see the sketch after this slide)
- Amdahl's Law can serve as a guide to how much an enhancement will improve performance and how to distribute resources to improve cost-performance
- It is particularly useful for comparing the performance, both of the overall system and of the CPU, of two design alternatives
- More examples on pages 41 and 42
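A minimal Python sketch of the Amdahl's Law formula from the previous slide, checked against the FP example above; the last line, with a huge enhancement speedup, is an added illustration of the limit imposed by the unenhanced fraction.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    # Speedup_overall = 1 / ((1 - F) + F / S)
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# FP example: 10% of the execution is enhanced, and that part runs 2x faster.
print(round(amdahl_speedup(0.10, 2.0), 3))   # about 1.053

# Even a near-infinite speedup of that 10% caps the overall gain at about 1.11x.
print(round(amdahl_speedup(0.10, 1e12), 3))
```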
Slide 46: CPU Performance
- All computers are constructed using a clock to operate their circuits
- Typically measured by two basic metrics
- Clock rate: today in MHz and GHz
- Clock cycle time: clock cycle time = 1 / clock rate
- E.g., a 1 GHz clock rate corresponds to a 1 ns cycle time
- Thus the CPU time for a program is given by: CPU time = CPU clock cycles for the program x Clock cycle time
- Or: CPU time = CPU clock cycles for the program / Clock rate
Slide 47: CPU Performance
- More typically, we count the instructions executed, known as the Instruction Count (IC)
- CPI: average number of clock cycles per instruction
- Hence, an alternative method to get the CPU time: CPU time = IC x CPI x Clock cycle time
- (IC depends on the instruction set of the computer and its compiler)
- CPU performance is equally dependent upon three characteristics: clock cycle time, CPI, and IC. They are not independent, and making one better often makes another worse, because the basic technologies involved in changing each characteristic are interdependent.
- Clock cycle time: hardware technology and organization
- CPI: organization and instruction set architecture
- IC: instruction set architecture and compiler technology
Slide 48: CPU Performance
- Example: Suppose we have two implementations of the same instruction set architecture. Computer A has a clock cycle time of 10 ns and a CPI of 2.0 for some program, and computer B has a clock cycle time of 20 ns and a CPI of 1.2 for the same program. Which machine is faster for this program?
- Answer: Assume the program requires IC instructions to be executed.
- CPU clock cycles_A = IC x 2.0; CPU clock cycles_B = IC x 1.2
- CPU time_A = IC x 2.0 x 10 ns = 20 x IC ns
- CPU time_B = IC x 1.2 x 20 ns = 24 x IC ns
- So computer A is faster than computer B (verified in the sketch below).
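A small Python sketch of the CPU performance equation (CPU time = IC x CPI x clock cycle time), applied to the A-versus-B example above; the instruction count is arbitrary because the comparison does not depend on it.

```python
def cpu_time(instruction_count, cpi, clock_cycle_time_ns):
    # CPU time = IC * CPI * clock cycle time.
    return instruction_count * cpi * clock_cycle_time_ns

# Same program, same instruction count IC on both machines.
ic = 1_000_000  # any value works; the ratio is independent of IC
time_a = cpu_time(ic, cpi=2.0, clock_cycle_time_ns=10.0)
time_b = cpu_time(ic, cpi=1.2, clock_cycle_time_ns=20.0)
print(time_b / time_a)  # 1.2: machine A is 1.2 times faster than B
```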
Slide 49: CPU Performance
- Often it is easier to deal with overall performance on a per-instruction-type basis
- The overall CPI can be expressed as: CPI = Σ (IC_i / IC) x CPI_i
- IC_i: the number of times instruction i is executed
- CPI_i: the CPI for instruction i
Slide 50: Cycles Per Instruction
- Example: Suppose we have a machine where we can count the frequency with which instructions are executed. We also know how many cycles it takes for each instruction type.
- Base machine (Reg / Reg)
- Instruction / Freq / CPI / (% of Time)
- ALU / 50% / 1 / (33%)
- Load / 20% / 2 / (27%)
- Store / 10% / 2 / (13%)
- Branch / 20% / 2 / (27%)
- (Total CPI = 1.5)
- How do we get the total CPI? How do we get the time percentages? (See the sketch below.)
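A short Python sketch answering the two questions above: the total CPI is the frequency-weighted sum of the per-class CPIs, and each class's share of time is its cycle contribution divided by the total.

```python
# (frequency, CPI) for each instruction class, from the table above.
mix = {"ALU": (0.50, 1), "Load": (0.20, 2), "Store": (0.10, 2), "Branch": (0.20, 2)}

total_cpi = sum(freq * cpi for freq, cpi in mix.values())
print(total_cpi)  # 1.5

for name, (freq, cpi) in mix.items():
    share = freq * cpi / total_cpi   # fraction of execution time in this class
    print(f"{name}: {share:.0%}")    # ALU 33%, Load 27%, Store 13%, Branch 27%
```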
Slide 51: CPU Performance
- Example: Suppose we have made the following measurements:
- Frequency of FP operations (other than FPSQR) = 25%
- Average CPI of FP operations = 4.0
- Average CPI of other instructions = 1.33
- Frequency of FPSQR = 2%
- CPI of FPSQR = 20
- Compare two designs: (1) decrease the CPI of FPSQR to 2; (2) decrease the average CPI of all FP operations to 2.5.
- Answer: First find the original CPI, then the CPI with design 1 and the CPI with design 2 (see the sketch below).
- Design 2 is better.
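The slide's numeric answer was shown as a figure; the Python sketch below recomputes it under one explicit reading of the data, which is an assumption: the three classes are disjoint (25% FP other than FPSQR, 2% FPSQR, 73% other), and design 2's CPI of 2.5 applies to all FP operations including FPSQR. Under this reading design 2 still comes out ahead, matching the slide's conclusion.

```python
def overall_cpi(classes):
    # classes: list of (frequency, CPI) pairs; CPI = sum(freq_i * CPI_i).
    return sum(freq * cpi for freq, cpi in classes)

# Assumed disjoint classes: FP other than FPSQR, FPSQR, everything else.
original = [(0.25, 4.0), (0.02, 20.0), (0.73, 1.33)]
design1  = [(0.25, 4.0), (0.02, 2.0),  (0.73, 1.33)]   # FPSQR CPI 20 -> 2
design2  = [(0.25, 2.5), (0.02, 2.5),  (0.73, 1.33)]   # all FP ops at CPI 2.5

for name, classes in [("original", original), ("design 1", design1), ("design 2", design2)]:
    print(name, round(overall_cpi(classes), 2))
# With the same instruction count and clock, the design with the lower CPI is
# faster, so design 2 is the better choice here.
```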
Slide 52: Principle of Locality
- Other important fundamental observations come from the properties of programs.
- Principle of locality: programs tend to reuse data and instructions they have used recently.
- There are two different types of locality (Ref. Chapter 5); a short code example follows this slide
- Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon (loops, reuse, etc.)
- Spatial locality (locality in space/location): if an item is referenced, items whose addresses are close to one another tend to be referenced soon (straight-line code, array access, etc.)
- We can predict with reasonable accuracy what instructions and data a program will use in the near future, based on its accesses in the past.
- A program spends 90% of its execution time in only 10% of the code.
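A tiny Python example of the two kinds of locality mentioned above: the accumulator is reused on every iteration (temporal locality), and the list elements are touched one after another (spatial locality). The data values are arbitrary, and the picture is conceptual; in a language like C the array elements would literally sit at adjacent memory addresses.

```python
data = list(range(1_000_000))

s = 0
for x in data:   # sequential pass over the array: spatial locality
    s += x       # the accumulator s is reused on every iteration: temporal locality
print(s)
```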
Slide 53: Some Misleading Performance Measures
- There are certain computer performance measures that are popular with computer manufacturers and sellers but may be misleading.
- MIPS (Million Instructions Per Second)
- MIPS depends on the instruction set, which makes it difficult to compare the MIPS of computers with different instruction sets.
- MIPS varies between programs on the same computer; different programs use different instruction mixes.
- Most importantly, MIPS can vary inversely to performance.
Slide 54: Some Misleading Performance Measures
- MFLOPS: focuses on one type of work
- MFLOPS (Million Floating-point Operations Per Second) depends on the program; it must be FP-intensive.
- MFLOPS depends on the computer as well.
- Floating-point operations vary in complexity (e.g., add vs. divide).
- Peak performance: the performance that the manufacturer guarantees you won't exceed
- The difference between peak performance and average performance is huge.
- Peak performance is not useful in predicting observed performance.
Slide 55: Summary
- 1.1 Introduction
- 1.2 The Changing Face of Computing and the Task of the Computer Designer
- 1.3 Technology Trends
- 1.4 Cost, Price, and Their Trends
- 1.5 Measuring and Reporting Performance
- Benchmarks
- (Weighted) Arithmetic Mean, Geometric Mean of normalized times
- 1.6 Quantitative Principles of Computer Design
- Amdahl's Law
- CPU Performance: CPU time, CPI
- Locality