Title: Chapter 1 Fundamentals of Computer Design
Slide 1: Chapter 1 Fundamentals of Computer Design
EEF011 Computer Architecture
- September 2004
Slide 2: Chapter 1. Fundamentals of Computer Design
- 1.1 Introduction
- 1.2 The Changing Face of Computing and the Task of the Computer Designer
- 1.3 Technology Trends
- 1.4 Cost, Price, and Their Trends
- 1.5 Measuring and Reporting Performance
- 1.6 Quantitative Principles of Computer Design
- 1.7 Putting It All Together
- 1.8 Another View
- 1.9 Fallacies and Pitfalls
- 1.10 Concluding Remarks
Slide 3: 1.1 Introduction
- Rapid rate of improvement over the roughly 55 years since the first general-purpose electronic computer, driven by
- the technology used to build computers
- innovation in computer design
- Beginning in about 1970, computer designers became largely dependent upon IC technology
- Emergence of the microprocessor in the late 1970s
- C: virtual elimination of assembly language programming (portability)
- UNIX: a standardized, vendor-independent OS
- Lowered the cost and risk of bringing out a new architecture
Slide 4: 1.1 Introduction (cont.)
- Development of RISC (Reduced Instruction Set Computer) in the early 1980s
- Focused the attention of designers on
- The exploitation of instruction-level parallelism
- initially through pipelining, and
- later through multiple instruction issue
- The use of caches
- initially in simple forms and
- later using more sophisticated organizations and optimizations
Slide 5: Figure 1.1 - 20 years of sustained growth in performance at an annual rate of over 50% (measured with two different versions of SPEC, i.e., SPEC92 and SPEC95)
- Effects
- Significantly enhanced the capability available to users
- Dominance of microprocessor-based computers across the entire range of computer design
- Minicomputers and mainframes replaced
- Supercomputers built with collections of microprocessors
Slide 6: 1.2 The Changing Face of Computing
- 1960s: dominated by mainframes
- business data processing, large-scale scientific computing
- 1970s: the birth of minicomputers
- scientific laboratories, time-sharing multiple-user environments
- 1980s: the rise of the desktop computer (personal computer and workstation)
- 1990s: the emergence of the Internet and the World Wide Web
- handheld computing devices
- emergence of high-performance digital consumer electronics
Slide 7: Segmented Markets
- Desktop computing: optimize price-performance
- the largest market in dollar terms
- Web-centric, interactive applications
- Servers
- Web-based services, replacing traditional mainframes
- Availability, scalability, throughput
- Embedded computers
- Features
- The presence of the computers is not immediately obvious
- The application can usually be carefully tuned for the processor and system
- Widest range of processing power and cost
- price is a key factor in the design of these computers
- Real-time performance requirements
- Key characteristics
- the need to minimize memory and power
- the use of processor cores together with application-specific circuitry
Slide 9: Task of the Computer Designer
- Tasks
- Determine what attributes are important for a new machine, then
- design a machine to maximize performance while staying within cost and power constraints
- Efforts
- instruction set design, functional organization, logic design, and implementation (IC design, packaging, power, and cooling)
- Design a computer to meet functional requirements as well as price, power, and performance goals (see Figure 1.4)
Slide 10: Task of a Computer Designer (cont.)
- Effect of trends: the designer often needs to design for the next technology, because during a single product cycle (typically 4-5 years) key technologies, such as memory, change sufficiently that the designer must plan for these changes.
Slide 12: 1.3 Technology Trends
- The designer must be especially aware of rapidly occurring changes in implementation technology
- Integrated circuit logic technology
- transistor density increases by about 35% per year
- growth in die size is less predictable and slower
- combined effect: growth rate in transistor count of about 55% per year
- Semiconductor DRAM
- density increases by between 40% and 60% per year
- cycle time has decreased by only about one-third in 10 years
- bandwidth per chip increases about twice as fast as latency decreases
- Magnetic disk technology
- Network technology
Slide 13: Moore's Law
- Gordon Moore (co-founder of Intel) observed in 1965 that the number of transistors that could be crammed onto a chip doubles every year.
- http://www.intel.com/research/silicon/mooreslaw.htm
Slide 14: Technology Trends
- In terms of SPEED, the CPU has improved dramatically, but memory and disk have improved only a little. This has led to dramatic changes in architecture, operating systems, and programming practices.
Technology   Capacity         Speed (latency)
Logic        2x in 3 years    2x in 3 years
DRAM         4x in 3 years    2x in 10 years
Disk         4x in 3 years    2x in 10 years
Slide 15: 1.4 Cost, Price, and Their Trends
- Price: what you sell a finished good for
- in 1999, more than half the PCs sold were priced at less than $1000
- Cost: the amount spent to produce it, including overhead
Slide 16: Cost and Its Trend
- Time
- Learning curve: manufacturing cost decreases over time
- Yield: the percentage of the manufactured devices that survive the testing procedure
- As the technology matures over time, the yield improves and hence things get cheaper
- Volume
- Increasing volume decreases cost (less time needed for the learning curve)
- Increases purchasing and manufacturing efficiency
- Commodities
- Products are sold by multiple vendors in large volumes and are essentially identical
- the low-end business, such as DRAMs, disks, monitors
- Improved competition → reduced cost
Slide 17: Prices of Six Generations of DRAMs
- Between the start of a project and the shipping of a product, say, two years, the cost of a new DRAM drops by a factor of between 5 and 10 in constant dollars.
Slide 18: Price of an Intel Pentium III at a Given Frequency
- It decreases over time as yield enhancements decrease the cost of a good die and competition forces price reductions.
Slide 20: Wafer and Dies
- Exponential cost decrease while the technology stays basically the same: a wafer is tested and chopped into dies that are packaged.
- Figure 1.8: this wafer contains 564 MIPS64 R20K processors in a 0.18-micron process.
- Figure 1.7: Intel Pentium 4 microprocessor die.
Slide 21: Cost of an Integrated Circuit (IC)
- Cost of IC = (Cost of die + Cost of testing die + Cost of packaging and final test) / Final test yield
- The die cost is the greater portion of the cost that varies between machines, and it is sensitive to die size
- Cost of die = Cost of wafer / (Dies per wafer x Die yield)
- Dies per wafer = π x (Wafer diameter / 2)² / Die area - π x Wafer diameter / sqrt(2 x Die area), where the second term accounts for the dies along the edge
- Die yield = Wafer yield x (1 + Defects per unit area x Die area / α)^(-α)
- Today's technology: α ≈ 4, defect density between 0.4 and 0.8 per cm²
Slide 22: Cost of an IC (cont.)
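To make the die-cost model above concrete, here is a minimal Python sketch of those formulas; the wafer diameter, wafer cost, die area, and defect density used in the example are illustrative assumptions, not figures from the text.

```python
import math

def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    # Gross dies: wafer area over die area, minus the dies lost along the edge.
    return (math.pi * (wafer_diameter_cm / 2) ** 2 / die_area_cm2
            - math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2))

def die_yield(defects_per_cm2, die_area_cm2, alpha=4.0, wafer_yield=1.0):
    # Standard yield model with alpha of about 4 for today's technology.
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)

def die_cost(wafer_cost, wafer_diameter_cm, die_area_cm2, defects_per_cm2):
    good_dies = (dies_per_wafer(wafer_diameter_cm, die_area_cm2)
                 * die_yield(defects_per_cm2, die_area_cm2))
    return wafer_cost / good_dies

# Hypothetical example: a 20 cm (8-inch) wafer costing $3500,
# a 1.0 cm^2 die, and 0.6 defects per cm^2.
print(round(die_cost(3500, 20.0, 1.0, 0.6), 2))  # cost of one good die in dollars
```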
Slide 23: Cost Example in a $1000 PC in 2001
Slide 24: 1.5 Measuring and Reporting Performance
- Designing high-performance computers is one of the major goals of any computer architect.
- As a result, assessing the performance of computer hardware is at the heart of computer design and greatly affects the demand for and market value of the computer.
- However, measuring the performance of a computer system is not a straightforward task
- Metrics: how do we describe the performance of a computer numerically?
- What tools do we use to find those metrics?
- How do we summarize the performance?
Slide 25: What Is the Computer User Interested In?
- Reducing the time to run a certain task
- Execution time (response time)
- The time between the start and the completion of an event
- Increasing the number of tasks completed per week, day, hour, second, ns
- Throughput
- The total amount of work done in a given time
Slide 26: Example
- Do the following changes to a computer system increase throughput, reduce response time, or both?
- 1. Replacing the processor in a computer with a faster version
- 2. Adding additional processors to a system that uses multiple processors for separate tasks, for example, handling an airline reservation system
- Answer
- 1. Both response time and throughput are improved
- 2. Only throughput increases
Slide 27: Execution Time and Performance Comparison
- In this subject we will be primarily interested in execution time as a measure of performance
- The relationship between performance and execution time on a computer X is reciprocal: Performance_X = 1 / Execution time_X
- To compare design alternatives, we use the following equation: n = Execution time_Y / Execution time_X = Performance_X / Performance_Y (a tiny numeric example follows this slide)
- "X is n times faster than Y" or "the throughput of X is n times higher than Y" means that the execution time is n times less on X than on Y
- To maximize the performance of an application, we need to minimize its execution time
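A tiny Python sketch of the reciprocal relationship and the "n times faster" comparison above; the two execution times are made-up numbers.

```python
def performance(execution_time):
    # Performance is the reciprocal of execution time.
    return 1.0 / execution_time

def times_faster(time_x, time_y):
    # "X is n times faster than Y": n = Time_Y / Time_X = Perf_X / Perf_Y.
    return time_y / time_x

# Hypothetical example: X runs a program in 12 s, Y runs it in 18 s.
print(times_faster(12.0, 18.0))  # 1.5, i.e., X is 1.5 times faster than Y
```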
Slide 28: Measuring Performance - User CPU Time
- Response time may include disk accesses, memory accesses, input/output activities, CPU events, and operating system overhead - everything
- In order to get an accurate measure of performance, we use CPU time instead of response time
- CPU time is the time the CPU spends computing a program and does not include time spent waiting for I/O or running other programs
- CPU time can be further divided into user CPU time (the program) and system CPU time (the OS)
- Typing the UNIX command time gives, for example,
- 90.7u 12.9s 2:39 65% (user CPU time, system CPU time, total elapsed time, and the percentage of elapsed time that is CPU time)
- In our performance measures, we use user CPU time because of its independence from the OS and other factors
Slide 29: Programs for Measuring Performance
- Real applications: text-processing software (Word), compilers (C), and other applications such as Photoshop; they have inputs, outputs, and options that the user selects
- One major downside: real applications often encounter portability problems arising from dependences on the OS or compiler
- Modified (or scripted) applications: modified to enhance portability or to focus on one particular aspect of system performance; scripts are used to simulate interaction behavior
- Kernels: small, key pieces taken from real programs, typically used to evaluate individual features of the machine
- Toy benchmarks: typically between 10 and 100 lines of code, producing a result the user already knows
- Synthetic benchmarks: artificially created code that matches an average execution profile
Slide 30: Benchmark Suites
- A collection of programs (a workload) that tries to explore and capture the strengths and weaknesses of a computer system (real programs, kernels)
- A key advantage of such suites is that the weakness of any one benchmark is lessened by the presence of the other benchmarks
- Good vs. bad benchmarks
- Improving the product for real programs vs. improving the product for benchmarks to get more sales
- If benchmarks are inadequate, then sales wins!
Slide 31: SPEC Benchmarks
- SPEC: Standard Performance Evaluation Corporation
- The most successful attempt, and the most widely adopted
- First generation, 1989
- 10 programs yielding a single number (SPECmarks)
- Second generation, 1992
- SPECint92 (6 integer programs) and SPECfp92 (14 floating-point programs)
- Unlimited compiler flags
- Third generation, 1995
- New set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating-point programs)
- Single flag setting for all programs: SPECint_base95, SPECfp_base95
- benchmarks intended to be useful for 3 years
- SPEC CPU 2000
- CINT2000 (12 integer programs) and CFP2000 (14 floating-point programs)
Slide 32: SPEC Benchmarks
CINT2000 (Integer Component of SPEC CPU2000)
- Program / Language / What It Is
- 164.gzip / C / Compression
- 175.vpr / C / FPGA Circuit Placement and Routing
- 176.gcc / C / C Programming Language Compiler
- 181.mcf / C / Combinatorial Optimization
- 186.crafty / C / Game Playing: Chess
- 197.parser / C / Word Processing
- 252.eon / C++ / Computer Visualization
- 253.perlbmk / C / PERL Programming Language
- 254.gap / C / Group Theory, Interpreter
- 255.vortex / C / Object-oriented Database
- 256.bzip2 / C / Compression
- 300.twolf / C / Place and Route Simulator
http://www.spec.org/osg/cpu2000/CINT2000/
Slide 33: SPEC Benchmarks
CFP2000 (Floating-Point Component of SPEC CPU2000)
- Program / Language / What It Is
- 168.wupwise / Fortran 77 / Physics / Quantum Chromodynamics
- 171.swim / Fortran 77 / Shallow Water Modeling
- 172.mgrid / Fortran 77 / Multi-grid Solver: 3D Potential Field
- 173.applu / Fortran 77 / Parabolic / Elliptic Differential Equations
- 177.mesa / C / 3-D Graphics Library
- 178.galgel / Fortran 90 / Computational Fluid Dynamics
- 179.art / C / Image Recognition / Neural Networks
- 183.equake / C / Seismic Wave Propagation Simulation
- 187.facerec / Fortran 90 / Image Processing: Face Recognition
- 188.ammp / C / Computational Chemistry
- 189.lucas / Fortran 90 / Number Theory / Primality Testing
- 191.fma3d / Fortran 90 / Finite-element Crash Simulation
- 200.sixtrack / Fortran 77 / High Energy Physics Accelerator Design
- 301.apsi / Fortran 77 / Meteorology: Pollutant Distribution
http://www.spec.org/osg/cpu2000/CFP2000/
Slide 34: More Benchmarks
TPC: Transaction Processing Council
- Measures the ability of a system to handle transactions, which consist of database accesses and updates
- Many variants depending on transaction complexity
- TPC-A: simple bank teller transaction style
- TPC-C: complex database query
EEMBC: EDN Embedded Microprocessor Benchmark Consortium (pronounced "embassy")
- 34 kernels in 5 classes
- 16 automotive/industrial, 5 consumer, 3 networking, 4 office automation, 6 telecommunications
Slide 35: How to Summarize Performance
- Management would like to have one number
- Technical people want more
- They want evidence of reproducibility: there should be enough information so that you or someone else can repeat the experiment
- There should be consistency when doing the measurements multiple times
- How would you report these results?
Slide 36: Comparing and Summarizing Performance
- Comparing performance by looking at individual programs is not fair
- Total execution time: a consistent summary measure
- Arithmetic Mean provides a simple average: (1/n) x Σ Time_i
- Time_i: execution time for program i in the workload
- Doesn't account for weight: all programs are treated as equal
Slide 37: Weighted Variants
- What is the proper mixture of programs for the workload?
- A weight is a weighting factor attached to each program to indicate the relative frequency of the program in the workload of use
- Weighted Arithmetic Mean: Σ Weight_i x Time_i
- Weight_i: frequency of program i in the workload
- May be better, but beware the dominant program time (see the small sketch after this slide)
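A minimal Python sketch of the plain and weighted arithmetic means defined above; the three program times and the weights are hypothetical.

```python
def arithmetic_mean(times):
    # Simple average of execution times: (1/n) * sum(Time_i).
    return sum(times) / len(times)

def weighted_arithmetic_mean(times, weights):
    # Sum of Weight_i * Time_i, where the weights give each program's
    # relative frequency in the workload.
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * t for w, t in zip(weights, times))

# Hypothetical workload: three programs with these execution times (seconds).
times = [10.0, 50.0, 200.0]
print(arithmetic_mean(times))                             # about 86.7
print(weighted_arithmetic_mean(times, [0.7, 0.2, 0.1]))   # 37.0
```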
38Example Figure 1.16
Weighted Arithmetic Mean
Slide 39: Normalized Time Metrics
- Normalized execution time metrics
- Measure performance by normalizing it to a reference machine: Execution time ratio_i
- Geometric Mean: (Π Execution time ratio_i)^(1/n)
- The geometric mean is consistent no matter which machine is the reference (a short demonstration follows this slide)
- The arithmetic mean should not be used to average normalized execution times
- However, the geometric mean still doesn't form an accurate prediction model (it doesn't predict execution time), and it encourages designers to improve the easiest benchmarks rather than the slowest benchmarks (due to no weighting)
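The following Python sketch illustrates why the geometric mean of normalized execution times is independent of the reference machine; the execution times for machines A and B and for the two candidate reference machines are hypothetical.

```python
import math

def geometric_mean(ratios):
    # n-th root of the product of the normalized execution time ratios.
    return math.prod(ratios) ** (1.0 / len(ratios))

def normalized_ratios(machine_times, reference_times):
    # Execution time ratio_i = time on this machine / time on the reference.
    return [m / r for m, r in zip(machine_times, reference_times)]

# Hypothetical times (seconds) on two programs, P1 and P2.
a = [2.0, 40.0]
b = [4.0, 20.0]
ref1 = [1.0, 10.0]
ref2 = [8.0, 5.0]

# The ratio of the two machines' geometric means is the same for either reference.
for ref in (ref1, ref2):
    ga = geometric_mean(normalized_ratios(a, ref))
    gb = geometric_mean(normalized_ratios(b, ref))
    print(round(ga / gb, 3))  # identical result for ref1 and ref2
```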
Slide 40: Example
- Machines A and B have the same performance according to the geometric mean measure, yet this would only be true for a workload in which P1 runs 100 times more often than P2, according to the weighted arithmetic mean measure.
Slide 41: 1.6 Quantitative Principles of Computer Design
- Now that we know how to define, measure, and summarize performance, we can explore some of the principles and guidelines for the design and analysis of computers.
- Make the common case fast
- In making a design trade-off, favor the frequent case over the infrequent case
- Improving the frequent event, rather than the rare event, will obviously help performance
- The frequent case is often simpler and can be done faster than the infrequent case
- We have to decide what the frequent case is and how much performance can be improved by making that case faster
Slide 42: Two Equations to Evaluate Design Alternatives
- Amdahl's Law states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used
- Amdahl's Law defines the speedup that can be gained by using a particular feature
- The performance gain that can be obtained by improving some portion of a computer can be calculated using Amdahl's Law
- The CPU Performance Equation
- Essentially all computers are constructed using a clock running at a constant rate
- CPU time can then be expressed by the number of clock cycles
Slide 43: Amdahl's Law
- Speedup due to enhancement E: Speedup(E) = Execution time without E / Execution time with E = Performance with E / Performance without E
- Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected
- Fraction_enhanced: the fraction of the execution time in the original machine that can be converted to take advantage of the enhancement
- Speedup_enhanced: the improvement gained by the enhanced execution mode
Slide 44: Amdahl's Law
- ExTime_new = ExTime_old x [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]
- Speedup_overall = ExTime_old / ExTime_new = 1 / [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]
Slide 45: Amdahl's Law
- Example: Floating-point (FP) instructions are improved to run faster by a factor of 2, but only 10% of the executed instructions are FP. What's the overall speedup gained by this improvement?
- Answer: with Fraction_enhanced = 0.1 and Speedup_enhanced = 2, Speedup_overall = 1 / ((1 - 0.1) + 0.1/2) = 1 / 0.95 ≈ 1.053 (see the sketch after this slide)
- Amdahl's Law can serve as a guide to how much an enhancement will improve performance and how to distribute resources to improve cost-performance
- It is particularly useful for comparing the performance, both of the overall system and of the CPU, of two design alternatives
- More examples on pages 41 and 42
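A minimal Python sketch of the Amdahl's Law formula from the previous slide, checked against the FP example above; the last line, with a huge enhancement speedup, is an added illustration of the limit imposed by the unenhanced fraction.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    # Speedup_overall = 1 / ((1 - F) + F / S)
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# FP example: 10% of the execution is enhanced, and that part runs 2x faster.
print(round(amdahl_speedup(0.10, 2.0), 3))   # about 1.053

# Even a near-infinite speedup of that 10% caps the overall gain at about 1.11x.
print(round(amdahl_speedup(0.10, 1e12), 3))
```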
Slide 46: CPU Performance
- All computers are constructed using a clock to operate their circuits
- Typically measured by two basic metrics
- Clock rate: today in MHz and GHz
- Clock cycle time: clock cycle time = 1 / clock rate
- E.g., a 1 GHz clock rate corresponds to a 1 ns cycle time
- Thus the CPU time for a program is given by: CPU time = CPU clock cycles for the program x Clock cycle time
- Or: CPU time = CPU clock cycles for the program / Clock rate
Slide 47: CPU Performance
- More typically, we count the instructions executed, known as the Instruction Count (IC)
- CPI: average number of clock cycles per instruction
- Hence, an alternative method to get the CPU time: CPU time = IC x CPI x Clock cycle time
- (IC depends on the instruction set of the computer and its compiler)
- CPU performance is equally dependent upon three characteristics: clock cycle time, CPI, and IC. They are not independent, and making one better often makes another worse, because the basic technologies involved in changing each characteristic are interdependent.
- Clock cycle time: hardware technology and organization
- CPI: organization and instruction set architecture
- IC: instruction set architecture and compiler technology
Slide 48: CPU Performance
- Example: Suppose we have two implementations of the same instruction set architecture. Computer A has a clock cycle time of 10 ns and a CPI of 2.0 for some program, and computer B has a clock cycle time of 20 ns and a CPI of 1.2 for the same program. Which machine is faster for this program?
- Answer: Assume the program requires IC instructions to be executed.
- CPU clock cycles_A = IC x 2.0; CPU clock cycles_B = IC x 1.2
- CPU time_A = IC x 2.0 x 10 ns = 20 x IC ns
- CPU time_B = IC x 1.2 x 20 ns = 24 x IC ns
- So computer A is faster than computer B (verified in the sketch below).
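A small Python sketch of the CPU performance equation (CPU time = IC x CPI x clock cycle time), applied to the A-versus-B example above; the instruction count is arbitrary because the comparison does not depend on it.

```python
def cpu_time(instruction_count, cpi, clock_cycle_time_ns):
    # CPU time = IC * CPI * clock cycle time.
    return instruction_count * cpi * clock_cycle_time_ns

# Same program, same instruction count IC on both machines.
ic = 1_000_000  # any value works; the ratio is independent of IC
time_a = cpu_time(ic, cpi=2.0, clock_cycle_time_ns=10.0)
time_b = cpu_time(ic, cpi=1.2, clock_cycle_time_ns=20.0)
print(time_b / time_a)  # 1.2: machine A is 1.2 times faster than B
```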
Slide 49: CPU Performance
- Often it is easier to deal with overall performance on a per-instruction-type basis
- The overall CPI can be expressed as: CPI = Σ (IC_i / IC) x CPI_i
- IC_i: the number of times instruction i is executed
- CPI_i: the CPI for instruction i
Slide 50: Cycles Per Instruction
- Example: Suppose we have a machine where we can count the frequency with which instructions are executed. We also know how many cycles it takes for each instruction type.
- Base machine (Reg / Reg)
- Instruction / Freq / CPI / (% of Time)
- ALU / 50% / 1 / (33%)
- Load / 20% / 2 / (27%)
- Store / 10% / 2 / (13%)
- Branch / 20% / 2 / (27%)
- (Total CPI = 1.5)
- How do we get the total CPI? How do we get the time percentages? (See the sketch below.)
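A short Python sketch answering the two questions above: the total CPI is the frequency-weighted sum of the per-class CPIs, and each class's share of time is its cycle contribution divided by the total.

```python
# (frequency, CPI) for each instruction class, from the table above.
mix = {"ALU": (0.50, 1), "Load": (0.20, 2), "Store": (0.10, 2), "Branch": (0.20, 2)}

total_cpi = sum(freq * cpi for freq, cpi in mix.values())
print(total_cpi)  # 1.5

for name, (freq, cpi) in mix.items():
    share = freq * cpi / total_cpi   # fraction of execution time in this class
    print(f"{name}: {share:.0%}")    # ALU 33%, Load 27%, Store 13%, Branch 27%
```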
Slide 51: CPU Performance
- Example: Suppose we have made the following measurements:
- Frequency of FP operations (other than FPSQR) = 25%
- Average CPI of FP operations = 4.0
- Average CPI of other instructions = 1.33
- Frequency of FPSQR = 2%
- CPI of FPSQR = 20
- Compare two designs: (1) decrease the CPI of FPSQR to 2; (2) decrease the average CPI of all FP operations to 2.5.
- Answer: First find the original CPI, then the CPI with design 1 and the CPI with design 2 (see the sketch below).
- Design 2 is better.
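The slide's numeric answer was shown as a figure; the Python sketch below recomputes it under one explicit reading of the data, which is an assumption: the three classes are disjoint (25% FP other than FPSQR, 2% FPSQR, 73% other), and design 2's CPI of 2.5 applies to all FP operations including FPSQR. Under this reading design 2 still comes out ahead, matching the slide's conclusion.

```python
def overall_cpi(classes):
    # classes: list of (frequency, CPI) pairs; CPI = sum(freq_i * CPI_i).
    return sum(freq * cpi for freq, cpi in classes)

# Assumed disjoint classes: FP other than FPSQR, FPSQR, everything else.
original = [(0.25, 4.0), (0.02, 20.0), (0.73, 1.33)]
design1  = [(0.25, 4.0), (0.02, 2.0),  (0.73, 1.33)]   # FPSQR CPI 20 -> 2
design2  = [(0.25, 2.5), (0.02, 2.5),  (0.73, 1.33)]   # all FP ops at CPI 2.5

for name, classes in [("original", original), ("design 1", design1), ("design 2", design2)]:
    print(name, round(overall_cpi(classes), 2))
# With the same instruction count and clock, the design with the lower CPI is
# faster, so design 2 is the better choice here.
```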
Slide 52: Principle of Locality
- Other important fundamental observations come from the properties of programs.
- Principle of locality: programs tend to reuse data and instructions they have used recently.
- There are two different types of locality (Ref. Chapter 5); a short code example follows this slide
- Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon (loops, reuse, etc.)
- Spatial locality (locality in space/location): if an item is referenced, items whose addresses are close to one another tend to be referenced soon (straight-line code, array access, etc.)
- We can predict with reasonable accuracy what instructions and data a program will use in the near future, based on its accesses in the past.
- A program spends 90% of its execution time in only 10% of the code.
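A tiny Python example of the two kinds of locality mentioned above: the accumulator is reused on every iteration (temporal locality), and the list elements are touched one after another (spatial locality). The data values are arbitrary, and the picture is conceptual; in a language like C the array elements would literally sit at adjacent memory addresses.

```python
data = list(range(1_000_000))

s = 0
for x in data:   # sequential pass over the array: spatial locality
    s += x       # the accumulator s is reused on every iteration: temporal locality
print(s)
```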
Slide 53: Some Misleading Performance Measures
- There are certain computer performance measures that are popular with computer manufacturers and sellers but may be misleading.
- MIPS (Million Instructions Per Second)
- MIPS depends on the instruction set, which makes it difficult to compare the MIPS of computers with different instruction sets.
- MIPS varies between programs on the same computer; different programs use different instruction mixes.
- Most importantly, MIPS can vary inversely to performance.
Slide 54: Some Misleading Performance Measures
- MFLOPS: focuses on one type of work
- MFLOPS (Million Floating-point Operations Per Second) depends on the program; it must be FP-intensive.
- MFLOPS depends on the computer as well.
- Floating-point operations vary in complexity (e.g., add vs. divide).
- Peak performance: the performance that the manufacturer guarantees you won't exceed
- The difference between peak performance and average performance is huge.
- Peak performance is not useful in predicting observed performance.
Slide 55: Summary
- 1.1 Introduction
- 1.2 The Changing Face of Computing and the Task of the Computer Designer
- 1.3 Technology Trends
- 1.4 Cost, Price, and Their Trends
- 1.5 Measuring and Reporting Performance
- Benchmarks
- (Weighted) Arithmetic Mean, Geometric Mean of normalized times
- 1.6 Quantitative Principles of Computer Design
- Amdahl's Law
- CPU Performance: CPU time, CPI
- Locality