Chapter 1 Fundamentals of Computer Design
1
Chapter 1 Fundamentals of Computer Design
EEF011 Computer Architecture
  • September 2004

2
Chapter 1. Fundamentals of Computer Design
  • 1.1 Introduction
  • 1.2 The Changing Face of Computing and the Task
    of the Computer Designer
  • 1.3 Technology Trends
  • 1.4 Cost, Price, and Their Trends
  • 1.5 Measuring and Reporting Performance
  • 1.6 Quantitative Principles of Computer Design
  • 1.7 Putting It All Together
  • 1.8 Another View
  • 1.9 Fallacies and Pitfalls
  • 1.10 Concluding Remarks

3
Computer Architecture 1.1 Introduction
  • Rapid rate of improvement over the roughly 55 years
    since the first general-purpose electronic computer,
    driven by both
  • the technology used to build computers, and
  • innovation in computer design
  • Beginning in about 1970, computer designers became
    largely dependent upon IC technology
  • Emergence of the microprocessor in the late 1970s
  • High-level languages such as C: virtual elimination
    of assembly-language programming (portability)
  • UNIX: a standardized, vendor-independent OS
  • These lowered the cost and risk of bringing out a new
    architecture

4
1.1 Introduction (cont.)
  • Development of RISC (Reduced Instruction Set
    Computer) early 1980s
  • Focused the attention of designers on
  • The exploitation of instruction-level parallelism
  • Initially through pipelining, and
  • later through multiple instruction issue
  • The use of caches
  • initially in simple forms and
  • later using more sophisticated organizations and
    optimizations

5
Figure 1.1 20 years of sustained growth in
performance at an annual rate of over 50% (two
different versions of SPEC, i.e., SPEC92 and
SPEC95)
  • Effects
  • Significantly enhanced the capability available
    to users
  • Dominance of microprocessor-based computers
    across the entire range of computer design
  • Minicomputers and mainframes replaced
  • Supercomputers built with collections of
    microprocessors

6
1.2 Changing Face of Computing
  • 1960s: dominated by mainframes
  • business data processing, large-scale scientific
    computing
  • 1970s: the birth of minicomputers
  • scientific laboratories, time-sharing,
    multiple-user environments
  • 1980s: the rise of the desktop computer (personal
    computer and workstation)
  • 1990s: the emergence of the Internet and the
    World Wide Web
  • handheld computing devices
  • emergence of high-performance digital consumer
    electronics

7
Segmented Markets
  • Desktop computing: optimize price-performance
  • the largest market in dollar terms
  • Web-centric, interactive applications
  • Servers
  • Web-based services, replacing traditional
    mainframe
  • Availability, Scalability, Throughput
  • Embedded Computers
  • Features
  • The presence of the computers is not immediately
    obvious
  • The application can usually be carefully tuned
    for the processor and system
  • Widest range of processing power and cost
  • price is a key factor in the design of computers
  • Real-time performance requirement
  • Key characteristics
  • the need to minimize memory and power
  • The use of processor cores together with
    application-specific circuitry

9
Task of Computer Designer
  • Tasks
  • Determine what attributes are important for a new
    machine, then
  • design a machine to maximize performance while
    staying within cost and power constraints
  • Efforts
  • instruction set design, functional organization,
    logic design, and implementation (IC design,
    packaging, power, and cooling)
  • Design a computer to meet functional requirements
    as well as price, power, and performance goals
  • (see Figure 1.4)

10
Task of a Computer Designer
Effect of trends: the designer often needs to design
for the next technology, because during a single
product cycle (typically 4-5 years), key
technologies, such as memory, change sufficiently
that the designer must plan for these changes.
12
1.3 Technology Trends
  • The designer must be especially aware of rapidly
    occurring changes in implementation technology
  • Integrated circuit logic technology
  • transistor density increases by about 35% per
    year
  • growth in die size is less predictable and slower
  • combined effect: growth rate in transistor count of
    about 55% per year
  • Semiconductor DRAM
  • density increases by between 40% and 60% per year
  • cycle time has decreased by about one-third in 10
    years
  • bandwidth per chip increases about twice as fast
    as latency decreases
  • Magnetic disk technology
  • Network technology

13
Moore's Law
  • Gordon Moore (co-founder of Intel) observed in 1965
    that the number of transistors that could be
    crammed on a chip doubles every year.
  • http://www.intel.com/research/silicon/mooreslaw.htm

14
Technology Trends
  • In terms of speed, the CPU has improved
    dramatically, but memory and disk have improved
    only a little. This has led to dramatic changes
    in architecture, operating systems, and
    programming practices.

            Capacity        Speed (latency)
  Logic     2x in 3 years   2x in 3 years
  DRAM      4x in 3 years   2x in 10 years
  Disk      4x in 3 years   2x in 10 years
15
1.4 Cost, Price, and Their Trends
  • Price: what you sell a finished good for
  • In 1999, more than half the PCs sold were priced
    at less than $1000
  • Cost: the amount spent to produce it, including
    overhead

16
Cost and Its Trend
  • Time
  • Learning curve: manufacturing cost decreases over
    time
  • Yield: the percentage of the manufactured devices
    that survive the testing procedure
  • As the technology matures over time, the yield
    improves and hence things get cheaper
  • Volume
  • Increasing volume decreases cost (it shortens the
    learning curve), and
  • increases purchasing and manufacturing efficiency
  • Commodities
  • Products sold by multiple vendors in large
    volumes that are essentially identical
  • The low-end business, such as DRAMs, disks,
    monitors
  • Improved competition → reduced cost

17
Prices of six generations of DRAMs
Between the start of a project and the shipping
of a product, say, two years, the cost of a new
DRAM drops by a factor of between 5 and 10 in
constant dollars.
18
Price of an Intel Pentium III at a Given Frequency
It decreases over time as yield enhancements
decrease the cost of a good die and competition
forces price reductions.
20
Wafer and Dies
Exponential cost decrease, with the technology
basically the same: a wafer is tested and chopped
into dies that are packaged.
Figure 1.8: This wafer contains 564 MIPS64 R20K
processors implemented in a 0.18-micron process.
Figure 1.7: Intel Pentium 4 microprocessor die.
21
Cost of an Integrated Circuit (IC)
  • Cost of IC (a greater portion of the cost that
    varies between machines)

    Cost of IC = (Cost of die + Cost of testing die
                  + Cost of packaging and final test)
                 / Final test yield

    Cost of die = Cost of wafer
                  / (Dies per wafer × Die yield)
    (sensitive to die size)

    Dies per wafer = π × (Wafer diameter / 2)² / Die area
                     − π × Wafer diameter / sqrt(2 × Die area)
    (the second term subtracts the partial dies along the edge)

    Die yield = Wafer yield
                × (1 + Defects per unit area × Die area / α)^(−α)

Today's technology: α ≈ 4, defect density between 0.4 and
0.8 per cm²
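
Below is a minimal Python sketch of this cost model; the
function names and the sample numbers are illustrative
assumptions, not values from the slides.

  import math

  def dies_per_wafer(wafer_diameter, die_area):
      # Gross dies = wafer area / die area, minus the partial
      # dies lost along the wafer edge (the second term above).
      return (math.pi * (wafer_diameter / 2) ** 2 / die_area
              - math.pi * wafer_diameter / math.sqrt(2 * die_area))

  def die_yield(die_area, defect_density, alpha=4.0, wafer_yield=1.0):
      # The yield model above with alpha = 4 (today's technology).
      return wafer_yield * (1 + defect_density * die_area / alpha) ** (-alpha)

  def die_cost(wafer_cost, wafer_diameter, die_area, defect_density):
      good_dies = (dies_per_wafer(wafer_diameter, die_area)
                   * die_yield(die_area, defect_density))
      return wafer_cost / good_dies

  # Made-up example: a 1 cm^2 die on a 30 cm wafer costing $5000,
  # with 0.6 defects per cm^2 -> roughly $13.7 per good die.
  print(round(die_cost(5000.0, 30.0, 1.0, 0.6), 2))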
22
Cost of an IC (cont.)
23
Cost Example in a $1000 PC in 2001
24
1.5 Measuring and Reporting Performance
  • Designing high-performance computers is one of
    the major goals of any computer architect.
  • As a result, assessing the performance of
    computer hardware is at the heart of computer
    design and greatly affects the demand and market
    value of the computer.
  • However, measuring the performance of a computer
    system is not a straightforward task
  • Metrics: how do we describe in a numerical way
    the performance of a computer?
  • What tools do we use to find those metrics?
  • How do we summarize the performance?

25
What is the computer user interested in?
  • Reduce the time to run a certain task
  • Execution time (response time)
  • The time between the start and the completion of
    an event
  • Increase the number of tasks completed per week,
    day, hour, second, ns
  • Throughput
  • The total amount of work done in a given time

26
Example
  • Do the following changes to a computer system
    increase throughput, reduce response time, or
    both?
  • Replacing the processor in a computer with a
    faster version
  • Adding additional processors to a system that
    uses multiple processors for separate tasks, for
    example, handling an airline reservation system
  • Answer
  • Both response time and throughput are improved
  • Only throughput increases

27
Execution Time and Performance Comparison
  • In this subject, we will be primarily interested
    in execution time as a measure of performance
  • The relationship between performance and
    execution time on a computer X is reciprocal:

    Performance_X = 1 / Execution time_X

  • To compare design alternatives, we use the
    following equation:

    n = Performance_X / Performance_Y
      = Execution time_Y / Execution time_X

  • "X is n times faster than Y" (or "the throughput
    of X is n times higher than Y") means that the
    execution time is n times less on X than on Y
  • To maximize the performance of an application, we
    need to minimize its execution time

28
Measure Performance user CPU time
  • Response time may include disk accesses, memory
    accesses, input/output activities, CPU execution,
    and operating system overhead: everything
  • In order to get an accurate measure of
    performance, we use CPU time instead of
    response time
  • CPU time is the time the CPU spends computing a
    program and does not include time spent waiting
    for I/O or running other programs
  • CPU time can be further divided into user CPU
    time (program) and system CPU time (OS)
  • Typing the UNIX command time gives, for example,
    90.7u 12.9s 2:39 65%
    (user CPU, system CPU, total elapsed time, and
    the percentage of elapsed time that is CPU time)
  • In our performance measures, we use user CPU time
    because of its independence from the OS and other
    factors
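
As a quick sanity check of that sample output (a sketch:
the 2:39 of elapsed time is 159 seconds):

  # Recompute the 65% figure from the sample `time` output:
  # the share of elapsed time spent as CPU time (user + system).
  user_cpu, system_cpu = 90.7, 12.9   # the "90.7u" and "12.9s" fields
  elapsed = 2 * 60 + 39               # "2:39" -> 159 seconds
  print(f"{(user_cpu + system_cpu) / elapsed:.0%}")  # -> 65%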

29
Programs for Measuring Performance
  • Real applications: text processing software
    (Word), compilers (C), and other applications
    like Photoshop; they have inputs, outputs, and
    options that a user selects when running them
  • One major downside: real applications often
    encounter portability problems arising from
    dependences on the OS or compiler
  • Modified (or scripted) applications: modification
    is used to enhance portability or to focus on one
    particular aspect of system performance; scripts
    are used to simulate interaction behavior
  • Kernels: small, key pieces from real programs,
    typically used to evaluate individual features of
    the machine
  • Toy benchmarks: typically between 10 and 100
    lines of code, producing a known result
  • Synthetic benchmarks: artificially created code
    that matches an average execution profile

30
Benchmark Suites
  • They are a collection of programs (workload) that
    try to explore and capture all the strengths and
    weaknesses of a computer system (real programs,
    kernels)
  • A key advantage of such suites is that the
    weakness of any one benchmark is lessened by the
    presence of the other benchmarks
  • Good vs. Bad benchmarks
  • Improving product for real programs vs. improving
    product for benchmarks to get more sales
  • If benchmarks are inadequate, then sales wins!

31
SPEC Benchmarks
SPEC: Standard Performance Evaluation Corporation
  • The most successful attempt, and widely adopted
  • First generation, 1989
  • 10 programs yielding a single number
    (SPECmarks)
  • Second generation, 1992
  • SPECint92 (6 integer programs) and SPECfp92 (14
    floating point programs)
  • Unlimited compiler flags
  • Third generation, 1995
  • New set of programs: SPECint95 (8 integer
    programs) and SPECfp95 (10 floating point)
  • Single flag setting for all programs:
    SPECint_base95, SPECfp_base95
  • benchmarks useful for 3 years
  • SPEC CPU2000
  • CINT2000 (12 integer programs) and CFP2000 (14
    floating point programs)

32
SPEC Benchmarks
CINT2000 (Integer Component of SPEC CPU2000)
  • Program Language What Is It
  • 164.gzip C Compression
  • 175.vpr C FPGA Circuit Placement and Routing
  • 176.gcc C C Programming Language Compiler
  • 181.mcf C Combinatorial Optimization
  • 186.crafty C Game Playing: Chess
  • 197.parser C Word Processing
  • 252.eon C++ Computer Visualization
  • 253.perlbmk C PERL Programming Language
  • 254.gap C Group Theory, Interpreter
  • 255.vortex C Object-oriented Database
  • 256.bzip2 C Compression
  • 300.twolf C Place and Route Simulator

http://www.spec.org/osg/cpu2000/CINT2000/
33
SPEC Benchmarks
CFP2000 (Floating Point Component of SPEC CPU2000)
  • Program Language What Is It
  • 168.wupwise Fortran 77 Physics / Quantum
    Chromodynamics
  • 171.swim Fortran 77 Shallow Water Modeling
  • 172.mgrid Fortran 77 Multi-grid Solver: 3D
    Potential Field
  • 173.applu Fortran 77 Parabolic / Elliptic
    Differential Equations
  • 177.mesa C 3-D Graphics Library
  • 178.galgel Fortran 90 Computational Fluid
    Dynamics
  • 179.art C Image Recognition / Neural Networks
  • 183.equake C Seismic Wave Propagation Simulation
  • 187.facerec Fortran 90 Image Processing: Face
    Recognition
  • 188.ammp C Computational Chemistry
  • 189.lucas Fortran 90 Number Theory / Primality
    Testing
  • 191.fma3d Fortran 90 Finite-element Crash
    Simulation
  • 200.sixtrack Fortran 77 High Energy Physics
    Accelerator Design
  • 301.apsi Fortran 77 Meteorology: Pollutant
    Distribution

http://www.spec.org/osg/cpu2000/CFP2000/
34
More Benchmarks
TPC: Transaction Processing Council
  • Measures the ability of a system to handle
    transactions, which consist of database accesses
    and updates.
  • Many variants, depending on transaction complexity
  • TPC-A: simple bank-teller transaction style
  • TPC-C: complex database query

EDN Embedded Microprocessor Benchmark Consortium
(EEMBC, pronounced "embassy")
  • 34 kernels in 5 classes
  • 16 automotive/industrial, 5 consumer, 3
    networking, 4 office automation, 6
    telecommunications

35
How to Summarize Performance
  • Management would like to have one number
  • Technical people want more
  • They want evidence of reproducibility: there
    should be enough information so that you or
    someone else can repeat the experiment
  • There should be consistency when doing the
    measurements multiple times

How would you report these results?
36
Comparing and Summarizing Performance
  • Comparing performance by looking at individual
    programs is not fair
  • Total execution time: a consistent summary
    measure
  • Arithmetic Mean: provides a simple average

    AM = (1/n) × Σ (i = 1 to n) Time_i

  • Time_i = execution time for program i in the
    workload
  • Doesn't account for weight: all programs are
    treated as equal

37
Weighted Variants
  • What is the proper mixture of programs for the
    workload?
  • A weight is a weighting factor assigned to each
    program to indicate the relative frequency of the
    program in the workload of use
  • Weighted Arithmetic Mean:

    WAM = Σ (i = 1 to n) Weight_i × Time_i

  • Weight_i = frequency of program i in the workload
  • May be better, but beware of a dominant program
    time

38
Example (Figure 1.16): Weighted Arithmetic Mean
39
Normalized Time Metrics
  • Normalized execution time metrics
  • Measure performance by normalizing it to a
    reference machine: Execution time ratio_i is the
    execution time of program i relative to the
    reference machine
  • Geometric Mean:

    GM = (Π (i = 1 to n) Execution time ratio_i)^(1/n)

  • The geometric mean is consistent no matter which
    machine is the reference
  • The arithmetic mean should not be used to average
    normalized execution times
  • However, the geometric mean still doesn't form an
    accurate prediction model (it doesn't predict
    execution time), and it encourages designers to
    improve the easiest benchmarks rather than the
    slowest ones (due to no weighting)
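
To make the three summary measures concrete, here is a
minimal Python sketch (the timings are hypothetical,
chosen to match the A/B example on the next slide; the
function names are mine):

  import math

  # Execution times (seconds) of programs P1 and P2 on machines A and B.
  times_A = [1.0, 1000.0]
  times_B = [10.0, 100.0]

  def arithmetic_mean(times):
      return sum(times) / len(times)

  def weighted_arithmetic_mean(times, weights):
      return sum(w * t for w, t in zip(weights, times))

  def geometric_mean(ratios):
      return math.prod(ratios) ** (1.0 / len(ratios))

  print(arithmetic_mean(times_A), arithmetic_mean(times_B))  # 500.5 55.0

  # Normalized to machine A as the reference:
  ratios = [b / a for a, b in zip(times_A, times_B)]  # [10.0, 0.1]
  print(geometric_mean(ratios))  # 1.0 -> A and B look "equal"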

40
Example
Machines A and B have the same performance
according to the Geometric Mean measure, yet this
would be true only for a workload in which P1 runs
100 times for every run of P2, according to the
Weighted Arithmetic Mean measure.
41
1.6 Quantitative Principles of Computer Design
  • Now that we know how to define, measure, and
    summarize performance, we can explore some of the
    principles and guidelines for the design and
    analysis of computers.
  • Make the common case fast
  • In making a design trade-off, favor the frequent
    case over the infrequent case
  • Improving the frequent event, rather than the
    rare event, will obviously help performance
  • The frequent case is often simpler and can be
    done faster than the infrequent case
  • We have to decide what the frequent case is and
    how much performance can be improved by making
    that case faster

42
Two equations to evaluate design alternatives
  • Amdahl's Law
  • Amdahl's Law states that the performance
    improvement to be gained from using some faster
    mode of execution is limited by the fraction of
    the time the faster mode can be used
  • Amdahl's Law defines the speedup that can be
    gained by using a particular feature
  • The performance gain that can be obtained by
    improving some portion of a computer can be
    calculated using Amdahl's Law
  • The CPU Performance Equation
  • Essentially all computers are constructed using a
    clock running at a constant rate
  • CPU time can then be expressed as a number of
    clock cycles

43
Amdahl's Law
Speedup due to enhancement E:
  • Suppose that enhancement E accelerates a fraction
    F of the task by a factor S, and the remainder of
    the task is unaffected
  • Fraction_enhanced (F): the fraction of the
    execution time in the original machine that can
    be converted to take advantage of the enhancement
  • Speedup_enhanced (S): the improvement gained by
    the enhanced execution mode

44
Amdahl's Law

ExTime_new = ExTime_old × [(1 − Fraction_enhanced)
             + Fraction_enhanced / Speedup_enhanced]

Speedup_overall = ExTime_old / ExTime_new
                = 1 / [(1 − Fraction_enhanced)
                       + Fraction_enhanced / Speedup_enhanced]
45
Amdahl's Law
  • Example: Floating-point (FP) instructions are
    improved to run faster by a factor of 2, but FP
    instructions account for only 10% of the
    execution time. What's the overall speedup gained
    by this improvement?

Answer:

    Speedup_overall = 1 / ((1 − 0.1) + 0.1 / 2)
                    = 1 / 0.95 ≈ 1.053

  • Amdahl's Law can serve as a guide to how much an
    enhancement will improve performance and how to
    distribute resources to improve cost-performance
  • It is particularly useful for comparing the
    performance, of both the overall system and the
    CPU, of two design alternatives

More examples: pages 41 and 42
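
Amdahl's Law as a small Python sketch (the function name
is mine), applied to the FP example above:

  def amdahl_speedup(fraction_enhanced, speedup_enhanced):
      # Speedup_overall = 1 / ((1 - F) + F / S)
      return 1.0 / ((1.0 - fraction_enhanced)
                    + fraction_enhanced / speedup_enhanced)

  # The FP example: 10% of execution time, sped up by 2x.
  print(round(amdahl_speedup(0.10, 2.0), 3))   # -> 1.053

  # Diminishing returns: even an infinite FP speedup is capped at 1/0.9.
  print(round(amdahl_speedup(0.10, 1e12), 3))  # -> 1.111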
46
CPU Performance
  • All computers are constructed using a clock to
    operate their circuits
  • Typically measured by two basic metrics
  • Clock rate: today expressed in MHz or GHz
  • Clock cycle time: clock cycle time = 1 / clock rate
  • E.g., a 1 GHz rate corresponds to a 1 ns cycle time
  • Thus the CPU time for a program is given by

    CPU time = CPU clock cycles for the program
               × Clock cycle time

  • Or,

    CPU time = CPU clock cycles for the program
               / Clock rate

47
CPU Performance
  • More typically, we tend to count the instructions
    executed, known as the Instruction Count (IC)
  • CPI: average number of clock cycles per
    instruction
  • Hence, an alternative method to get the CPU time:

    CPU time = IC × CPI × Clock cycle time

    (IC depends on the instruction set of the
    computer and its compiler)

  • CPU performance is equally dependent upon three
    characteristics: clock cycle time, CPI, and IC.
    They are not independent: making one better often
    makes another worse, because the basic
    technologies involved in changing each
    characteristic are interdependent.
  • Clock cycle time: hardware technology and
    organization
  • CPI: organization and instruction set
    architecture
  • IC: instruction set architecture and compiler
    technology
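
A minimal sketch of this CPU performance equation in
Python (the names are mine):

  def cpu_time(instruction_count, cpi, clock_rate_hz):
      # CPU time = IC x CPI x clock cycle time,
      # with clock cycle time = 1 / clock rate.
      return instruction_count * cpi / clock_rate_hz

  # E.g., 1 billion instructions at CPI 2.0 on a 1 GHz clock:
  print(cpu_time(1_000_000_000, 2.0, 1e9))  # -> 2.0 seconds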

48
CPU Performance
Example: Suppose we have two implementations of the
same instruction set architecture. Computer A has
a clock cycle time of 10 ns and a CPI of 2.0
for some program, and computer B has a clock
cycle time of 20 ns and a CPI of 1.2 for the
same program. Which machine is faster for this
program?
Answer: Assume the program requires IN
instructions to be executed.
CPU clock cycles_A = IN × 2.0
CPU clock cycles_B = IN × 1.2
CPU time_A = IN × 2.0 × 10 ns = 20 × IN ns
CPU time_B = IN × 1.2 × 20 ns = 24 × IN ns
So computer A is faster than computer B.
49
CPU Performance
  • Often the overall performance is easier to deal
    with on a per-instruction basis
  • The overall CPI can be expressed as

    CPI = Σ (i = 1 to n) (IC_i × CPI_i)
          / Instruction count

    where IC_i = the number of times instruction i
    is executed, and CPI_i = the CPI for instruction i
50
Cycles Per Instruction
Example: Suppose we have a machine where we can
count the frequency with which instructions are
executed. We also know how many cycles it takes
for each instruction type.
  • Base Machine (Reg / Reg)

    Instruction   Freq   CPI   (% Time)
    ALU           50%    1     (33%)
    Load          20%    2     (27%)
    Store         10%    2     (13%)
    Branch        20%    2     (27%)
                  Total CPI = 1.5

How do we get total CPI? How do we get time?
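
A sketch of the per-instruction CPI computation for the
mix above, which answers both questions (the names are
mine); total time is then IC × total CPI × cycle time:

  # Instruction mix from the table: class -> (frequency, CPI).
  mix = {
      "ALU":    (0.50, 1),
      "Load":   (0.20, 2),
      "Store":  (0.10, 2),
      "Branch": (0.20, 2),
  }

  total_cpi = sum(freq * cpi for freq, cpi in mix.values())
  print(total_cpi)  # -> 1.5

  # Fraction of execution time per class = freq x CPI / total CPI.
  for name, (freq, cpi) in mix.items():
      print(name, f"{freq * cpi / total_cpi:.0%}")  # ALU 33%, Load 27%, ...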
51
CPU Performance
Example: Suppose we have made the following
measurements:
  Frequency of FP operations (other than FPSQR) = 25%
  Average CPI of FP operations = 4.0
  Average CPI of other instructions = 1.33
  Frequency of FPSQR = 2%
  CPI of FPSQR = 20
Compare two designs: (1) decrease the CPI of FPSQR
to 2; (2) decrease the average CPI of all FP
operations to 2.5.
Answer: First, find the original CPI (treating the
FPSQR cycles as folded into the 4.0 FP average, as
in the textbook's solution):
  CPI_original = (0.25 × 4.0) + (0.75 × 1.33) ≈ 2.0
The CPI with design 1 (FPSQR CPI 20 → 2):
  CPI_design1 = CPI_original − 0.02 × (20 − 2) ≈ 1.64
The CPI with design 2 (all FP CPI → 2.5):
  CPI_design2 = (0.25 × 2.5) + (0.75 × 1.33) ≈ 1.62
So design 2 is (slightly) better.
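
The same comparison as a quick Python sketch (variable
names are mine; the interpretation of the FP mix follows
the answer above):

  freq_fp, cpi_fp       = 0.25, 4.0    # FP operations (average CPI)
  freq_other, cpi_other = 0.75, 1.33   # all other instructions
  freq_fpsqr            = 0.02         # FPSQR, at CPI 20 originally

  cpi_original = freq_fp * cpi_fp + freq_other * cpi_other
  cpi_design1  = cpi_original - freq_fpsqr * (20.0 - 2.0)  # FPSQR CPI -> 2
  cpi_design2  = freq_fp * 2.5 + freq_other * cpi_other    # all FP CPI -> 2.5
  print(cpi_original, cpi_design1, cpi_design2)  # approx. 2.0, 1.64, 1.62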
52
Principle of Locality
  • Other important fundamental observations come
    from the properties of programs.
  • Principle of locality: programs tend to reuse
    data and instructions they have used recently.
  • There are two different types of locality (see
    Chapter 5)
  • Temporal locality (locality in time): if an item
    is referenced, it will tend to be referenced
    again soon (loops, reuse, etc.)
  • Spatial locality (locality in space/location):
    if an item is referenced, items whose addresses
    are close to one another tend to be referenced
    soon (straight-line code, array access, etc.)
  • We can predict with reasonable accuracy what
    instructions and data a program will use in the
    near future based on its accesses in the past.
  • A program spends 90% of its execution time in
    only 10% of the code.

53
Some Misleading Performance Measures
  • There are certain computer performance measures
    that are popular with computer manufacturers and
    sellers but may be misleading.
  • MIPS (Million Instructions Per Second)
  • MIPS depends on the instruction set, which makes
    it difficult to compare the MIPS of computers
    with different instruction sets.
  • MIPS varies between programs on the same
    computer: different programs use different
    instruction mixes.
  • Most importantly, MIPS can vary inversely to
    performance.

54
Some Misleading Performance Measures
  • MFLOPS: focuses on one type of work
  • MFLOPS (Million Floating-point Operations Per
    Second) depends on the program; it must be FP
    intensive.
  • MFLOPS depends on the computer as well.
  • Floating-point operations vary in complexity
    (e.g., add vs. divide).
  • Peak performance: the performance level that the
    manufacturer guarantees you won't exceed
  • The difference between peak performance and
    average performance is huge.
  • Peak performance is not useful in predicting
    observed performance.

55
Summary
  • 1.1 Introduction
  • 1.2 The Changing Face of Computing and the Task
    of the Computer Designer
  • 1.3 Technology Trends
  • 1.4 Cost, Price, and Their Trends
  • 1.5 Measuring and Reporting Performance
  • Benchmarks
  • (Weighted) Arithmetic Mean, Normalized Geometric
    Mean
  • 1.6 Quantitative Principles of Computer Design
  • Amdahl's Law
  • CPU Performance: CPU Time, CPI
  • Locality