Computer Architecture - PowerPoint PPT Presentation

Provided by: behrooz3
Learn more at: https://eng.fsu.edu
Transcript and Presenter's Notes

Title: Computer Architecture


1
Computer Architecture
  • EEL 4713/5764, Spring 2006
  • Dr. Michael Frank
  • Module 5 Computer Performance

2
Part I: Background and Motivation
3
I Background and Motivation
  • Provide motivation, paint the big picture,
    introduce tools
  • Review components used in building digital
    circuits
  • Present an overview of computer technology
  • Understand the meaning of computer performance
  • (or why a 2 GHz processor isn't 2× as fast as
    a 1 GHz model)

Topics in This Part
Chapter 1 Combinational Digital Circuits
Chapter 2 Digital Circuits with Memory
Chapter 3 Computer System Technology
Chapter 4 Computer Performance
4
4 Computer Performance
  • Performance is key in design decisions; so are
    cost and power
  • It has been a driving force for innovation
  • Isn't quite the same as speed (higher clock
    rate)

Topics in This Chapter
4.1 Cost, Performance, and Cost/Performance
4.2 Defining Computer Performance
4.3 Performance Enhancement and Amdahl's Law
4.4 Performance Measurement vs Modeling
4.5 Reporting Computer Performance
4.6 The Quest for Higher Performance
5
Course Instructional Objective 1
  • As the syllabus says
  • At the completion of this course, students should
    be able to
  • CIO 1. (Metrics) Calculate and interpret
    different performance and cost metrics of
    computer systems.
  • This CIO should also support the following
    Program Outcome
  • Students graduating from the BSEE and BSCpE
    degree programs will have
  • PO (a). (Apply) An ability to apply knowledge of
    mathematics, science and engineering
  • PO (e). (Solve) An ability to identify,
    formulate, and solve engineering problems
  • PO (o). (Topics) EE A knowledge of electrical
    engineering applications selected from the
    digital systems areas CpE A knowledge of
    computer science and computer engineering topics
    including computer architecture.
  • Under assessment instruments, the syllabus
    says
  • 1. Metrics. Students will solve exam problems in
    which they must analyze descriptions of
    hypothetical processors to determine their
    performance, cost-performance, and
    power-performance.

6
Module Instructional Objectives
  • I break down the CIO as follows
  • CIO 1. Metrics (aeo). Calculate and interpret
    different performance and cost metrics of
    computer systems.
  • 1.1. Know and apply (a) the definitions of clock
    frequency, MIPS, execution time, performance,
    throughput, cost-performance, and
    power-performance.
  • 1.2. Explain why a given metric is or is not
    appropriate to use in a given situation.
  • 1.3. Identify (e.i) the specific figure(s) of
    merit that are most appropriate for choosing
    between alternative computer designs in a given
    scenario.
  • 1.4. Formulate (e.ii) appropriate symbolic
    equations for calculating a desired figure of
    merit from the provided information about an
    architectural scenario.
  • 1.5. Solve (e.iii) problems involving the
    determination of which of several computer
    designs would be preferable in a given scenario.
  • 1.6. Apply Amdahl's Law (and generalizations
    thereof) in characterizing the relationship
    between an improvement to a particular component
    of a system and the overall improvement of the
    whole system.
  • 1.7. Apply (a) the CPU Performance Equation that
    relates performance and execution time to
    instruction count, CPI, and clock frequency.

7
Topic 1
  • Overview of Some Important Metrics for Computer
    Systems: Performance, Cost, and Power Consumption

8
Important Performance Metrics
  • Some metrics that are often used, but that do not
    always accurately reflect true performance
  • CPU clock frequency: number of CPU clock cycles
    per unit time
  • MIPS rating: how many millions of instructions
    per second
  • Benchmark ratings (e.g., SPECmarks): more on
    this later
  • Metrics that are true measures of performance
  • Total execution time of a work unit (on real
    applications)
  • Wall-clock time from beginning to end of the
    execution process
  • Performance = 1/(execution time)
  • For a single work unit
  • Throughput = (# work units)/(execution time)
  • A generalized kind of performance

9
Cost and Cost-Related Metrics
  • In the real world, the performance of a system is
    not the only thing that is important
  • For example, its cost may also matter a lot!
  • E.g., the IBM Blue Gene/L has really high
    performance, but you're not likely to buy it as
    your next computer
  • We almost always have budgetary constraints.
  • The usual goal: maximize the cost-performance
    (i.e., cost-efficiency) of the systems that you
    buy.
  • Cost-performance = (performance) / (cost).
  • In other words, you want to get the best value
    for your dollar.
  • This strategy roughly maximizes total throughput
    within a fixed budget, whenever you can have
    many systems gathered together working in
    parallel.

10
Throughput and Cost-Performance
  • When there is a fixed budget, the maximum
    throughput of a parallel system is (roughly)
    proportional to the cost-performance of the
    individual serial units.

11
The Vanishing Computer Cost
12
Cost/Performance
Figure 4.1 Performance improvement as a
function of cost.
 
13
Importance of Power Consumption
  • In the real world, a computer's performance and
    manufacturing cost are not the only important
    concerns
  • Operating costs, usability, and other factors may
    also be important!
  • Today, power consumption is an increasingly
    important factor that impacts all of the
    following
  • Manufacturing cost, operating cost, performance,
    and usability!
  • In general, higher power consumption means
  • More manufacturing cost
  • for more aggressive power-delivery and cooling
    systems
  • power supplies, heat sinks, fans, etc.
  • Higher operating cost
  • More electricity consumed, frequent
    changing/recharging of batteries, inconvenience
    to user
  • Lower performance
  • Higher performance would exceed limits of cooling
    system
  • Poor usability / poorer overall quality of
    product
  • Annoyingly noisy cooling fans or data center A/C
    units, laptops that burn up your lap
  • So in many design scenarios, we may wish to
    maximize performance within a fixed power budget,
    or minimize power consumption to reach a desired
    performance.

14
Throughput and Power-Performance
  • When there is a fixed power budget, the maximum
    throughput of a parallel system is (roughly)
    proportional to the power-performance of the
    individual serial units.
  • This is exactly analogous to the earlier
    cost-performance analysis.

15
Performance Maximization within Cost and Power
Constraints
  • Suppose we have both a cost budget and a power
    budget,
  • and we want to maximize system throughput.
  • With a given unit design, we must maximize the
    number of units.
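  • Then we have the following constraints on $n_{units}$
  • $n_{units} \times C_{unit} \le C_{max}$
  • So, $n_{units} \le \lfloor C_{max}/C_{unit} \rfloor$
  • $n_{units} \times P_{unit} \le P_{max}$
  • and $n_{units} \le \lfloor P_{max}/P_{unit} \rfloor$
  • The largest value of $n_{units}$ within these
    constraints is
  • $n_{units} = \min(\lfloor C_{max}/C_{unit} \rfloor, \lfloor P_{max}/P_{unit} \rfloor) = \lfloor \min(C_{max}/C_{unit}, P_{max}/P_{unit}) \rfloor$
  • and so the maximum feasible throughput is
  • $T_{tot} = T_{unit} \times n_{units} = T_{unit} \times \lfloor \min(C_{max}/C_{unit}, P_{max}/P_{unit}) \rfloor$

C = cost, P = power, T = throughput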
16
Power-Performance and Energy Efficiency
  • Power-performance means performance (i.e.,
    throughput) per unit of power consumption
  • power-performance = (throughput)/(power).
  • Of course, since
  • throughput = (work units)/(time) and
  • power = (energy consumed)/(time),
  • the times cancel, and so power-performance is
    equal to
  • (work units)/(energy consumed)
  • In other words, system power-performance is the
    same thing as the energy efficiency of the
    underlying computing process.
  • To maximize power-performance, minimize the
    amount of energy that is consumed per unit of
    work that is performed.

17
System Optimization Example
  • Suppose you have a budget of $1M to set up a new
    corporate data center that should have a total
    power consumption of no more than 100 kW while
    serving web transactions in a simple database
    application. If your goal is to maximize total
    performance (in transactions/second) while
    staying within your budget and meeting the power
    constraint, which of the following types of
    machines would be preferable as a basis for the
    design?
  • Sun servers, each $15,000, burning 100 W,
    processing 100 transactions/second
  • PlayStation 2s, each $100 from flea market, 30 W,
    processing 30 transactions/second
  • Solution
  • Sun servers: min(⌊1,000,000/15,000⌋, ⌊100,000/100⌋)
    = min(66, 1000) = 66 units (cost-limited), giving
    6,600 transactions/second
  • PlayStation 2s: min(⌊1,000,000/100⌋, ⌊100,000/30⌋)
    = min(10,000, 3,333) = 3,333 units (power-limited),
    giving 99,990 transactions/second for $333,300
  • With these figures, a PS2-based design could
    attain about 15× higher throughput and use only
    1/3 of the budget while still meeting the power
    constraints!
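
The sizing rule from the preceding slides (the unit count is capped by whichever of the cost and power budgets is tighter) can be checked numerically. What follows is a minimal Python sketch using only the figures quoted in this example; the function and variable names are illustrative and do not come from the lecture.

```python
def max_units(c_max, c_unit, p_max, p_unit):
    """Largest whole number of units that fits both the cost and power budgets."""
    return min(c_max // c_unit, p_max // p_unit)

def max_throughput(c_max, c_unit, p_max, p_unit, t_unit):
    """Total throughput = per-unit throughput x number of units."""
    return t_unit * max_units(c_max, c_unit, p_max, p_unit)

BUDGET_DOLLARS = 1_000_000
POWER_WATTS = 100_000  # 100 kW

# (unit cost in dollars, unit power in watts, transactions/second per unit)
sun = max_throughput(BUDGET_DOLLARS, 15_000, POWER_WATTS, 100, 100)
ps2 = max_throughput(BUDGET_DOLLARS, 100, POWER_WATTS, 30, 30)

print("Sun servers:", sun, "transactions/second")
print("PlayStation 2s:", ps2, "transactions/second")
print("Ratio:", round(ps2 / sun, 2))
```

With the inputs as listed, the Sun design is limited by cost (66 units) and the PS2 design by power (3,333 units), so the sketch reports 6,600 versus 99,990 transactions/second.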

18
Topic 2
  • Measuring Computer Performance

19
4.2 Defining Computer Performance
Figure 4.2 Pipeline analogy shows that
imbalance between processing power and I/O
capabilities leads to a performance bottleneck.
20
Concepts of Performance and Speedup
  • Performance = 1 / Execution time
    is simplified to
  • Performance = 1 / CPU execution time
  • (Performance of M1) / (Performance of M2)
    = Speedup of M1 over M2
  • = (Execution time on M2) / (Execution time
    on M1)
  • Terminology: M1 is x times as fast as M2 (e.g.,
    1.5 times as fast)
  • M1 is 100(x − 1)% faster than M2 (e.g., 50%
    faster)
  • CPU time = (Clock cycles executed) × (Time per
    cycle)
  • = Instructions × (Cycles per instruction)
    × (Time per cycle)
  • = Instructions × CPI / (Clock frequency)
  • Instruction count, CPI, and clock rate are not
    completely independent, so improving one by a
    given factor may not lead to overall execution
    time improvement by the same factor.
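
As a small illustration of the CPU performance equation and of why its three factors cannot be treated as independent, here is a Python sketch. The program size, the CPI values, and the assumption that CPI rises from 2.0 to 2.6 when the clock is doubled are all hypothetical numbers chosen for the example, not measurements.

```python
def cpu_time(instructions, cpi, clock_hz):
    """CPU time in seconds = instructions x CPI / clock frequency."""
    return instructions * cpi / clock_hz

# Hypothetical baseline: 1e9 instructions, CPI of 2.0, 1 GHz clock.
base = cpu_time(1e9, 2.0, 1e9)

# Hypothetical redesign: clock doubled to 2 GHz, but CPI assumed to rise
# to 2.6 (for instance because memory stalls now cost more cycles).
fast = cpu_time(1e9, 2.6, 2e9)

speedup = base / fast
print(f"base = {base:.2f} s, fast = {fast:.2f} s, speedup = {speedup:.2f}")
```

Under these assumed numbers the 2× clock yields a speedup of about 1.54, not 2.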

CPU performance equation
 
21
Faster Clock ≠ Shorter Running Time
Figure 4.3 Faster steps do not necessarily
mean shorter travel time.
 
22
4.3 Performance Enhancement: Amdahl's Law
f = fraction unaffected, p = speedup
of the rest
Figure 4.4 Amdahl's law: speedup achieved if
a fraction f of a task is unaffected and the
remaining (1 − f) part runs p times as fast.
23
Amdahl's Law Used in Design
Example 4.1
  • A processor spends 30% of its time on flp
    addition, 25% on flp mult, and 10% on flp
    division. Evaluate the following enhancements,
    each costing the same to implement:
  • Redesign of the flp adder to make it twice as
    fast.
  • Redesign of the flp multiplier to make it three
    times as fast.
  • Redesign the flp divider to make it 10 times as
    fast.
  • Solution
  • Adder redesign speedup = 1 / [0.7 + 0.3 / 2]
    = 1.18
  • Multiplier redesign speedup = 1 / [0.75 + 0.25 /
    3] = 1.20
  • Divider redesign speedup = 1 / [0.9 + 0.1 / 10]
    = 1.10
  • What if both the adder and the multiplier are
    redesigned?
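
The three speedups in this example, and the closing question about redesigning both the adder and the multiplier, can be worked out with a short Python sketch. The helper name `amdahl_speedup` is made up for illustration, and the combined case assumes the time fractions for addition and multiplication do not overlap.

```python
def amdahl_speedup(enhancements):
    """Overall speedup for a list of (time_fraction, speedup_factor) pairs.

    Whatever fraction of time is not listed is assumed to be unaffected.
    """
    affected = sum(frac for frac, _ in enhancements)
    new_time = (1.0 - affected) + sum(frac / p for frac, p in enhancements)
    return 1.0 / new_time

adder = amdahl_speedup([(0.30, 2)])
multiplier = amdahl_speedup([(0.25, 3)])
divider = amdahl_speedup([(0.10, 10)])
both = amdahl_speedup([(0.30, 2), (0.25, 3)])

for name, s in [("adder", adder), ("multiplier", multiplier),
                ("divider", divider), ("adder + multiplier", both)]:
    print(f"{name}: {s:.2f}")
```

For the combined redesign the unaffected fraction is 0.45, so the speedup is 1 / [0.45 + 0.3/2 + 0.25/3], or about 1.46.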

 
24
4.4 Performance Measurement vs. Modeling
Figure 4.5 Running times of six programs on
three machines.
25
Performance Benchmarks
Example 4.3
  • You are an engineer at Outtel, a start-up
    aspiring to compete with Intel via its new
    processor design that outperforms the latest
    Intel processor by a factor of 2.5 on
    floating-point instructions. This level of
    performance was achieved by design compromises
    that led to a 20% increase in the execution time
    of all other instructions. You are in charge of
    choosing benchmarks that would showcase Outtel's
    performance edge.
  • What is the minimum required fraction f of time
    spent on floating-point instructions in a program
    on the Intel processor to show a speedup of 2 or
    better for Outtel?
  • Solution
  • We use a generalized form of Amdahl's formula in
    which a fraction f is speeded up by a given
    factor (2.5) and the rest is slowed down by
    another factor (1.2):
    $1 / [1.2(1 - f) + f / 2.5] \ge 2 \Rightarrow f \ge 0.875$
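
The inequality in this solution can be rearranged into a closed form for the minimum fraction. Below is a hedged Python sketch; the function names and default parameters are invented for the example, with 2.5 and 1.2 taken from the problem statement.

```python
def outtel_speedup(f, fp_gain=2.5, other_slowdown=1.2):
    """Speedup over the baseline when a fraction f of baseline time is
    floating point (sped up by fp_gain) and the rest is slowed by
    other_slowdown."""
    return 1.0 / (other_slowdown * (1.0 - f) + f / fp_gain)

def min_fraction(target, fp_gain=2.5, other_slowdown=1.2):
    """Smallest f with outtel_speedup(f) >= target.

    From s(1 - f) + f/g <= 1/target it follows that
    f >= (s - 1/target) / (s - 1/g).
    """
    return (other_slowdown - 1.0 / target) / (other_slowdown - 1.0 / fp_gain)

f_min = min_fraction(2.0)
print(f"minimum floating-point fraction: {f_min:.3f}")
print(f"speedup at that fraction: {outtel_speedup(f_min):.3f}")
```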

 
26
Performance Estimation
Average CPI = $\sum_{i}$ (Class-i fraction) × (Class-i CPI), summed over all instruction classes i
Machine cycle time = 1 / Clock rate
CPU execution time = Instructions × (Average CPI) / (Clock rate)
Table 4.3 Usage frequency, in percentage, for
various instruction classes in four
representative applications.
Instruction class   Data compression   C language compiler   Reactor simulation   Atomic motion modeling
A Load/Store        25                 37                    32                   37
B Integer           32                 28                    17                   5
C Shift/Logic       16                 13                    2                    1
D Float             0                  0                     34                   42
E Branch            19                 13                    9                    10
F All others        8                  9                     6                    4
 
27
MIPS Rating Can Be Misleading
Example 4.5
  • Two compilers produce machine code for a program
    on a machine with two classes of instructions.
    Here are the numbers of instructions:

Class   CPI   Compiler 1   Compiler 2
A       1     600M         400M
B       2     400M         400M

  • a. What are run times of the two programs with a 1
    GHz clock?
  • b. Which compiler produces faster code and by what
    factor?
  • c. Which compiler's output runs at a higher MIPS
    rate?
  • Solution
  • a. Running time, compiler 1 = (600M × 1 + 400M × 2) / $10^9$
    = 1.4 s; compiler 2 = (400M × 1 + 400M × 2) / $10^9$
    = 1.2 s
  • b. Compiler 2's output runs 1.4 / 1.2 = 1.17
    times as fast
  • c. MIPS rating: compiler 1 (CPI = 1.4) = 1000
    / 1.4 = 714; compiler 2 (CPI = 1.5) = 1000 / 1.5
    = 667
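
To make the contrast between run time and MIPS rating concrete, the example's numbers can be fed through a few lines of Python. This is only a sketch; the dictionary layout and the `summarize` helper are illustrative choices, not part of the original example.

```python
CLOCK_HZ = 1e9                      # 1 GHz clock
CPI = {"A": 1, "B": 2}              # cycles per instruction, by class
counts = {
    "compiler 1": {"A": 600e6, "B": 400e6},
    "compiler 2": {"A": 400e6, "B": 400e6},
}

def summarize(mix):
    """Return (run time in seconds, MIPS rating) for an instruction mix."""
    instructions = sum(mix.values())
    cycles = sum(CPI[cls] * n for cls, n in mix.items())
    run_time = cycles / CLOCK_HZ
    mips = instructions / run_time / 1e6
    return run_time, mips

results = {name: summarize(mix) for name, mix in counts.items()}
for name, (run_time, mips) in results.items():
    print(f"{name}: {run_time:.1f} s, {mips:.0f} MIPS")
```

Compiler 1's output earns the higher MIPS rating while taking longer to run, which is the sense in which the rating misleads.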

 
28
4.5 Reporting Computer Performance
Table 4.4 Measured or estimated execution
times for three programs.
Program        Time on machine X   Time on machine Y   Speedup of Y over X
Program A      20                  200                 0.1
Program B      1000                100                 10.0
Program C      1500                150                 10.0
All 3 progs    2520                450                 5.6
Analogy: If a car is driven to a city 100 km away
at 100 km/hr and returns at 50 km/hr, the average
speed is not (100 + 50) / 2 but is obtained from
the fact that it travels 200 km in 3 hours.
29
Comparing the Overall Performance
Table 4.4 Measured or estimated execution
times for three programs.
Program           Time on machine X   Time on machine Y   Speedup of Y over X   Speedup of X over Y
Program A         20                  200                 0.1                   10
Program B         1000                100                 10.0                  0.1
Program C         1500                150                 10.0                  0.1
Arithmetic mean                                           6.7                   3.4
Geometric mean                                            2.15                  0.46
Geometric mean does not yield a measure of
overall speedup, but provides an indicator that
at least moves in the right direction
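
The means quoted for Table 4.4 can be reproduced with a short Python sketch, which also shows why the arithmetic mean of speedups is suspect: it reports that each machine is faster than the other. Variable names here are illustrative.

```python
from math import prod

times_x = {"A": 20, "B": 1000, "C": 1500}
times_y = {"A": 200, "B": 100, "C": 150}

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    return prod(values) ** (1.0 / len(values))

y_over_x = [times_x[p] / times_y[p] for p in times_x]   # speedup of Y over X
x_over_y = [times_y[p] / times_x[p] for p in times_x]   # speedup of X over Y
total_ratio = sum(times_x.values()) / sum(times_y.values())

print("AM:", round(arithmetic_mean(y_over_x), 1), round(arithmetic_mean(x_over_y), 1))
print("GM:", round(geometric_mean(y_over_x), 2), round(geometric_mean(x_over_y), 2))
print("Total-time speedup of Y over X:", round(total_ratio, 1))
```

The two geometric means are reciprocals of each other, so they at least agree on which machine comes out ahead; the two arithmetic means are both greater than 1.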
30
4.6 The Quest for Higher Performance
State of available computing power ca. the early
2000s:
  • Gigaflops on the desktop
  • Teraflops in the supercomputer center
  • Petaflops on the drawing board
Note on terminology (see Table 3.1)
  • Prefixes for large units: Kilo = $10^3$,
    Mega = $10^6$, Giga = $10^9$, Tera = $10^{12}$,
    Peta = $10^{15}$
  • For memory: K = $2^{10}$ = 1024, M = $2^{20}$,
    G = $2^{30}$, T = $2^{40}$, P = $2^{50}$
  • Prefixes for small units: micro = $10^{-6}$,
    nano = $10^{-9}$, pico = $10^{-12}$,
    femto = $10^{-15}$
31
Supercomputers
Figure 4.7 Exponential growth of
supercomputer performance.
 
32
The Most Powerful Computers
Figure 4.8 Milestones in the DOE's
Accelerated Strategic Computing Initiative (ASCI)
program with extrapolation up to the PFLOPS
level.
 
33
Performance is Important, But It Isn't Everything
Figure 25.1 Trend in energy consumption per
MIPS of computational power in general-purpose
processors and DSPs.
 
34
Computer Architecture Lecture Notes, Spring 2005
Dr. Michael P. Frank
  • Competency Area 2
  • Performance Metrics
  • Lecture 1

35
Performance Metrics
  • Why is it necessary for us to study performance?
  • Performance is usually the key to the
    effectiveness of a system (hardware + software).
  • Performance is critical to customers
    (purchasers), thus, we as designers and
    architects must also make it a priority.
  • Performance must be assessed and understood in
    order for a system to communicate efficiently
    with peripheral devices.

36
Topic: Computer Performance
  • Sub-Topic: Airplane Analogy

37
Performance Metrics
  • How can we determine performance?

Consider this example from the transportation
industry
38
Performance Example
  • Fuel Capacity in liters
  • Range in kilometers
  • Speed in kilometers/hour
  • Throughput is defined as
  • (# of passengers) × (cruising speed)
  • Cost is given as
  • (fuel capacity) / (passengers × range)
  • Which mode of transportation has the best
    performance?

39
Performance Example
  • It depends on how we define performance.
  • Consider raw speed
  • Getting from one place to another quickly

40
Performance Example
  • What if we're interested in the rate at which
    people are carried (throughput)?

41
Performance Example
  • Often times we relate performance and cost. Thus
    we can consider the amount of fuel used per
    passenger

42
Topic: Computer Performance
  • Sub-Topic: Basic Concepts (Performance,
    Throughput, and Execution Time)

43
Performance Metrics
  • Similar measures of performance are used for
    computers.
  • Number of computations done per unit of time
  • Cost of computations
  • Possibly several aspects of cost can be
    considered including initial purchase price,
    operating cost, cost of training users of system,
    etc.
  • Common performance measures are
  • RESPONSE TIME: the amount of time it takes a
    program to complete (a.k.a. execution time)
  • THROUGHPUT: the total amount of work done in a
    given amount of time

44
Performance Metrics
  • Example
  • Given the following actions
  • 1. Replacing processor with a faster version
  • 2. Adding additional processors to perform
    separate tasks in a multiprocessor system
  • do they (a) increase throughput, (b) decrease
    response time, or (c) both?

45
Defining Performance
  • Our focus will be primarily on execution time.
  • To maximize performance implies a minimization in
    execution time
  • For two machines
  • We say that machine Y is faster than machine X.

46
Performance Metrics
  • Notes

(1) If X is n times faster than Y, then
(Performance of X) / (Performance of Y) = (Execution time of Y) / (Execution time of X) = n
  • To avoid confusion, we'll use the following
    terminology
  • We say ⇒ We mean
  • improve performance ⇒ increase
    performance
  • improve execution time ⇒ decrease execution
    time

47
Performance Example
If machine A runs a program in 10 seconds and
machine B runs the same program in 15 seconds,
how much faster is A than B?
49
Topic: Computer Performance
  • Sub-Topic: Measuring Performance

50
Measuring Performance
  • Quite simply, TIME is the measure of computer
    performance!
  • The most straightforward definition of time is
    wall-clock time = elapsed time = response time.

Total time to complete a task including system
overhead activities such as Input/Output tasks,
disk and memory accesses, etc.
51
Measuring Performance
  • CPU Time is the time it takes to complete a task
    excluding the time it takes for I/O waits.

CPU TIME
USER CPU TIME: The time the CPU is busy executing
the user's code.
SYSTEM CPU TIME: The time the CPU spends performing
operating system tasks.
Note: Sometimes system and user CPU times are
difficult to distinguish since it is hard to
assign responsibility for OS activities.
52
Measuring Performance
  • Example,
  • To understand the concept of CPU time, consider
    the UNIX command time. Once typed, it may
    return a response similar to
  • 90.7u 12.9s 2:39 65%
  • What do these numbers mean?

53
Measuring Performance
  • Example,
  • To understand the concept of CPU time, consider
    the UNIX command time. Once typed, it may
    return a response similar to
  • 90.7u 12.9s 2:39 65%

90.7u: user CPU time
12.9s: system CPU time
2:39: elapsed time
65%: percentage of elapsed time that is CPU time
54
Measuring Performance
  • Example,
  • To understand the concept of CPU time, consider
    the UNIX command time. Once typed, it may
    return a response similar to
  • 90.7u 12.9s 2:39 65%
  • What is the total CPU time?
  • Percentage of time spent on I/O and other
    programs?
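
The two questions above can be answered mechanically. The Python sketch below assumes the sample line reads `90.7u 12.9s 2:39 65%` (user seconds, system seconds, elapsed minutes:seconds, CPU percentage), which is a simplified four-field layout; real shells print additional fields, and the parser name is hypothetical.

```python
def parse_time_output(line):
    """Parse a simplified csh-style `time` line like '90.7u 12.9s 2:39 65%'.

    Returns (user_seconds, system_seconds, elapsed_seconds).
    """
    user, system, elapsed, _percent = line.split()
    user_s = float(user.rstrip("u"))
    system_s = float(system.rstrip("s"))
    minutes, seconds = elapsed.split(":")
    elapsed_s = int(minutes) * 60 + float(seconds)
    return user_s, system_s, elapsed_s

user_s, system_s, elapsed_s = parse_time_output("90.7u 12.9s 2:39 65%")
cpu_s = user_s + system_s            # total CPU time
cpu_share = cpu_s / elapsed_s        # fraction of elapsed time on the CPU
other_share = 1.0 - cpu_share        # I/O waits and other programs

print(f"total CPU time: {cpu_s:.1f} s of {elapsed_s:.0f} s elapsed")
print(f"CPU share: {cpu_share:.0%}, I/O and other programs: {other_share:.0%}")
```

Under that reading, total CPU time is 90.7 + 12.9 = 103.6 s out of 159 s elapsed, leaving roughly 35% for I/O and other programs.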

56
Measuring Performance
  • Other notes
  • SYSTEM PERFORMANCE: reciprocal of elapsed time
    on an unloaded system (e.g. no user applications)
  • CPU PERFORMANCE: reciprocal of user CPU time
  • CLOCK CYCLES (CC): discrete time intervals
    measured by the processor clock running at a
    constant rate.
  • CLOCK PERIOD: time it takes to complete a clock
    cycle
  • CLOCK RATE: inverse of clock period

57
Measuring Performance
  • Consider CPU performance
  • Also,

58
Measuring Performance
  • Since the execution time clearly depends on the
    number of instructions for a program, we must
    also define another performance metric
  • CPI = average number of clock cycles
    per instruction

59
Measuring Performance
  • Now we have two more equations that we can define
    for CPUTime
  • $CPUTime = IC \times CPI \times t_{CC}$
  • $CPUTime = IC \times CPI / (\text{clock rate})$

60
Measuring Performance
  • In summary, performance metrics include

Components of Performance   Units of Measure
CPUTime                     Seconds for program
IC                          # of instructions for a program
CPI                         Average # of clock cycles per instruction
t_CC                        Seconds per clock cycle
61
Measuring Performance
  • Example,
  • Suppose Machine A implements the same ISA as
    Machine B. Given and
  • for some program, and
  • and for the same program, determine
    which machine is faster and by how much.

62
Breakdown by Instruction Category
  • Recall: CPI = clock cycles (CC) per instruction
  • But, CPI depends on many factors, including
  • Memory system behavior
  • Processor structure
  • Availability of special processor features
  • E.g., floating point, graphics, etc.
  • To characterize the effect of changing specific
    aspects of the architecture, we find it helpful
    to break down CC into components due to different
    classes (categories) of instructions
  • $CC = \sum_{i=1}^{n} CPI_i \times IC_i$
  • Where
  • $IC_i$ = instruction count for class i
  • $CPI_i$ = avg. cycles for insts. in class i
  • n = the number of instruction classes

63
Example
  • Suppose a processor has 3 categories of
    instructions A,B,C with the following CPIs
  • And, suppose a compiler designer is comparing two
    code sequences for a given program that have the
    following instruction counts
  • Determine
  • (i) Which code sequence executes the most
    instructions?
  • (ii) Which will be faster?
  • (iii) What is the average CPI for each code
    sequence?

Instr. class   CPI_i
A              1
B              2
C              3

Code seq.   IC_A   IC_B   IC_C
1           2      1      2
2           4      1      1
64
Solution to Example
  • Part (i)
  • $IC_{seq1} = 2 + 1 + 2 = 5$ instructions
  • $IC_{seq2} = 4 + 1 + 1 = 6$ instructions
  • ⇒ Code sequence 2 executes more instructions
  • Part (ii)
  • $CC_{seq1} = \sum_i (CPI_i \times IC_i) = 1 \times 2 + 2 \times 1 + 3 \times 2 = 10$
    cycles
  • $CC_{seq2} = \sum_i (CPI_i \times IC_i) = 1 \times 4 + 2 \times 1 + 3 \times 1 = 9$
    cycles
  • ⇒ Code sequence 2 takes fewer cycles ⇒ is faster!
  • Part (iii)
  • $CPI_{seq1} = CC_{seq1}/IC_{seq1}$ = 10 cyc./5 inst. = 2
  • $CPI_{seq2} = CC_{seq2}/IC_{seq2}$ = 9 cyc./6 inst. = 1.5
  • Which part should we consult to tell us which
    code sequence has better performance?
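
The three parts of this solution follow directly from the per-class cycle breakdown, as the Python sketch below shows. The data layout and the `analyze` helper are illustrative only.

```python
CPI = {"A": 1, "B": 2, "C": 3}
sequences = {
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}

def analyze(counts):
    """Return (instruction count, clock cycles, average CPI) for one sequence."""
    ic = sum(counts.values())
    cc = sum(CPI[cls] * n for cls, n in counts.items())
    return ic, cc, cc / ic

stats = {seq: analyze(counts) for seq, counts in sequences.items()}
for seq, (ic, cc, cpi) in stats.items():
    print(f"sequence {seq}: IC = {ic}, CC = {cc}, CPI = {cpi}")
```

Sequence 2 executes more instructions yet needs fewer cycles, so on the same clock it is the faster one; cycle count (part ii), not instruction count or CPI alone, settles the question.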

65
Topic: Computer Performance
  • Subtopic:
  • Benchmarks and Performance Summaries

66
Importance of Benchmarks
  • How do we evaluate and compare the performance of
    different architectures?
  • We use benchmarks
  • Programs that are specifically chosen to measure
    performance.
  • A workload is a set of programs.
  • Benchmarks consist of workloads that (user hopes)
    will predict the performance of the actual
    workload
  • It is important that benchmarks consist of
    realistic workloads
  • Not simple toy programs or code fragments
  • Manufacturers often try to fine-tune their
    machines to do well on popular benchmarks that
    were too simple
  • This does not always mean the machine will do
    well on real programs!

67
SPEC benchmark
  • A popular source of benchmarks is SPEC
  • Standard Performance Evaluation Corporation
  • General CPU benchmarks CPU2000.
  • Includes programs such as
  • gzip (compression), vpr (FPGA place and route),
    gcc (compiler), crafty (chess), vortex (database)
  • SPEC also offers specialized benchmarks for
  • Graphics, Parallel computing, Java, mail servers,
    network fileservers, web servers
  • They publish reports on benchmark results for
    various systems.
  • Main metric: SPECRatio. Proportional to average
    inverse execution time. The bigger, the better!
  • Reproducibility of results is very important!

68
Summarizing Performance
  • How do we summarize performance in a way that
    accurately compares different machines?
  • One common approach: Total Execution Time (TET)
  • Based on
  • Or, if the workload includes n different
    programs, we can calculate the average or
    Arithmetic Mean (AM)
  • Smaller AM ⇒ improved performance
  • Other methods are also used
  • Weighted arithmetic mean, geometric mean ratio.

69
Topic: Computer Performance
  • Subtopic: Performance Improvement and Amdahl's
    Law

70
Performance Improvement
  • Recall the formula $CPUTime = IC \times CPI / f_{cyc}$.
  • Thus, CPU performance is $Perf = f_{cyc} / (IC \times CPI)$.
  • Thus we can see 3 basic ways to improve CPU
    performance on a given task
  • Increase clock frequency
  • Decrease CPI
  • by improved processor organization
  • Decrease instruction count
  • By compiler enhancement,
  • change in ISA design (new instructions), or
  • A more efficient application algorithm.
  • However, we have to be careful!
  • Sometimes, improving one of these can hurt others!

71
Generalized Cost Measures
  • In this course, we will often be focusing on ways
    to minimize execution time of programs.
  • Either CPU time, or number of clock cycles.
  • Execution time is one example of what we may call
    a generalized cost measure (GCM).
  • A GCM is any property of a HW/SW design that
    tells us how much of some valued resource is used
    up when the system is manufactured or used.
  • Other examples of important GCMs include
  • Energy consumed by a computation
  • Silicon chip area used up by a circuit design
  • Dollar cost to manufacture a computer component
  • We will study some general engineering principles
    that apply to the minimization of any GCM in any
    system.

72
Additive Cost Measures
  • Let us suppose we have a GCM C for a system.
  • Many times, the total cost C can be represented
    as a sum of independent cost components
  • E.g., $C = C_1 + C_2 + \dots + C_n$, or $C = \sum_{i=1}^{n} C_i$.
  • These could correspond to the resources used by
    individual subsystems of the whole system.
  • Or, used in doing particular categories of tasks.
  • For example, execution time T can be broken down
    as the sum of time $T_{fp}$ taken by floating-point
    instructions and the time $T_{oth}$ for others.
  • That is, $T = T_{fp} + T_{oth}$.

73
Improving Part of a System
  • Suppose a GCM is broken down as $C = A + B$.
  • The total cost is the sum of two components, A
    and B.
  • Now suppose you are considering making an
    improvement to the system design that affects
    only cost component B.
  • Suppose you reduce it by a factor f, to $B' = B/f$.
  • The new total cost is then $C' = A + B'$.
  • The cost of component A is unaffected.
  • Overall (total) cost has therefore been reduced
    by the factor $f_{overall} = C/C' = (A + B)/(A + B/f)$.

74
Diminishing Returns
  • Suppose we continue improving (reducing) a cost
    component by larger and larger factors.
  • Does this mean the system's total cost will be
    reduced by correspondingly large factors? NO!
  • Even if we improved one cost component (B in our
    example) by a factor of $f = \infty$, note that
    $C' = A + B/f \to A$.
  • Even here, the overall cost reduction factor
    $f_{overall}$ would still be only the finite value
    $1 + B/A$!
  • The system can only be improved by at most this
    factor, if we improve just the one component B.

75
Diminishing Returns Example
  • Suppose a particular chip contains B = 1 cm² of
    logic circuits, and A = 2 cm² of cache memory.
  • The total cost (in terms of area) is C = A + B = 3
    cm².
  • Now, let's go crazy trying to simplify and shrink
    the design of just the logic circuit
  • What is the maximum factor by which this tactic
    can reduce the area cost of the whole design
    (logic + memory)?
  • Obviously, this can reduce the total area from 3
    (cm²) to no less than 2 (area of memory alone),
  • or, shrink it by a factor of $f_{overall}$ = 3/2
    = 1.5.
  • Note we could have obtained this same answer
    using the equation $f_{overall,max} = 1 + B/A$ as well.

Logic: 1 cm²
Memory: 2 cm²
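
To see the diminishing returns numerically, the chip-area example can be run for increasingly aggressive shrink factors. This is a minimal Python sketch; `overall_factor` is a made-up helper implementing (A + B) / (A + B/f), and the list of factors tried is arbitrary.

```python
def overall_factor(a, b, f):
    """Overall cost-reduction factor when component b is reduced by factor f
    and component a is left unchanged."""
    return (a + b) / (a + b / f)

A_MEMORY = 2.0   # cm^2, unaffected
B_LOGIC = 1.0    # cm^2, the part being shrunk

limit = 1 + B_LOGIC / A_MEMORY
factors = {f: overall_factor(A_MEMORY, B_LOGIC, f) for f in (2, 10, 100, 1e6)}
for f, overall in factors.items():
    print(f"logic shrunk {f:g}x -> whole chip shrunk {overall:.4f}x")
print(f"upper bound 1 + B/A = {limit}")
```

Shrinking the logic 2× already gives 1.2× overall; going from 100× to a million-fold adds less than 0.01 to the overall factor, which never reaches 1.5.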
76
Graph Showing Diminishing Returns
[Graph: quantities labeled are the initial part/rest ratio (B/A) and the improvement factor (f).]
77
Important Lessons to Take from This
  • It's probably not worth spending significant
    design time extensively improving just a single
    component of a system,
  • Unless that component accounts for a dominant
    part of the total cost (by some measure) to begin
    with (B/A ≫ 1).
  • It's only worth improving a given component up to
    the point where it is no longer dominant.
  • Reducing it further won't make a lot of
    difference.
  • Therefore, all components with significant costs
    must be improved together in order to
    significantly improve an entire design.
  • Well-engineered systems will tend to have roughly
    comparable costs in all of their major components.

78
Other Ways to Calculate foverall
  • Earlier, we saw this formula:
    $f_{overall} = (A + B)/(A + B/f)$
  • For the overall improvement factor $f_{overall}$
    resulting from improving component B by the
    factor f.
  • But, what if we don't know the values of A and B?
  • What if we only know their relative sizes?
  • Fortunately, it turns out that we can still
    calculate $f_{overall}$.
  • Let us define $frac_{enh} = B/C = B/(A + B)$ to be the
    fraction of the original total system cost that
    is accounted for by the particular part B that is
    going to be enhanced.
  • Then, the fraction of cost accounted for by A
    (the rest of the system) is $A/C = 1 - frac_{enh}$.
  • Our equation for $f_{overall}$ can then be re-expressed
    in terms of the quantities $frac_{enh}$ and $1 - frac_{enh}$,
    as follows

79
Calculating foverall in terms of fracenh
  • Let's re-express $f_{overall}$ in terms of $frac_{enh}$:
    $f_{overall} = 1 / [(1 - frac_{enh}) + frac_{enh}/f]$
  • We will call this form for $f_{overall}$ the
    Generalized Amdahl's Law. (We'll see why in a
    moment.)
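
For reference, the step from the A, B form to the fractional form can be written out as follows. This is a reconstruction from the definitions $C = A + B$, $C' = A + B/f$, and $frac_{enh} = B/(A + B)$ given on the surrounding slides, not a copy of the original slide's derivation: divide numerator and denominator by $A + B$.

```latex
\begin{align*}
f_{overall} &= \frac{C}{C'} = \frac{A + B}{A + B/f} \\
            &= \frac{1}{\dfrac{A}{A + B} + \dfrac{1}{f}\cdot\dfrac{B}{A + B}} \\
            &= \frac{1}{(1 - frac_{enh}) + \dfrac{frac_{enh}}{f}}
\end{align*}
```

Letting $f \to \infty$ gives $1/(1 - frac_{enh}) = (A + B)/A = 1 + B/A$, matching the diminishing-returns bound stated earlier.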

80
Amdahl's Law Proper
  • We saw that execution time is one valid cost
    measure.
  • In such a case, note that the factor by which a
    cost is reduced is the speedup, or the factor by
    which performance is improved.
  • We thus rename the improvement factor f of B
    (the enhanced part) to $speedup_{enh}$, and the
    overall improvement factor $f_{overall}$ becomes
    $speedup_{overall}$, and we get
    $speedup_{overall} = 1 / [(1 - frac_{enh}) + frac_{enh}/speedup_{enh}]$
  • This is called Amdahl's Law, and it is one of the
    most widely hyped quantitative principles of
    processor design.
  • But as we can see, it is not a special law of CPU
    architecture, but just an application of the
    universal engineering principle of diminishing
    returns which we discussed earlier.

81
Key Points from This Module
  • Throughput vs. Response Time
  • Performance as Inverse Execution Time
  • Speedup Factors
  • Averaging Benchmark Results
  • CPU Performance Equation
  • Execution time = IC × CPI × $t_{cc}$
  • Performance = $f_{cc}$ / (IC × CPI)
  • Amdahl's Law
  • $C' = A + B/f$
  • Implies overall speedup = $(A + B)/(A + B/f)$

C' = Execution time after improvement
B = Part of execution time affected by improvement
f = Factor of improvement (speedup of enhanced part)
A = Part of execution time unaffected by improvement
82
Example Performance Calculation
  • Suppose program takes 10 secs. on computer A
  • And suppose computer A has a 4 GHz clock
  • Want new computer B to run prg. in 6 seconds.
  • Suppose that increasing the clock speed is only
    possible with a substantial processor redesign,
  • which will result in 1.2× as many clock cycles
    being needed to execute the program.
  • What clock rate is needed?
  • Answer: 4 GHz × (10/6) × 1.2 = 8 GHz
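
The answer follows from holding the program fixed and scaling the cycle count. A brief Python sketch of that reasoning is shown here; the function name and argument names are invented for illustration.

```python
def required_clock_hz(old_time_s, old_clock_hz, new_time_s, cycle_growth):
    """Clock rate needed to finish in new_time_s when the redesign
    multiplies the program's cycle count by cycle_growth."""
    old_cycles = old_time_s * old_clock_hz
    new_cycles = old_cycles * cycle_growth
    return new_cycles / new_time_s

clock = required_clock_hz(old_time_s=10, old_clock_hz=4e9,
                          new_time_s=6, cycle_growth=1.2)
print(f"required clock rate: {clock / 1e9:.1f} GHz")
```

Computer A executes 10 s × 4 GHz = 40 billion cycles; computer B needs 48 billion cycles in 6 s.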

83
Another Example
  • Consider two different implementations of a given
    ISA, running a given benchmark
  • Processor A has a cycle time of 250 ps
  • And a CPI of 2.0
  • Processor B has a cycle time of 500 ps
  • And a CPI of 1.2
  • Which computer is faster on this benchmark, and
    by what factor?
  • Processor A takes 250 ps × 2.0 = 500 ps / instr.
  • Processor B takes 500 ps × 1.2 = 600 ps / instr.
  • Thus, A is faster by a factor of 6/5 = 1.2.

84
Another example
  • Suppose some Java application takes 15 seconds on
    a certain machine.
  • A new Java compiler is released that requires
    only 0.6× as many dynamic instructions to run the
    application.
  • Unfortunately, it also increases the CPI by a
    factor of 1.1
  • Presumably, uses more multi-cycle instructions.
  • How fast will the application run when compiled
    using the new compiler?
  • It will take 15 × 0.6 × 1.1 = 9.9 seconds to run
  • It will be 15/9.9 = 50/33 ≈ 1.515 times as fast
  • Only slightly more than 50% faster than before.

85
Another Example
  • Consider the following measurements of execution
    time
  • Which of the following statements are true?
  • A is faster than B for program 1.
  • A is faster than B for program 2.
  • A is faster than B for a workload with equal
    numbers of executions of programs 1 and 2.
  • A is faster than B for a workload with twice as
    many executions of program 1 as of program 2.

Program Computer A Computer B
1 2 sec. 4 sec.
2 5 sec. 2 sec.
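
Each of the four statements can be checked by totalling execution time for the workload it describes. The Python sketch below does this with the table's values; the workload dictionaries (program number mapped to number of runs) are an illustrative encoding.

```python
times = {
    "A": {1: 2, 2: 5},   # seconds per run on computer A
    "B": {1: 4, 2: 2},   # seconds per run on computer B
}

def workload_time(machine, runs):
    """Total seconds for a workload given as {program: number_of_runs}."""
    return sum(times[machine][prog] * n for prog, n in runs.items())

workloads = {
    "program 1 only": {1: 1},
    "program 2 only": {2: 1},
    "equal mix": {1: 1, 2: 1},
    "twice as many of program 1": {1: 2, 2: 1},
}

a_is_faster = {
    name: workload_time("A", runs) < workload_time("B", runs)
    for name, runs in workloads.items()
}
for name, verdict in a_is_faster.items():
    print(f"A faster for {name}: {verdict}")
```

A wins on program 1 and on the workload weighted 2:1 toward program 1 (9 s versus 10 s), but loses on program 2 and on the equal mix (7 s versus 6 s).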