Title: Fundamentals of Computer Design
1. Fundamentals of Computer Design
2. Outline
- Performance Evolution
- The Task of a Computer Designer
- Technology and Computer Usage Trends
- Cost and Trends in Cost
- Measuring and Reporting Performance
- Quantitative Principles of Computer Design
3. Computer Architecture Is
- "The attributes of a computing system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation." (Amdahl, Blaauw, and Brooks, 1964)
4. Computer Architecture's Changing Definition
- 1950s to 1960s: Computer Architecture Course
  - Computer arithmetic
- 1970s to mid-1980s: Computer Architecture Course
  - Instruction set design, especially ISAs appropriate for compilers
- 1990s to 2000s: Computer Architecture Course
  - Design of the CPU, memory system, I/O system, and multiprocessors
5. Performance Evolution
- $1K today buys a gizmo better than $1M could buy in 1965
- 1970s
  - Mainframes dominated; performance improved 25-30%/yr
  - Mostly due to improved architecture, with some technology aids
- 1980s
  - The VLSI microprocessor became the foundation
  - Technology improves at 35%/yr
  - The death of machine-language programming created an opportunity
    - Mostly with UNIX and C in the mid-80s
    - Even most system programmers gave up assembly language
    - With this came the need for efficient compilers
6. Performance Evolution (Cont.)
- 1980s (cont.)
  - The compiler focus brought on the great CISC vs. RISC debate
  - With the exception of Intel, RISC won the argument
  - RISC performance improved by 50%/year initially
  - Of course, RISC is not as simple anymore, and the compiler is a key part of the game
  - It does not matter how fast your computer is if the compiler wastes most of it through an inability to generate efficient code
  - With the exploitation of instruction-level parallelism (pipelining, superscalar execution) and the use of caches, performance is further enhanced
- CISC: Complex Instruction Set Computing
- RISC: Reduced Instruction Set Computing (jokingly, "Relegate Important Stuff to the Compiler")
7. Growth in Performance (Figure 1.1)
[Figure: performance growth over time; early gains are technology driven, later gains are mainly due to advanced architectural ideas]
8. The Task of a Computer Designer
9. Aspects of Computer Design
- The changing face of computing brings different system design issues:
  - Desktop computing
  - Servers
  - Embedded computers
- Bottom line: it is a complex game
  - Determine the important attributes (perhaps a market issue): the functional requirements
  - THEN maximize performance
  - WHILE staying within the cost and power constraints
  - A classic conflicting-constraints problem
10. A Summary of the Three Computing Classes and Their System Characteristics
11. Functional Requirements
12. Functional Requirements (Cont.)
[Table: Functional Requirement vs. Typical Features Required or Supported]
13. Aspects of Computer Design
[Figure: the design stack spans software and hardware; our focus is architecture and implementation, which rest on VLSI, logic, power, and packaging]
14. The Task of a Computer Designer
15. The Task of a Computer Designer (Cont.)
[Figure: multiprocessors, networks, and interconnections. Processor (P) and memory (M) pairs attach through network interfaces and a switch (S) to an interconnection network, in a processor-memory-switch organization. Programming models include shared memory, message passing, and data parallelism; interconnect design issues include topologies, routing, bandwidth, latency, and reliability]
16. Optimizing the Design
- Usually the functional requirements are set by the company/marketplace
- Which design is optimal depends on the choice of metric:
  - Cost minimized → simple design
  - Performance maximized → complex design or better technology
  - Time to market minimized → also favors simplicity
- Oh, and you only get one shot
  - Requires heaps of simulation, and you must quantify everything
  - Inherent requirement for deep infrastructure and support
  - Plus, you must predict the trends
17. Key Trends That Must Always Be Tracked
- Usage patterns and the market
- Technology
- Cost and performance
18. Technology and Computer Usage Trends
19. Usage Trends
- Memory usage
  - The average program's memory needs grow by 50% to 100% per year
  - Impact: add an address bit each year (an instruction-set issue)
- Assembly language replaced by HLLs
  - Increasingly important role of compilers
  - Compiler and architecture people MUST now work together
- Whacked on pictures (even TV)
  - Graphics and multimedia capability
- Whacked on communications
  - I/O subsystems become a higher priority
20. Technology Trends
- Integrated circuits
  - Density increases at 35%/yr
  - Die size increases 10-20%/yr
  - The combination is a chip complexity growth rate of 55%/yr
  - Transistor speed increase is similar, but signal propagation does not track this curve, so clock rates don't go up as fast
- Semiconductor DRAM
  - Density quadruples every 3 years (approx. 60%/yr), in 4x steps
  - Cycle time decreases slowly: 33% in 10 years
  - Interface changes have improved bandwidth
21. Technology Trends (Cont.)
- Magnetic disk
  - Currently density improves at 100%/yr
  - Access time has improved by 33% in 10 years
- Network technology
  - Depends on the performance of both switches and the transmission system
  - 1 Gbit Ethernet became available about 5 years after 100 Mbit
  - Bandwidth doubles every year
- Scaling of transistor performance, wires, and power in ICs
22. Effects of Rapid Technology Trends
- Consider today's product cycle:
  - Concept to production: 2 years
  - AND the market requires something new every 6-12 months
- Implications:
  - Pipelined design efforts using multiple design teams
  - You have to design for a complexity target that can't be implemented until the end of the cycle (design for the next technology)
  - You can't afford to miss the best technology, so you have to chase the trends
23. Cost, Price, and Their Trends
24. Cost
- Clearly a marketplace issue: profit as a function of volume
- Let's focus on hardware costs
- Factors impacting cost:
  - Learning curve: manufacturing costs decrease over time
  - Yield: the percentage of manufactured devices that survives the testing procedure
  - Volume is also a key factor in determining cost
  - Commodities: products that are sold by multiple vendors in large volumes and are essentially identical
25. Learning Curve at Work
26. Integrated Circuit Costs
27. Remember This Comic?
28. Cost of an Integrated Circuit
- The cost of a packaged integrated circuit is:

  Cost of IC = (Cost of die + Cost of testing die + Cost of packaging and final test) / Final test yield

  Cost of die = Cost of wafer / (Dies per wafer × Die yield)

  Dies per wafer = [π × (Wafer diameter / 2)^2] / Die area - [π × Wafer diameter] / (2 × Die area)^0.5
29. Cost of an Integrated Circuit (Cont.)
- The fraction (or percentage) of good dies on a wafer is the die yield:

  Die yield = Wafer yield × (1 + (Defects per unit area × Die area) / α)^(-α)

- Where α is a parameter that corresponds roughly to the number of masking levels, a measure of manufacturing complexity, critical to die yield (α ≈ 4.0 is a good estimate).
- Die cost grows roughly as (Die area)^5
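A minimal Python sketch of these cost formulas. The die size, wafer size, and defect density come from the worked examples on the next two slides; the $5000 wafer cost is a hypothetical value for illustration only:

    import math

    def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
        # First term: usable wafer area / die area.
        # Second term: correction for partial dies lost along the wafer edge.
        return (math.pi * (wafer_diameter_cm / 2) ** 2 / die_area_cm2
                - math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2))

    def die_yield(defects_per_cm2, die_area_cm2, alpha=4.0, wafer_yield=1.0):
        # Yield model from the previous slide, with alpha ~ 4.0.
        return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** -alpha

    def die_cost(wafer_cost, wafer_diameter_cm, die_area_cm2, defects_per_cm2):
        good_dies = (dies_per_wafer(wafer_diameter_cm, die_area_cm2)
                     * die_yield(defects_per_cm2, die_area_cm2))
        return wafer_cost / good_dies

    # Numbers from the worked examples: 30-cm wafer, 0.7 cm x 0.7 cm die,
    # 0.6 defects/cm^2. The wafer cost is an assumed figure.
    print(dies_per_wafer(30, 0.49))          # ~1347 dies
    print(die_yield(0.6, 0.49))              # ~0.75
    print(die_cost(5000.0, 30, 0.49, 0.6))   # ~$4.93 per good die (assumed wafer cost)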
30. Example: Finding the Number of Dies
- Find the number of dies per 30-cm wafer for a die that is 0.7 cm on a side.
- Answer: The die area is 0.49 cm². Thus

  Dies per wafer = [π × (30/2)^2] / 0.49 - [π × 30] / (2 × 0.49)^0.5 ≈ 1442 - 95 = 1347
31. Example: Finding the Die Yield
- Find the die yield for dies that are 1 cm on a side and 0.7 cm on a side, assuming a defect density of 0.6 per cm².
- Answer: The die areas are 1 cm² and 0.49 cm².
- For the larger die, the yield is

  Die yield = (1 + (0.6 × 1) / 4)^(-4) = 0.57

- For the smaller die, it is

  Die yield = (1 + (0.6 × 0.49) / 4)^(-4) = 0.75
32. Computer Designers and Chip Costs
- The computer designer affects die size, and hence cost, both by what functions are included on or excluded from the die and by the number of I/O pins.
33. Cost/Price
- Component costs
- Direct costs (add 10% to 30%): costs directly related to making a product
  - Labor, purchasing, scrap, warranty
- Gross margin (add 10% to 45%): the company's overhead that cannot be billed directly to one product
  - R&D, marketing, sales, equipment maintenance, rental, financing cost, pretax profits, taxes
- Average discount to get the list price (add 33% to 66%)
  - Volume discounts and/or retailer markup
34. Cost/Price Illustration
35. Cost/Price for Different Kinds of Systems
36. Measuring and Reporting Performance
37. Performance
- 2 key aspects (making one faster may slow the other):
  - Execution time (single task)
  - Throughput (multiple tasks)
- Comparing performance
  - The key measurement is the execution time of real programs
  - MIPS? MFLOPS?
  - Performance = 1 / execution time
  - If X is N times faster than Y:
    N = Execution time of Y / Execution time of X = Performance of X / Performance of Y
  - Similar for throughput comparisons
- Improved performance means decreasing execution time
38. Measuring Performance
- Several kinds of time:
  - Wall-clock time: response time, or elapsed time
    - Includes load, I/O delays, and OS overhead
  - CPU time: time spent computing your program
    - Factors out time spent waiting for I/O
    - But includes the OS as well as your program
    - Hence: system CPU time and user CPU time
39. OS Time
- The Unix time command reports:
  - User CPU time
  - System CPU time
  - Total elapsed time
  - The percentage of elapsed time that is user + system CPU time
  - This also tells you how much of the time you spent waiting
- BEWARE: OSes have a way of under-measuring themselves
- Example output: 90.7u 12.9s 2:39 65%
  - 90.7 s user CPU, 12.9 s system CPU, 2 min 39 s elapsed; (90.7 + 12.9) / 159 = 65%
40. Choosing Programs to Evaluate Performance
- Real applications: clearly the right choice
  - Issues: porting, and eliminating system-dependent activities
  - User burden: knowing which of your programs you really care about
- Modified (or scripted) applications
  - Enhance portability, or focus on particular aspects of system performance
- Kernels: small, key pieces of real programs
  - Best used to isolate the performance of individual features and to explain the reasons for differences in the performance of real programs
  - Livermore Loops and Linpack are examples
  - Not real programs, however: no user really runs them
41. Choosing Programs to Evaluate Performance (Cont.)
- Toy benchmarks: quicksort, puzzle
  - Beginning programming assignments
- Synthetic benchmarks
  - Try to match the average frequency of operations and operands of a large set of programs
  - No user really runs them; they are not even pieces of real programs
  - They typically reside in cache, so they don't test memory performance
- At the very least, you must understand what the benchmark code is in order to understand what it might be measuring
- Companies thrive or bust on benchmark performance
  - Hence they optimize for the benchmark
  - BEWARE, ALWAYS!!
42. Benchmark Suites
- SPEC (Standard Performance Evaluation Corporation)
  - http://www.spec.org
- Desktop benchmarks
  - CPU-intensive: SPEC CPU2000
  - Graphics-intensive: SPECviewperf
- Server benchmarks
  - CPU throughput-oriented: SPECrate
  - I/O activity: SPECSFS (NFS), SPECWeb
  - Transaction processing: TPC (Transaction Processing Council)
- Embedded benchmarks
  - EEMBC (EDN Embedded Microprocessor Benchmark Consortium)
43. Some PC Benchmarks
44. SPEC CPU2000 Benchmark Suites: Integer
45. SPEC CPU2000 Benchmark Suites: Floating Point
46. Reporting Performance Results
- Claim: "Spice takes X seconds on machine Y"
- Missing:
  - Spice version and input: what was the circuit?
  - Operational parameters: time step, duration
  - Compiler and version, optimization settings
  - Machine configuration: disk, memory, etc.
  - Source code modifications or hand-generated assembly language
- Reproducibility is a must
  - List everything another experimenter would need to duplicate the results
47. Benchmark Reporting
48. Other Problems
- Let's assume we can get the test jig specified properly
- See the following example:
  - Which is better?
  - By how much?
  - Are the programs equally important?
49. Some Aggregate Job-Mix Options
- Arithmetic mean: provides a simple average
  - Does not account for weight; all programs are treated as equal
- Weighted arithmetic mean
  - The weight is the frequency of use
  - Better, but beware of a dominant program time
  - Depends on the reference machine
50. Weighted Arithmetic Mean
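(The slide's figure is not reproduced; the standard definition it presents is:)

  Weighted arithmetic mean = Σ_i (w_i × Time_i)

where Time_i is the execution time of program i and the weights w_i, the frequencies of use, sum to 1.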
51. Normalized Time Metrics
- Geometric mean
  - Has the nice property that the ratio of the means equals the mean of the ratios
  - Consistent no matter which machine is the reference
- Better than arithmetic means, but:
  - Doesn't form an accurate prediction model; doesn't predict execution time
  - Still have to remain cautious (more drawbacks: pp. 37-39)
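In symbols, for n normalized execution times Ratio_i:

  Geometric mean = (Π_i Ratio_i)^(1/n)

and the consistency property above is GM(X_i) / GM(Y_i) = GM(X_i / Y_i), so the verdict does not depend on the reference machine.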
52. Normalized Time Metrics (Cont.)
- The arithmetic mean should not be used to average normalized execution times
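A small Python sketch of the pitfall, using made-up times for two programs on two machines: the arithmetic mean of normalized times changes its verdict with the reference machine, while the geometric mean does not.

    from math import prod

    # Hypothetical execution times (seconds) for two programs on two machines.
    times_A = [1.0, 1000.0]
    times_B = [10.0, 100.0]

    def arith_mean(xs):
        return sum(xs) / len(xs)

    def geo_mean(xs):
        return prod(xs) ** (1 / len(xs))

    for ref_name, ref in (("A", times_A), ("B", times_B)):
        norm_A = [a / r for a, r in zip(times_A, ref)]
        norm_B = [b / r for b, r in zip(times_B, ref)]
        print(f"reference {ref_name}: "
              f"AM A={arith_mean(norm_A):.2f} B={arith_mean(norm_B):.2f}  "
              f"GM A={geo_mean(norm_A):.2f} B={geo_mean(norm_B):.2f}")

    # With reference A, the arithmetic mean makes B look 5x worse; with
    # reference B, it makes A look 5x worse. The geometric mean says the
    # two machines tie under either reference.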
53. Quantitative Principles of Computer Design
54. Make the Common Case Fast
- The most pervasive principle of design
- Need to validate that the case really is common (or uncommon)
- Often, common cases are simpler than uncommon cases
  - e.g., than exceptions like overflow, interrupts, ...
  - Truly simple is usually both cheap and fast: the best of both worlds
- The trick is to quantify the advantage of a proposed enhancement
55. Amdahl's Law
A quantification of the diminishing-returns principle
- Defines the speedup gained from a particular feature
- Depends on 2 factors:
  - The fraction of the original computation time that can take advantage of the enhancement, i.e., the commonality of the feature
  - The level of improvement gained by the feature
- Amdahl's Law (next slide)
56. Amdahl's Law (Cont.)
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.
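Restated as the usual equations:

  ExTime_new = ExTime_old × ((1 - F) + F / S)

  Speedup_overall = ExTime_old / ExTime_new = 1 / ((1 - F) + F / S)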
57. Simple Example
Amdahl's Law says nothing about cost
- An important application:
  - FPSQRT: 20% of execution time
  - FP instructions: 50%
  - Other: 30%
- Designers say it costs the same to speed up:
  - FPSQRT by 40x
  - FP by 2x
  - Other by 8x
- Which one should you invest in?
- Straightforward: plug in the numbers and compare. BUT what's your guess?
58. And the Winner Is?
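Plugging the slide's numbers into Amdahl's Law, a short Python check (the original slide's answer table is not reproduced here):

    # Fraction of execution time affected, and the speedup offered, per option.
    options = {
        "FPSQRT x40": (0.20, 40),
        "FP x2":      (0.50, 2),
        "Other x8":   (0.30, 8),
    }

    def amdahl(fraction, speedup):
        # Overall speedup when `fraction` of the time is sped up by `speedup`.
        return 1 / ((1 - fraction) + fraction / speedup)

    for name, (f, s) in options.items():
        print(f"{name}: overall speedup = {amdahl(f, s):.3f}")

    # FPSQRT x40: 1.242, FP x2: 1.333, Other x8: 1.356.
    # "Other" wins: a modest 8x speedup on 30% of the time beats
    # a 40x speedup on only 20% of the time.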
59. Calculating CPU Performance
- All commercial machines are synchronous
  - Implies there is a clock which ticks once per cycle
- Hence there are 2 useful basic metrics:
  - Clock rate: today in MHz
  - Clock cycle time
- Clock cycle time = 1 / clock rate
  - e.g., a 250 MHz rate corresponds to a 4 ns cycle time
60. Calculating CPU Performance (Cont.)
- We tend to count instructions executed: IC (instruction count)
  - Note: looking at the object code is just a start
  - What we care about is the dynamic count; e.g., don't forget loops, recursion, branches, etc.
- CPI (clock cycles per instruction) is a figure of merit
61. Calculating CPU Performance (Cont.)
- 3 focus factors: cycle time, CPI, and IC
  - Sadly, they are interdependent, and making one better often makes another worse (but with small or predictable impacts)
- Cycle time depends on HW technology and organization
- CPI depends on organization (pipelining, caching, ...) and the ISA
- IC depends on the ISA and compiler technology
- Often CPIs are easier to deal with on a per-instruction-class basis
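The classic equation tying the three factors together:

  CPU time = IC × CPI × Clock cycle time = (IC × CPI) / Clock rate

and, with per-class instruction counts IC_i and per-class CPI_i:

  CPI = Σ_i (IC_i / IC) × CPI_i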
62. Simple Example
- Suppose we have made the following measurements:
  - Frequency of FP operations (other than FPSQR): 25%
  - Average CPI of FP operations: 4.0
  - Average CPI of other instructions: 1.33
  - Frequency of FPSQR: 2%
  - CPI of FPSQR: 20
- Two design alternatives:
  - Reduce the CPI of FPSQR to 2
  - Reduce the average CPI of all FP operations to 2
63. And the Winner Is?
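A short Python check using the per-class CPI equation above. It follows the textbook-style accounting for this example (FP operations taken as 25% of instructions at an average CPI of 4.0, the remaining 75% at 1.33, with FPSQR's cycles folded into the FP average); since the slide's own arithmetic is not shown, that interpretation is an assumption:

    # Base CPI from the measurements on the previous slide.
    cpi_original = 0.25 * 4.0 + 0.75 * 1.33          # ~2.0

    # Alternative 1: cut the CPI of FPSQR (2% of instructions) from 20 to 2.
    cpi_new_fpsqr = cpi_original - 0.02 * (20 - 2)   # ~1.64

    # Alternative 2: cut the average CPI of all FP operations to 2.
    cpi_new_fp = 0.75 * 1.33 + 0.25 * 2.0            # ~1.50

    for name, cpi in (("FPSQR -> 2", cpi_new_fpsqr), ("all FP -> 2", cpi_new_fp)):
        print(f"{name}: CPI = {cpi:.3f}, speedup = {cpi_original / cpi:.3f}")

    # FPSQR -> 2: speedup ~1.22; all FP -> 2: speedup ~1.33.
    # Winner: reducing the average CPI of all FP operations.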