Title: Chapter 2 Computer Evolution and Performance
1Chapter 2: Computer Evolution and Performance
2Contents
- A Brief History of Computers
- Designing for Performance
- Pentium and PowerPC Evolution
- Performance Evaluation
3ENIAC
A brief history of computers
- Electronic Numerical Integrator And Computer
- John Mauchly and John Presper Eckert
- Trajectory tables for weapons
- Started 1943 / Finished 1946
- Too late for war effort
- Used until 1955
- Decimal (not binary)
- 20 accumulators of 10 digits
- Programmed manually by switches
- 18,000 vacuum tubes
- 30 tons
- 1,500 square feet
- 140 kW power consumption
- 5,000 additions per second
4von Neumann/Turing
A brief history of computers
- Stored Program concept
- Main memory storing programs and data
- ALU operating on binary data
- Control unit interpreting instructions from memory and executing them
- Input and output equipment operated by control unit
- Princeton Institute for Advanced Study
- IAS
- Completed 1952
5von Neumann Machine
A brief history of computers
Input Output Equipment
Arithmetic And Logic Unit
Main Memory
Program Control Unit
- If a program could be represented in a form suitable for storing in memory, the programming process could be facilitated
- A computer could get its instructions from memory, and a program could be set or altered by setting the values of a portion of memory
6IAS Memory Formats
A brief history of computers
Sign Bit
- 1000 x 40 bit words
- Binary number
- 2 x 20 bit instructions
- Each instruction consisting of an 8-bit opcode
- A 12-bit address designating one of the words in
memory
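The word layout above can be illustrated with a small decoding sketch. It assumes the left instruction occupies the high-order 20 bits of the word and that the opcode precedes the address within each instruction. The function names and the sample opcode values are hypothetical, chosen only for illustration.

```python
# Sketch of the IAS instruction-word layout: one 40-bit word holds two
# 20-bit instructions, each an 8-bit opcode followed by a 12-bit address.
# Assumption: the left instruction sits in the high-order bits.

INSTR_BITS = 20
OPCODE_BITS = 8
ADDR_BITS = 12

INSTR_MASK = (1 << INSTR_BITS) - 1
OPCODE_MASK = (1 << OPCODE_BITS) - 1
ADDR_MASK = (1 << ADDR_BITS) - 1


def encode_instruction(opcode: int, address: int) -> int:
    """Pack an (opcode, address) pair into one 20-bit instruction."""
    return ((opcode & OPCODE_MASK) << ADDR_BITS) | (address & ADDR_MASK)


def decode_instruction(instr: int) -> tuple[int, int]:
    """Split a 20-bit instruction into (opcode, address)."""
    return (instr >> ADDR_BITS) & OPCODE_MASK, instr & ADDR_MASK


def encode_word(left: tuple[int, int], right: tuple[int, int]) -> int:
    """Pack a left and a right instruction into one 40-bit word."""
    return (encode_instruction(*left) << INSTR_BITS) | encode_instruction(*right)


def decode_word(word: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return ((left_opcode, left_addr), (right_opcode, right_addr))."""
    left = (word >> INSTR_BITS) & INSTR_MASK
    right = word & INSTR_MASK
    return decode_instruction(left), decode_instruction(right)


# Hypothetical opcode values, for illustration only.
word = encode_word((0x01, 250), (0x05, 251))
decoded = decode_word(word)
print(decoded)
```

A 12-bit address field can name 4096 locations, which comfortably covers the 1000 words of memory.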
7IAS Registers
A brief history of computers
- Memory Buffer Register (MBR)
- Containing a word to be stored in memory, or used to receive a word from memory
- Memory Address Register (MAR)
- Specifying the address in memory of the word to be written from or read into the MBR
- Instruction Register (IR)
- Containing the 8-bit opcode of the instruction being executed
- Instruction Buffer Register (IBR)
- Employed to hold temporarily the right-hand instruction from a word in memory
- Program Counter (PC)
- Containing the address of the next instruction pair to be fetched from memory
- Accumulator (AC) and Multiplier Quotient (MQ)
- Employed to hold temporarily operands and results of ALU operations
8Structure of IAS
A brief history of computers
9Partial Flowchart of IAS
A brief history of computers
10The IAS Instruction Set
A brief history of computers
11The IAS Instruction Set
A brief history of computers
12The IAS Instruction Set
A brief history of computers
- Data transfer
- Move data between memory and ALU registers or between two ALU registers
- Unconditional branch
- Instructions normally execute in sequence; a branch instruction can change this sequence, which facilitates repetitive operations
- Conditional branch
- The branch can be made dependent on a condition, thus allowing decision points
- Arithmetic
- Operations performed by the ALU
- Address modify
- Permits addresses to be computed in the ALU and then inserted into instructions stored in memory
13Commercial Computers
A brief history of computers
- 1947 - Eckert-Mauchly Computer Corporation
- UNIVAC I (Universal Automatic Computer)
- US Bureau of Census 1950 calculations
- Became part of Sperry-Rand Corporation
- Late 1950s - UNIVAC II
- Faster
- More memory
14IBM
A brief history of computers
- Had helped build the Mark I
- Punched-card processing equipment
- 1953 - the 701
- IBM's first stored-program computer
- Scientific calculations
- 1955 - the 702
- Business applications
- Led to 700/7000 series
15Computer Generations
A brief history of computers
Generation | Approximate Dates | Technology | Typical Speed (operations per second)
1 | 1946-1957 | Vacuum tube | 40,000
2 | 1958-1964 | Transistor | 200,000
3 | 1965-1971 | Small- and medium-scale integration | 1,000,000
4 | 1972-1977 | Large-scale integration | 10,000,000
5 | 1978- | Very-large-scale integration | 100,000,000
16Transistors
A brief history of computers
- Replaced vacuum tubes
- Smaller
- Cheaper
- Less heat dissipation
- Solid State device
- Made from Silicon (Sand)
- Invented 1947 at Bell Labs
- William Shockley et al.
17Transistor Based Computers
A brief history of computers
- Second generation machines
- NCR and RCA produced small transistor machines
- IBM 7000
- Digital Equipment Corporation (DEC) - 1957
- Produced PDP-1
18IBM 700/7000 Series
A brief history of computers
19IBM 700/7000 Series
A brief history of computers
20An IBM 7094 Configuration
A brief history of computers
21The IBM 7094
A brief history of computers
- The most important point is the use of data channels. A data channel is an independent I/O module with its own processor and its own instruction set.
- Another new feature is the multiplexor, which is the central termination point for data channels, the CPU, and memory.
22Microelectronics
A brief history of computers
- Literally - small electronics
- A computer is made up of gates, memory cells and interconnections
- These can be manufactured on a semiconductor
- e.g. silicon wafer
23Microelectronics
A brief history of computers
- Data storage
- Provided by memory cells
- Data processing
- Provided by gates
- Data movement
- The paths between components are used to move data from memory to memory and from memory through gates to memory
- Control
- The paths between components can carry control signals. The memory cell will store the bit on its input lead when the WRITE control signal is ON and will place that bit on its output lead when the READ control signal is ON.
24Wafer, Chip, and Gate
A brief history of computers
- Small-scale integration (SSI)
25Generations of Computer
A brief history of computers
- Vacuum tube - 1946-1957
- Transistor - 1958-1964
- Small scale integration - 1965 on
- Up to 100 devices on a chip
- Medium scale integration - to 1971
- 100-3,000 devices on a chip
- Large scale integration - 1971-1977
- 3,000 - 100,000 devices on a chip
- Very large scale integration - 1978 to date
- 100,000 - 100,000,000 devices on a chip
- Ultra large scale integration
- Over 100,000,000 devices on a chip
26Moore's Law
A brief history of computers
- Increased density of components on chip
- Gordon Moore - cofounder of Intel
- Number of transistors on a chip will double every year
- Since the 1970s development has slowed a little
- Number of transistors doubles every 18 months
- Cost of a chip has remained almost unchanged
- Higher packing density means shorter electrical paths, giving higher performance
- Smaller size gives increased flexibility
- Reduced power and cooling requirements
- Fewer interconnections increase reliability
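The two doubling rates mentioned on this slide can be compared with a short sketch. The function name and the starting count of 1,000 transistors are hypothetical; the model is plain exponential growth and is not fitted to any real chip data.

```python
def projected_transistors(initial_count: float,
                          years: float,
                          doubling_period_years: float = 1.5) -> float:
    """Project a transistor count assuming one doubling per period.

    The default period of 1.5 years corresponds to "every 18 months".
    """
    return initial_count * 2 ** (years / doubling_period_years)


# Hypothetical starting point of 1,000 transistors, projected 3 years ahead.
every_18_months = projected_transistors(1000, 3.0)       # two doublings
every_year = projected_transistors(1000, 3.0, 1.0)       # three doublings
print(every_18_months, every_year)
```

Over three years the 18-month rate gives a factor of 4, while yearly doubling gives a factor of 8, so the slower rate still compounds quickly.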
27Growth in CPU Transistor Count
A brief history of computers
28IBM 360 series
A brief history of computers
- 1964
- Replaced (and was not compatible with) the 7000 series
- First planned family of computers
- Similar or identical instruction sets
- Similar or identical O/S
- Increasing speed
- Increasing number of I/O ports (i.e. more terminals)
- Increased memory size
- Increased cost
- Multiplexed switch structure
29Key Characteristics of 360 Family
A brief history of computers
- Many of its features have become standard on
other large computers
30DEC PDP-8
A brief history of computers
- 1964
- First minicomputer (after miniskirt!)
- Did not need air conditioned room
- Small enough to sit on a lab bench
- $16,000
- $100k for IBM 360
- Embedded applications and OEM
- Later models of the PDP-8 used a bus structure
that is now virtually universal for minicomputers
and microcomputers
31PDP-8/E Block Diagram
A brief history of computers
- Highly flexible architecture allowing modules to
be plugged into the bus to create various
configurations
32Semiconductor Memory
A brief history of computers
- The first application of integrated circuit technology to computers was construction of the processor
- It was also used to construct memories
- 1970
- Fairchild
- Size of a single core
- i.e. 1 bit of magnetic core storage
- Holds 256 bits
- Non-destructive read
- Much faster than core
- Capacity approximately doubles each year
33Evolution of Intel Microprocessors
A brief history of computers
34Evolution of Intel Microprocessors
A brief history of computers
35Evolution of Intel Microprocessors
A brief history of computers
36Microprocessor Speed
Design for performance
- In memory chips, the relentless pursuit of speed has quadrupled the capacity of DRAM every three years
- Pipelining
- On board cache
- On board L1 and L2 cache
- Branch prediction
- Data flow analysis
- Speculative execution
37Evolution of DRAM / Processor Characteristics
Design for performance
38Performance Mismatch
Design for performance
- Processor speed increased
- Memory capacity increased
- Memory speed lags behind processor speed
39Performance Balance
Design for performance
- The interface between processor and main memory is the most crucial pathway in the entire computer
- It is responsible for carrying a constant flow of program instructions and data between memory chips and the processor
40Trends in DRAM use
Design for performance
41Performance Balance
Design for performance
- On average, the number of DRAMs per system is going down.
- The solid black lines in the figure show that, for a fixed-sized memory, the number of DRAMs needed is declining
- The shaded bands show that for a particular type of system, main memory size has slowly increased while the number of DRAMs has declined
42Solutions
Design for performance
- Increase number of bits retrieved at one time
- Make DRAM wider rather than deeper
- Change DRAM interface
- Cache
- Reduce frequency of memory access
- More complex cache and cache on chip
- Increase interconnection bandwidth
- High speed buses
- Hierarchy of buses
43Performance Balance
Design for performance
- Two constantly evolving factors to be coped with
- The rate at which performance is changing in the various technology areas differs greatly from one type of element to another
- New applications and new peripheral devices constantly change the nature of the demand on the system in terms of typical instruction profile and the data access patterns.
44Intel
Pentium and PowerPC evolution
- Pentium - results of design effort on CISCs
- 1971 - 4004
- First microprocessor
- All CPU components on a single chip
- 4 bit
- Followed in 1972 by 8008
- 8 bit
- Both designed for specific applications
- 1974 - 8080
- Intel's first general purpose microprocessor
- 8086
- 16 bit, instruction cache, or queue
- 80286
- addressing a 16-Mbyte memory
45Intel
Pentium and PowerPC evolution
- 80386
- 32 bit, multitasking
- 80486
- built-in math coprocessor
- Pentium
- superscalar techniques
- Pentium Pro
- Pentium II
- Intel MMX technology
- Pentium III
- additional floating-point instructions
- Merced
- 64-bit organization
46PowerPC
Pentium and PowerPC evolution
- RISC systems
- PowerPC Processor Summary
47Two Notions of Performance
Performance evaluation
Plane | Speed | DC to Paris | Passengers | Throughput (pmph)
Boeing 747 | 610 mph | 6.5 hours | 470 | 286,700
BAC/Sud Concorde | 1350 mph | 3 hours | 132 | 178,200
- Which has higher performance?
- Time to do the task (Execution Time)
- execution time, response time, latency
- Tasks per day, hour, week, sec, ns, ... (Performance)
- throughput, bandwidth
- Response time and throughput often are in opposition
48To Assess Performance
Performance evaluation
- Response Time
- Time to complete a task
- Throughput
- Total amount of work done per time
- Execution Time (CPU Time)
- User CPU time
- Time spent in the program
- System CPU time
- Time spent in OS
- Elapsed Time
- Execution Time + Time of I/O and time sharing
49Criteria of Performance
Performance evaluation
- Execution time seems to measure the power of the CPU
- Elapsed time measures the performance of the whole system including OS and I/O
- User is interested in elapsed time
- Sales people are interested in the highest performance number that can be quoted
- Performance analyst is interested in both execution time and elapsed time
50Definitions
Performance evaluation
- Performance is in units of things-per-second
- bigger is better
- If we are primarily concerned with response time
- $\text{performance}(X) = \dfrac{1}{\text{execution\_time}(X)}$
- "X is n times faster than Y" means
- $n = \dfrac{\text{Performance}(X)}{\text{Performance}(Y)}$
51Example
Performance evaluation
- Time of Concorde vs. Boeing 747?
- bigger is better
- Concorde is 1350 mph / 610 mph = 2.2 times faster
- = 6.5 hours / 3 hours
- Throughput of Concorde vs. Boeing 747?
- Concorde is 178,200 pmph / 286,700 pmph = 0.62 "times faster"
- Boeing is 286,700 pmph / 178,200 pmph = 1.6 times faster
- Boeing is 1.6 times (60%) faster in terms of throughput
- Concorde is 2.2 times (120%) faster in terms of flying time
- We will focus primarily on execution time for a single job
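The airplane comparison can be reproduced with a few lines of arithmetic. This is a minimal sketch using the figures from the table on the earlier slide; the `Plane` class and its field names are hypothetical, and throughput is taken to be passengers times speed (passenger-miles per hour).

```python
from dataclasses import dataclass


@dataclass
class Plane:
    name: str
    speed_mph: int
    trip_hours: float
    passengers: int

    @property
    def throughput_pmph(self) -> int:
        """Passenger-miles per hour: passengers carried times speed."""
        return self.passengers * self.speed_mph


boeing = Plane("Boeing 747", 610, 6.5, 470)
concorde = Plane("BAC/Sud Concorde", 1350, 3.0, 132)

# Response-time view: how many times faster is the Concorde?
speed_ratio = concorde.speed_mph / boeing.speed_mph        # about 2.2
time_ratio = boeing.trip_hours / concorde.trip_hours       # about 2.2

# Throughput view: how many times "faster" is the Boeing?
throughput_ratio = boeing.throughput_pmph / concorde.throughput_pmph  # about 1.6

print(round(speed_ratio, 1), round(time_ratio, 1), round(throughput_ratio, 1))
```

The same pair of machines ranks in opposite order depending on whether latency or throughput is the metric, which is the point of the example.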
52Basis of Evaluation
Performance evaluation
Actual Target Workload
- Cons: very specific; non-portable; difficult to run, or measure; hard to identify cause
Full Application Benchmarks
- Pros: portable; widely used; improvements useful in reality
Small Kernel Benchmarks
- Pros: easy to run, early in design cycle
Microbenchmarks
- Pros: identify peak capability and potential bottlenecks
- Cons: peak may be a long way from application performance
53MIPS
Performance evaluation
- Millions of Instructions (Executed) Per Second
- Often used measure of performance
- Native MIPS
54MIPS
Performance evaluation
- Meaningless information
- Run a program and time it
- Count the number of executed instructions to get MIPS rating
- Problems
- Cannot compare different computers with different instruction sets
- Varies between programs executed on the same computer
- Peak MIPS
- This is what many manufacturers provide
- Usually neglecting peako
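The rating procedure described on this slide (run a program, time it, count the executed instructions) amounts to one division. The sketch below uses a hypothetical function name and invented instruction counts and timings to show why the rating varies between programs on the same computer.

```python
def native_mips(instruction_count: int, execution_time_s: float) -> float:
    """Native MIPS: executed instructions per second, in millions."""
    return instruction_count / (execution_time_s * 1e6)


# Two hypothetical programs timed on the same machine.
program_a = native_mips(200_000_000, 4.0)   # 50.0
program_b = native_mips(90_000_000, 3.0)    # 30.0
print(program_a, program_b)
```

Because the two programs use different instruction mixes, the same hardware earns two different ratings, so a single MIPS figure says little without knowing the program behind it.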
55Relative MIPS
Performance evaluation
- Call the VAX 11/780 a 1 MIPS machine (not true)
- Makes MIPS rating more independent of benchmark programs
- Advantage of relative MIPS is small
56FLOPS
Performance evaluation
- Millions of Floating Point Operations Per Second
- Used for engineering and scientific applications where floating point operations account for a high fraction of all executed instructions
- Problems
- Program dependent
- Many programs do not use floating point operations
- Machine dependent
- Depends on relative mixture of integer and floating point operations
- Depends on relative mixture of cheap (+, -) and expensive (such as division) floating point operations
- Normalized FLOPS (relative FLOPS)
- Peak FLOPS
57SPEC Marks
Performance evaluation
- System Performance Evaluation Cooperative
- Non-profit group initially founded by APOLLO, HP, MIPSCO, and SUN
- Now includes many more like IBM, DEC, AT&T, MOTOROLA, etc.
- Measures the ratio of execution time on a VAX 11/780 to that on the target machine
- Summarizes performance by taking the geometric mean of the ratios
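The summary step can be sketched as follows. The sketch assumes each ratio is computed as reference-machine time divided by target-machine time, so that larger is better; the function names and all timings are hypothetical and are not SPEC results.

```python
import math


def spec_ratio(reference_time_s: float, target_time_s: float) -> float:
    """Ratio of reference-machine time to target-machine time (assumed form)."""
    return reference_time_s / target_time_s


def geometric_mean(ratios: list[float]) -> float:
    """n-th root of the product of n ratios, computed via logarithms."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical benchmark timings in seconds: (reference, target).
timings = [(100.0, 50.0), (400.0, 50.0)]
ratios = [spec_ratio(ref, tgt) for ref, tgt in timings]   # [2.0, 8.0]
summary = geometric_mean(ratios)                          # about 4.0
print(ratios, summary)
```

With ratios of 2 and 8 the geometric mean is 4, whereas the arithmetic mean would be 5. The geometric mean has the property that the relative ranking of two machines does not depend on which machine is chosen as the reference.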
58SPEC95
Performance evaluation
- Eighteen application benchmarks (with inputs) reflecting a technical computing workload
- Eight integer
- go, m88ksim, gcc, compress, li, ijpeg, perl, vortex
- Ten floating-point intensive
- tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp, wave5
- Must run with standard compiler flags
- eliminate special undocumented incantations that may not even generate working code for real programs
59Metrics of performance
Performance evaluation
System level: typical metric
- Application: answers per month, useful operations per second
- Programming Language
- Compiler
- ISA: (millions) of instructions per second (MIPS); (millions) of (F.P.) operations per second (MFLOP/s)
- Datapath, Control: megabytes per second
- Function Units
- Transistors, Wires, Pins: cycles per second (clock rate)
Each metric has a place and a purpose, and each can be misused
60Aspects of CPU Performance
Performance evaluation
61Criteria of Performance
Performance evaluation
- CPU Time
- $\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time}$
- In units: seconds = instructions $\times$ (cycles / instruction) $\times$ (seconds / cycle)
- Clock Rate
- $\text{Clock Rate} = 1 / \text{Clock Cycle Time}$, in cycles per second
- Depends on technology and organization
- CPI
- Cycles Per Instruction
- Depends on organization and instruction set
- Instruction Count
- Depends on compiler and instruction set
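The CPU time relation on this slide (instruction count, CPI, and clock cycle) can be checked numerically. The sketch below expresses the clock cycle through the clock rate; the function name and the workload figures are hypothetical.

```python
import math


def cpu_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time.

    The clock cycle time is written as 1 / clock_rate_hz.
    """
    return instruction_count * cpi / clock_rate_hz


# Hypothetical workload: 1 billion instructions, CPI of 2.2, 500 MHz clock.
baseline = cpu_time_s(1e9, 2.2, 500e6)        # about 4.4 seconds

# Halving CPI or doubling the clock rate gives the same improvement.
lower_cpi = cpu_time_s(1e9, 1.1, 500e6)       # about 2.2 seconds
faster_clock = cpu_time_s(1e9, 2.2, 1000e6)   # about 2.2 seconds
print(baseline, lower_cpi, faster_clock)
```

The three factors multiply, so an improvement in one can be cancelled by a regression in another, for example a compiler change that lowers instruction count but raises average CPI.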
62Criteria of Performance
Performance evaluation
- If CPI is not uniform across all instructions
- $\text{CPU cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times I_i)$
- $n$ - number of instructions in instruction set
- $\text{CPI}_i$ - CPI for instruction $i$
- $I_i$ - number of times instruction $i$ occurs in a program
- $\text{CPU Time} = \sum_{i=1}^{n} (\text{CPI}_i \times I_i \times \text{Clock Cycle Time})$
- CPI
- It assumes that a given instruction always takes the same number of cycles to execute
63Aspects of CPU Performance
Performance evaluation
64CPI
Performance evaluation
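average cycles per instruction
$\text{CPI} = \dfrac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}} = \dfrac{\text{Clock Cycles}}{\text{Instruction Count}}$
$\text{CPU Time} = \text{Clock Cycle Time} \times \sum_{i=1}^{n} \text{CPI}_i \times I_i$
$\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times F_i$, where $F_i = \dfrac{I_i}{\text{Instruction Count}}$ is the "instruction frequency"
- Invest Resources where time is Spent!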
65Example of RISC
Performance evaluation
Base Machine (Reg / Reg), Typical Mix
Op | Freq | Cycles | CPI(i) | % Time
ALU | 50% | 1 | 0.5 | 23%
Load | 20% | 5 | 1.0 | 45%
Store | 10% | 3 | 0.3 | 14%
Branch | 20% | 2 | 0.4 | 18%
Total | | | 2.2 |
- How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
- How does this compare with using branch prediction to shave a cycle off the branch time?
- What if two ALU instructions could be executed at once?
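One way to work through the three questions is to recompute the weighted CPI for each variant and compare it with the base value of 2.2. The sketch below assumes the instruction count and clock rate stay fixed, and models "two ALU instructions at once" as halving the effective ALU cycle count to 0.5; both are simplifying assumptions, and the helper names are hypothetical.

```python
import math

# Instruction mix from the slide: op -> (frequency, cycles).
BASE_MIX = {
    "ALU": (0.5, 1.0),
    "Load": (0.2, 5.0),
    "Store": (0.1, 3.0),
    "Branch": (0.2, 2.0),
}


def average_cpi(mix: dict[str, tuple[float, float]]) -> float:
    """Weighted CPI: sum over ops of frequency x cycles."""
    return sum(freq * cycles for freq, cycles in mix.values())


def with_cycles(mix: dict[str, tuple[float, float]], op: str, cycles: float):
    """Return a copy of the mix with one op's cycle count replaced."""
    updated = dict(mix)
    freq, _ = updated[op]
    updated[op] = (freq, cycles)
    return updated


def speedup(mix: dict[str, tuple[float, float]]) -> float:
    """Speedup over the base machine at equal instruction count and clock."""
    return average_cpi(BASE_MIX) / average_cpi(mix)


base_cpi = average_cpi(BASE_MIX)                               # about 2.2
better_cache = speedup(with_cycles(BASE_MIX, "Load", 2.0))     # about 1.375
branch_pred = speedup(with_cycles(BASE_MIX, "Branch", 1.0))    # about 1.10
dual_alu = speedup(with_cycles(BASE_MIX, "ALU", 0.5))          # about 1.13
print(base_cpi, better_cache, branch_pred, dual_alu)
```

Under these assumptions the better data cache helps most, because loads account for the largest share of execution time.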
66Amdahl's Law
Performance evaluation
- Speedup due to enhancement E
- $\text{Speedup}(E) = \dfrac{\text{ExTime w/o } E}{\text{ExTime w/ } E} = \dfrac{\text{Performance w/ } E}{\text{Performance w/o } E}$
- Suppose that enhancement E accelerates a fraction F of the task by a factor S and the remainder of the task is unaffected. Then
- $\text{ExTime}(\text{with } E) = \left((1-F) + \dfrac{F}{S}\right) \times \text{ExTime}(\text{without } E)$
- $\text{Speedup}(\text{with } E) = \dfrac{1}{(1-F) + F/S}$
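Amdahl's Law as stated here is easy to evaluate directly. In the sketch below the function name and the sample values of F and S are hypothetical; the formula is the one on the slide, with F the fraction of the task that is enhanced and S the factor by which that fraction is accelerated.

```python
import math


def amdahl_speedup(fraction_enhanced: float, enhancement_factor: float) -> float:
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / enhancement_factor)


# Hypothetical case: half of the task is made twice as fast.
modest = amdahl_speedup(0.5, 2.0)        # about 1.33

# Even an enormous S cannot beat the limit 1 / (1 - F), which is 2 here.
near_limit = amdahl_speedup(0.5, 1e9)    # just under 2.0
print(modest, near_limit)
```

The unaffected fraction (1 - F) bounds the achievable speedup, which is why effort is best spent on the part of the task where most of the time goes.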
67Cost
Performance evaluation
- Traditionally ignored by textbooks because of rapid change
- Driven by learning curve: manufacturing costs decrease with time
- Understanding learning curve effects on yield is key to cost projection
- Yield
- Fraction of manufactured items that survive the testing procedure
- Testing and Packaging
- Big factors in lowering costs
68Cost
Performance evaluation
- Cost of Chips
- $\text{Cost of die} = \dfrac{\text{Cost of wafer}}{\text{Yield} \times \text{Dies per wafer}}$
- Cost vs. Price
- Component cost: 15-33%
- Direct cost: 6-8%
- Gross margin: 34-39%
- Average discount: 25-40%