Title: William Stallings Computer Organization and Architecture
1 William Stallings Computer Organization and Architecture
- Chapter 2: Computer Evolution and Performance
2 Topics
- History of Computers
- Designing for Performance
- Performance Measurement
3 History of Computers (1)
- Pre-mechanical Era
- Abacus (ancient China)
- Mechanical Era (1623-1940s)
- Wilhelm Schickard (1623)
- Automatic +, -, x, ÷
- Blaise Pascal (1642)
- Mass-produced first working machine (50 copies)
- Only +, -
- Gottfried Leibniz (1673)
- Improved on Pascal's machine (+, -, x, ÷)
4 History of Computers (2)
- Mechanical Era (cont'd)
- Charles Babbage (1822)
- Father of the modern computer
- Automatic computation of math tables
- Any math operation
- Punch cards
- Modern structure: I/O, storage, ALU
- Add in 1 sec., multiply in 1 min.
- George Boole (1847)
- Mathematical analysis of logic
5 History of Computers (3)
- Mechanical Era (cont'd)
- Herman Hollerith (1889)
- Modern day punch card machine
- Tabulating Machine Company, which became IBM
- Konrad Zuse (1938)
- First working mechanical computer, Z1
- Binary machine
- Howard Aiken (1943)
- Harvard Mark I, built by IBM
- Implementation of Babbage's machine
6 History of Computers (4)
- Summary of Mechanical Era
- Contributions
- Reduce calculation time
- Increase accuracy
- Drawback
- Speed limited by moving parts
- Cumbersome
- Expensive
- Unreliable
- Entered the Electronic Era (1945-present)!!
7 ENIAC - background
- Electronic Numerical Integrator And Computer
- Eckert and Mauchly
- University of Pennsylvania
- Trajectory tables for weapons
- Started 1943
- Finished 1946
- Too late for war effort (Quiz: When did WWII end?)
- Used until 1955
8 ENIAC details (It's BIG)
- Decimal (not binary)
- 20 accumulators of 10 digits
- Programmed manually by switches
- 18,000 vacuum tubes
- 70,000 resistors
- 10,000 capacitors
- 6,000 switches
- 30 tons
- 15,000 square feet
- 140 kW power consumption
- 5,000 additions per second
9 von Neumann/Turing
- Stored Program concept
- Main memory storing programs and data
- ALU operating on binary data
- Control unit interpreting instructions from memory and executing
- Input and output equipment operated by control unit
- Princeton Institute for Advanced Study
- IAS
- Completed 1952
- Basis for virtually every machine designed since then
10 Structure of von Neumann machine
11 IAS - details
- 1000 x 40-bit words
- Binary number
- 2 x 20-bit instructions
- Set of registers (storage in CPU)
- Memory Buffer Register
- Memory Address Register
- Instruction Register
- Instruction Buffer Register
- Program Counter
- Accumulator
- Multiplier Quotient
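As a rough illustration of the "2 x 20 bit instructions" layout, the sketch below splits one 40-bit word into a left and a right instruction. The 8-bit opcode / 12-bit address split inside each instruction is an assumption not stated on the slide (it matches the usual description of the IAS format), and the function names are hypothetical.

```python
WORD_BITS = 40
INSTR_BITS = 20
ADDR_BITS = 12  # assumption: 12-bit address, leaving 8 bits for the opcode


def split_word(word):
    """Split a 40-bit word into (left, right) 20-bit instructions."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word must fit in 40 bits")
    left = word >> INSTR_BITS
    right = word & ((1 << INSTR_BITS) - 1)
    return left, right


def decode(instr):
    """Split a 20-bit instruction into (opcode, address)."""
    return instr >> ADDR_BITS, instr & ((1 << ADDR_BITS) - 1)


# Hypothetical word: opcode 1 with address 500, then opcode 5 with address 501.
word = (0x01 << 32) | (500 << 20) | (0x05 << 12) | 501
left, right = split_word(word)
print(decode(left), decode(right))
```

A 12-bit address field is enough to reach every one of the 1000 memory words.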
12 IAS - von Neumann Architecture
- Features
- Data and instructions are stored in a single R/W memory
- Memory contents are addressable by location, regardless of the content
- Sequential execution
13 Structure of IAS - detail
14 Commercial Computers
- 1947 - Eckert-Mauchly Computer Corporation
- UNIVAC I (Universal Automatic Computer)
- US Bureau of Census 1950 calculations
- Became part of Sperry-Rand Corporation
- Late 1950s - UNIVAC II
- Faster
- More memory
15 IBM
- Punched-card processing equipment
- 1953 - the 701
- IBM's first stored program computer
- Scientific calculations
- 1955 - the 702
- Business applications
- Led to 700/7000 series
16 Transistors
- Replaced vacuum tubes
- Smaller
- Cheaper
- Less heat dissipation
- Solid State device
- Made from Silicon (Sand)
- Invented 1947 at Bell Labs
- William Shockley et al.
17 Transistor Based Computers
- Second generation machines
- NCR & RCA produced small transistor machines
- IBM 7000
- DEC - 1957
- Produced PDP-1
- High level languages
- Floating point arithmetic
18 Microelectronics
- Literally - "small electronics"
- A computer is made up of gates, memory cells and interconnections
- These can be manufactured on a semiconductor
- e.g. silicon wafer
19 Generations of Computer
- First generation: Vacuum tube - 1946-1957
- Second generation: Transistor - 1958-1964
- Third generation: Integrated circuits - 1965-1971
- Small scale integration - 1965 on
- Up to 100 devices on a chip
- Medium scale integration - to 1971
- 100-3,000 devices on a chip
- Semiconductor memory (1970)
- Microprocessor (1971)
20 Generations of Computer
- Fourth generation: Large scale integration (LSI)
- 1971-1977
- 3,000 - 100,000 devices on a chip
- Intel 8080: first general-purpose microprocessor (1974)
- Fifth generation: 1978-present
- Very large scale integration (VLSI) - 1978 to date
- 100,000 - 100,000,000 devices on a chip
- Ultra large scale integration (ULSI)
- Over 100,000,000 devices on a chip
- GSI ??
21 Moore's Law
- Increased density of components on chip
- Gordon Moore - cofounder of Intel
- Number of transistors on a chip will double every year
- Since the 1970s development has slowed a little
- Number of transistors doubles every 18 months
- Cost of a chip has remained almost unchanged
- Higher packing density means shorter electrical paths, giving higher performance
- Smaller size gives increased flexibility
- Reduced power and cooling requirements
- Fewer interconnections increases reliability
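To make the "doubles every 18 months" rule concrete, here is a minimal sketch that projects a transistor count forward in time. The starting count of 2,300 (roughly the Intel 4004) is an illustrative assumption not given on the slide, and `projected_transistors` is a hypothetical helper name.

```python
def projected_transistors(initial_count, years, doubling_months=18):
    """Project a transistor count assuming one doubling every `doubling_months`."""
    doublings = years * 12 / doubling_months
    return initial_count * 2 ** doublings


start = 2300  # assumption: approximate transistor count of an early microprocessor
for years in (0, 3, 6, 9):
    print(years, round(projected_transistors(start, years)))
```

Passing `doubling_months=12` reproduces the original "double every year" form of the law for comparison.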
22 Growth in CPU Transistor Count
23 IBM 360 series
- 1964
- Replaced (& not compatible with) 7000 series
- First planned family of computers
- Similar or identical instruction sets
- Similar or identical O/S
- Increasing speed
- Increasing number of I/O ports (i.e. more terminals)
- Increased memory size
- Increased cost
- Multiplexed switch structure
24 DEC PDP-8
- 1964
- First minicomputer (after miniskirt!)
- Did not need air conditioned room
- Small enough to sit on a lab bench
- $16,000
- $100k+ for IBM 360
- Embedded applications & OEM
- BUS STRUCTURE
25 DEC - PDP-8 Bus Structure
- Figure: Console Controller, CPU, Main Memory, and I/O Modules all attached to the OMNIBUS
26 Semiconductor Memory
- 1970
- Fairchild
- Size of a single core
- i.e. 1 bit of magnetic core storage
- Holds 256 bits
- Non-destructive read
- Much faster than core
- Capacity approximately doubles each year
27 Intel
- 1971 - 4004
- First microprocessor
- All CPU components on a single chip
- 4 bit
- Followed in 1972 by 8008
- 8 bit
- Both designed for specific applications
- 1974 - 8080
- Intel's first general purpose microprocessor
28 Designing for Performance (1)
- Support-Demand Cycle
- Computer performance supports application requirements
- Application requirements demand (motivate) computer performance
29 Designing for Performance (2)
- Performance balance
- The rate at which performance is changing in the various technology areas (processor, buses, memory, peripherals) differs greatly from one type of element to another.
- New applications and new peripheral devices constantly change the nature of the demand on the system.
30 Speeding it up (Processor)
- Pipelining
- On board cache
- On board L1 & L2 cache
- Branch prediction
- Data flow analysis
- Speculative execution
31 Performance Mismatch
- Processor speed increased
- Memory capacity increased
- Memory speed lags behind processor speed
32 DRAM and Processor Characteristics
33 Trends in DRAM use
34 Solutions
- Increase number of bits retrieved at one time
- Make DRAM wider rather than deeper
- Change DRAM interface
- Cache
- Reduce frequency of memory access
- More complex cache and cache on chip
- Increase interconnection bandwidth
- High speed buses
- Hierarchy of buses
35 Pentium Evolution (1)
- 8080
- first general purpose microprocessor
- 8 bit data path
- Used in first personal computer: Altair
- 8086
- much more powerful
- 16 bit
- instruction cache, prefetches a few instructions
- 8088 (8 bit external bus) used in first IBM PC
- 80286
- 16 Mbyte memory addressable
- up from 1 Mbyte
- 80386
- 32 bit
- Support for multitasking
36 Pentium Evolution (2)
- 80486
- sophisticated powerful cache and instruction pipelining
- built in maths co-processor
- Pentium
- Superscalar
- Multiple instructions executed in parallel
- Pentium Pro
- Increased superscalar organization
- Aggressive register renaming
- branch prediction
- data flow analysis
- speculative execution
37 Pentium Evolution (3)
- Pentium II
- MMX technology
- graphics, video & audio processing
- Pentium III
- Additional floating point instructions for 3D graphics
- Pentium 4
- Note Arabic rather than Roman numerals
- Further floating point and multimedia enhancements
- Itanium
- 64 bit
- see chapter 15
- See Intel web pages for detailed information on processors
38 Performance Measurement (1)
- Performance
- Execution time (latency)
- Time between the start and the completion of an event
- $\text{Performance} = \dfrac{1}{\text{Execution time}}$
- Throughput (bandwidth)
- Total amount of work done in a given time
- Machine X is n% faster than Machine Y
39 Performance Measurement (2)
- Example
- Machine A runs a program in 10 seconds,
- Machine B runs the same program in 15 seconds,
- A is __% faster than B.
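The comparison in this example can be checked with a short calculation. The sketch below assumes the common convention that "X is n% faster than Y" means the ratio of execution times (Y over X) equals 1 + n/100. That definition is not spelled out on the slide, and `percent_faster` is a hypothetical helper name.

```python
def percent_faster(time_x, time_y):
    """Return n such that X is n% faster than Y.

    Assumes time_y / time_x = 1 + n/100 (execution times in the same unit).
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return (time_y / time_x - 1) * 100


# Machine A: 10 seconds, Machine B: 15 seconds.
print(percent_faster(10, 15))
```

The underlying ratio of execution times, 15/10 = 1.5, can equally be read as "A is 1.5 times as fast as B".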
40 Performance Measurement (3)
- Improve performance = increase performance
- Improve execution time = decrease execution time
- Question: Can we make performance 10 times better by using a machine that is 10 times faster?
41 Amdahl's Law (1)
- The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.
- It defines the speedup that can be gained by using a particular enhancement.
42 Amdahl's Law (2)
- $\text{Speedup} = \dfrac{\text{Performance for entire task using the enhancement when possible}}{\text{Performance for entire task without using the enhancement}}$
- $\text{Speedup} = \dfrac{\text{Execution time for entire task without using the enhancement}}{\text{Execution time for entire task using the enhancement when possible}}$
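43 Amdahl's Law (3)
- $\text{Execution time}_{new} = \text{Execution time}_{old} \times \left[ (1 - f_E) + \dfrac{f_E}{s_E} \right]$
- where $f_E$ = fraction of the execution time in which the enhancement can be used
- $s_E$ = improvement (speedup) gained by the enhancement mode
- $\Rightarrow \text{Speedup} = \dfrac{\text{Execution time}_{old}}{\text{Execution time}_{new}} = \dfrac{1}{(1 - f_E) + \dfrac{f_E}{s_E}}$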
44 Amdahl's Law (4)
- Example: An enhancement runs 10 times faster than the original machine, but it is usable only 40% of the time. Then the speedup is __.
- Solution: $f_E = 0.4$, $s_E = 10$
- $\Rightarrow \text{Speedup} = \dfrac{1}{(1 - 0.4) + 0.4/10} = \dfrac{1}{0.64} \approx 1.56$
45 Amdahl's Law (5)
- Extreme cases
- $f_E = 0 \Rightarrow \text{Speedup} = 1$
- $f_E = 1 \Rightarrow \text{Speedup} = s_E$
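The worked example and the two extreme cases can be reproduced with a few lines of code. This is a minimal sketch of the Amdahl's Law speedup formula from the preceding slides. The function name `amdahl_speedup` and its argument names are illustrative choices.

```python
def amdahl_speedup(f_e, s_e):
    """Overall speedup when a fraction f_e of the time is sped up by a factor s_e."""
    if not 0.0 <= f_e <= 1.0:
        raise ValueError("f_e must be between 0 and 1")
    if s_e <= 0:
        raise ValueError("s_e must be positive")
    return 1.0 / ((1.0 - f_e) + f_e / s_e)


print(round(amdahl_speedup(0.4, 10), 2))  # slide example: 1.56
print(amdahl_speedup(0.0, 10))            # enhancement never used
print(amdahl_speedup(1.0, 10))            # enhancement always used
```

Even with an arbitrarily large `s_e`, the speedup for `f_e = 0.4` can never exceed 1/(1 - 0.4), about 1.67, which is why a machine that is 10 times faster in one mode does not make the whole task 10 times faster.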
46 CPU Performance (1)
- Most computers are constructed using a clock running at a constant rate
- Discrete time events, called
- ticks, clock ticks, clocks
- cycles, clock periods, clock cycles
- Referred to by
- length/time, e.g., 10 ns, or
- rate, e.g., 100 MHz
- ms = $10^{-3}$ sec, μs = $10^{-6}$ sec, ns = $10^{-9}$ sec
- Hz = 1/sec, kHz = $10^{3}$ Hz, MHz = $10^{6}$ Hz, GHz = $10^{9}$ Hz
- $\text{Clock cycle time} = \dfrac{1}{\text{Clock rate}}$
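The relationship between clock rate and clock cycle time, including the unit prefixes, can be sketched as follows. The helper names are hypothetical. The 100 MHz and 10 ns values are the ones quoted on the slide.

```python
def cycle_time_ns(clock_rate_hz):
    """Clock cycle time in nanoseconds for a clock rate given in Hz."""
    return 1e9 / clock_rate_hz


def clock_rate_mhz(period_ns):
    """Clock rate in MHz for a clock cycle time given in nanoseconds."""
    return 1e3 / period_ns


print(cycle_time_ns(100e6))  # 100 MHz clock
print(clock_rate_mhz(10))    # 10 ns cycle
```

The two functions are inverses of each other: a 100 MHz clock has a 10 ns cycle, and a 10 ns cycle corresponds to 100 MHz.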
47 CPU Performance (2)
- CPI = clock cycles per instruction
- $\text{CPU time for a program} = \text{CPU clock cycles for a program} \times \text{Clock cycle time}$
48 CPU Performance (3)
- $\text{CPU time} = \text{CPI} \times \text{Instruction count} \times \dfrac{1}{\text{Clock rate}}$
- BUT, not every instruction takes the same number of clock cycles to execute. ⇒ Take the average.
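A minimal sketch of the CPU time formula follows. The workload numbers (one million instructions, an average CPI of 1.4, a 100 MHz clock) are hypothetical values chosen only for illustration, and `cpu_time_seconds` is an assumed helper name.

```python
def cpu_time_seconds(cpi, instruction_count, clock_rate_hz):
    """CPU time = CPI x instruction count x (1 / clock rate)."""
    if clock_rate_hz <= 0:
        raise ValueError("clock rate must be positive")
    return cpi * instruction_count / clock_rate_hz


# Hypothetical workload: 1,000,000 instructions, average CPI 1.4, 100 MHz clock.
t = cpu_time_seconds(1.4, 1_000_000, 100e6)
print(f"{t * 1e3:.1f} ms")
```

Halving the CPI or the instruction count, or doubling the clock rate, each halves the result.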
49 CPU Performance (4)
- $\text{CPI} = \sum_{i=1}^{n} \text{CPI}_i \times f_i$
- $n$ = number of different instructions in a program
- $\text{CPI}_i$ = CPI of instruction $i$
- $f_i$ = frequency of instruction $i$ in a program
- Example
- Operation   Frequency   Clock cycles
- ADD         60%         1
- LOAD        40%         2
- $\text{CPI}_{overall}$ = _____
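The weighted average in this example can be computed directly. The sketch below takes the instruction mix as (frequency, cycles) pairs. The function name `average_cpi` and the input layout are illustrative assumptions.

```python
def average_cpi(mix):
    """Average CPI for an instruction mix given as (frequency, cycles) pairs.

    Frequencies are fractions of the instruction count and must sum to 1.
    """
    mix = list(mix)
    total = sum(freq for freq, _ in mix)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")
    return sum(freq * cycles for freq, cycles in mix)


# Slide example: ADD is 60% of instructions at 1 cycle, LOAD is 40% at 2 cycles.
print(round(average_cpi([(0.60, 1), (0.40, 2)]), 2))
```

For the slide's mix this evaluates to 0.6 x 1 + 0.4 x 2, which is approximately 1.4 cycles per instruction.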
50 Improve CPU Performance (1)
- How do we improve CPU performance (i.e., reduce CPU time)?
- Again, $\text{CPU time} = \text{CPI} \times \text{Instruction count} \times \dfrac{1}{\text{Clock rate}}$
- So, we want to
- _____ CPI
- _____ Instruction Count
- _____ clock rate
- _____ clock cycle time
51 Improve CPU Performance (2)
- Clock rate
- HW technology
- Organization
- CPI
- Organization
- Instruction set architecture
- Instruction Count
- Instruction set architecture
- Compiler technology
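52 MIPS (1)
- MIPS = Million Instructions Per Second
- $\text{MIPS} = \dfrac{\text{Instruction count}}{\text{Execution time} \times 10^{6}} = \dfrac{\text{Clock rate}}{\text{CPI} \times 10^{6}}$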
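53 MIPS (2)
- Given MIPS, $\text{Execution time} = \dfrac{\text{Instruction count}}{\text{MIPS} \times 10^{6}}$
- MIPS ↑ ⇒ Execution time ↓ ⇒ Performance ↑ (for a fixed instruction count)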
54 MIPS (3)
- Advantage
- Easy to understand (especially by customers)
- Disadvantages
- Difficult to compare MIPS of computers with different instruction sets
- MIPS varies between programs on the same computer
- MIPS can vary inversely to performance
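The last disadvantage can be seen with a small, made-up comparison. The sketch below assumes two hypothetical compiled versions of the same program on the same 100 MHz machine: version A executes more, simpler instructions, and version B executes fewer, slower ones. All numbers and names are illustrative and not taken from the slides.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds: instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def mips(instruction_count, exec_time_s):
    """MIPS rating: instruction count / (execution time x 10^6)."""
    return instruction_count / (exec_time_s * 1e6)


CLOCK_HZ = 100e6  # assumed clock rate
# Hypothetical versions of one program: (instruction count, average CPI).
versions = {"A": (10_000_000, 1.0), "B": (6_000_000, 1.5)}

for name, (count, cpi) in versions.items():
    t = cpu_time(count, cpi, CLOCK_HZ)
    print(name, f"time={t:.3f}s", f"MIPS={mips(count, t):.1f}")
```

Version B finishes sooner (0.090 s against 0.100 s) yet earns the lower MIPS rating, so ranking by MIPS alone would pick the slower version.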
55 Other Measurements
- MFLOPS
- Millions of floating point operations per second
- Cost
56 Internet Resources
- http://www.intel.com/
- Search for the Intel Museum
- http://www.ibm.com
- http://www.dec.com
- Charles Babbage Institute
- PowerPC
- Intel Developer Home