Title: RISC, CISC, Limitations
1. RISC, CISC, Limitations and Solutions
2. Overview
- RISC vs CISC
- Characteristics of RISC and CISC
- Advantages and Disadvantages of RISC and CISC
- Limitations of the Von Neumann Architecture
- Solutions
  - Pipelining
  - Speculative Execution
  - Branch Prediction
  - Multi-processor Systems
3. Evolution of Instruction Sets
- Single Accumulator (EDSAC, 1950)
- Accumulator + Index Registers (Manchester Mark I, IBM 700 series, 1953)
- Separation of Programming Model from Implementation
  - High-level Language Based (B5000, 1963)
  - Concept of a Family (IBM 360, 1964)
- General Purpose Register Machines
  - Complex Instruction Sets (VAX, Intel 432, 1977-80)
  - Load/Store Architecture (CDC 6600, Cray 1, 1963-76)
- RISC (MIPS, SPARC, HP-PA, IBM RS6000, ... 1987)
4. RISC/CISC
- Complex Instruction Set Computer
  - Intel x86
  - DEC VAX, PDP-11
  - Motorola 68k
  - IBM 360, 370
- Complex instructions bring the hardware closer to high-level languages
- Memory was expensive
  - Fewer, more powerful instructions
  - Smaller programs
  - More space for data
5. CISC - ISA
- Instruction Set Architecture
  - Addressing Modes
  - Additional Instructions
- Procedure and function call
  - Procedure call overhead is significant
  - Registers (state of the processor) must be saved and restored
  - Motorola 68k: MOVEM; x86: PUSH, POP
- Array Indexing
  - y = x[i][j][k] (VAX)
- Math functions
  - sqrt, sin, log, ... (Intel x86: 8087; Motorola 68k?)
  - Yet more instructions!
- Graphics support
  - MMX
6. CISC - ISA
- Instruction count
  - Usually almost 256
  - Maximum number of 8-bit opcodes!
- Powerful instructions
  - Many microcode steps
  - Multiple-cycle latency
  - Faster in microcode than in the user's program
  - Added some complexity to interrupt handling, page faulting, etc.
  - Instructions too long to be uninterruptible!
- Variable length, multiple formats
  - 1 to 17 bytes
7. CISC - ISA critique
- Studies of compilers showed
  - Many instructions unused
  - DEC even dropped an indexed memory access with post-decrement, y = x[i--], from the ISA going from PDP -> VAX
  - Compiler writers were sometimes simply not using complex instructions when they were appropriate
    - because they could write faster sequences of simple instructions for the most common cases
- Operand Constants
  - -15 to 15: 56%
  - -511 to 511: 98%
- 12 words of storage for subroutines: 95%
8. CISC
- Irrespective of its performance ...
- Complex hardware is expensive
  - Speed improvements
  - Irregular (long design times)
  - Long lead times to market
- Instruction sets and chip hardware become more complex with each generation of computers.
- The number of control words and the number of clock cycles vary between instructions, which makes instruction pipelining difficult to implement.
9. 80x86
- 1978: The Intel 8086 is announced (16-bit architecture)
- 1980: The 8087 floating point coprocessor is added
- 1982: The 80286 increases address space to 24 bits and adds instructions
- 1985: The 80386 extends to 32 bits, new addressing modes
- 1989-1995: The 80486, Pentium, Pentium Pro add a few instructions (mostly designed for higher performance)
- 1997: MMX is added
This history illustrates the impact of the "golden handcuffs" of compatibility.
10. RISC Characteristics
- No universally accepted definition
- Most of the following
  - Instructions are conceptually simple
  - Instructions are of a uniform length
  - Instructions use one (or very few) instruction formats
  - Instruction set is orthogonal
    - Little overlapping of instruction functionality
  - Instructions use very few addressing modes
  - Architecture is a load-and-store architecture
    - Only LOAD and STORE instructions reference memory
    - All operate instructions are register-to-register
  - The ISA supports few data types
11. RISC Characteristics (Cont'd)
- Other possible attributes
  - Almost all instructions execute in one clock cycle
    - Implementation detail
  - Architecture takes advantage of strengths of software
    - All reasonable architectures do
  - Architecture should have many registers
    - Not part of RISC
    - Useful, however, for speeding up CPU
12. Reduced Instruction Set Computer
- No memory-memory instructions
  - Data loaded to registers
    - lw 3, 0(2)
  - Data stored from registers
    - st 4, 40(5)
- Arithmetic, logical, etc. operations are all
  - Register -> Register
  - Mostly 3-operand type
    - op dest_reg, src_regA, src_regB
  - Mostly 1-cycle in ALU
    - Throughput 1 instruction/cycle
- Register Windows
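The load/store discipline above can be sketched with a toy register machine. This is a minimal illustration, not any real ISA: the `Machine` class, its method names, and the memory layout are all hypothetical. It shows that a memory-to-memory addition, which a CISC machine might express as a single instruction, becomes two loads, one register-to-register add, and one store.

```python
class Machine:
    """Toy load/store machine (hypothetical, for illustration only)."""

    def __init__(self, memory_words=64, registers=8):
        self.mem = [0] * memory_words
        self.reg = [0] * registers
        self.executed = 0  # number of instructions executed so far

    def lw(self, dest, offset, base):
        """Load word: reg[dest] = mem[reg[base] + offset]."""
        self.reg[dest] = self.mem[self.reg[base] + offset]
        self.executed += 1

    def st(self, src, offset, base):
        """Store word: mem[reg[base] + offset] = reg[src]."""
        self.mem[self.reg[base] + offset] = self.reg[src]
        self.executed += 1

    def add(self, dest, src_a, src_b):
        """Three-operand, register-to-register add."""
        self.reg[dest] = self.reg[src_a] + self.reg[src_b]
        self.executed += 1


def memory_add(m, addr_a, addr_b, addr_out):
    """Compute mem[addr_out] = mem[addr_a] + mem[addr_b] using only
    loads, stores, and register-to-register operations."""
    # reg[0] is never written here, so it stays 0 and acts as the base.
    m.lw(1, addr_a, 0)
    m.lw(2, addr_b, 0)
    m.add(3, 1, 2)
    m.st(3, addr_out, 0)


m = Machine()
m.mem[10], m.mem[11] = 7, 35
memory_add(m, 10, 11, 12)
print(m.mem[12], m.executed)
```

Only `lw` and `st` touch `mem`; the arithmetic operates purely on registers, which is the property the slide describes.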
13. RISC
- Simplicity of RISC instructions
  - permits high clock rates
  - long-latency ALU instructions are divided further as necessary
    - MIPS R4000: 8-stage pipeline
- All instructions 32 bits
  - Simplifies fetch
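To see why a fixed 32-bit instruction width simplifies fetch and decode, here is a small sketch using the MIPS R-type field layout (opcode 6 bits, rs 5, rt 5, rd 5, shamt 5, funct 6). The helper names `fetch` and `decode_r_type` are made up for this example, and big-endian byte order is assumed.

```python
INSTRUCTION_BYTES = 4  # every instruction is exactly 32 bits


def fetch(memory, pc):
    """Fetch one 32-bit instruction; the next PC is always pc + 4.

    Big-endian byte order is an assumption of this sketch.
    """
    word = int.from_bytes(memory[pc:pc + INSTRUCTION_BYTES], "big")
    return word, pc + INSTRUCTION_BYTES


def decode_r_type(word):
    """Split a MIPS R-type word into fields at fixed bit positions."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


# 0x02324020 encodes: add $t0, $s1, $s2  (rd=8, rs=17, rt=18, funct=0x20)
program = bytes.fromhex("02324020")
word, next_pc = fetch(program, 0)
fields = decode_r_type(word)
print(fields, next_pc)
```

Because the length is fixed, `fetch` never has to inspect the instruction to find where the next one begins, unlike a 1-to-17-byte variable-length encoding.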
14. RISC - Simple Hardware, Complex Compiler
- Basic hardware is simple
  - and hard-wired
  - i.e. no microcode
- but
  - Pipeline stalls can reduce throughput
- Optimising compiler needed
  - Fully exploit capabilities
  - Dependence analysis
  - Instruction re-ordering
  - Avoid pipeline stalls
15. RISC Disadvantages
- A more sophisticated compiler is required.
- A sequence of RISC instructions is needed to implement complex instructions.
- Very fast memory systems are required to feed the processor instructions.
- Performance of a RISC application depends critically on the quality of the code generated by the compiler.
16. Von Neumann Limitation
17. Pipelining
- Laundry Example
  - Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold
  - Washer takes 30 minutes
  - Dryer takes 40 minutes
  - Folder takes 20 minutes
- How long to do laundry?
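The laundry question can be worked through with a short simulation. This is a sketch under simple assumptions that the slide does not state: each machine handles one load at a time, loads are processed in order, and a finished load can wait between machines without blocking the one it just left. The function names are made up for this example.

```python
STAGES = {"wash": 30, "dry": 40, "fold": 20}  # minutes, from the slide
LOADS = 4  # Ann, Brian, Cathy, Dave


def sequential_minutes(stages, loads):
    """Each load finishes all three steps before the next one starts."""
    return loads * sum(stages.values())


def pipelined_minutes(stages, loads):
    """A load enters a stage once it has left the previous stage
    and that stage has finished the previous load."""
    stage_free_at = [0] * len(stages)
    finish = 0
    for _ in range(loads):
        t = 0  # time at which this load is ready for its next stage
        for s, duration in enumerate(stages.values()):
            start = max(t, stage_free_at[s])
            t = start + duration
            stage_free_at[s] = t
        finish = t
    return finish


print(sequential_minutes(STAGES, LOADS))  # 360 minutes (6 hours)
print(pipelined_minutes(STAGES, LOADS))   # 210 minutes (3.5 hours)
```

In the pipelined case the 40-minute dryer is the bottleneck: after the first wash, a load leaves the dryer every 40 minutes, giving 30 + 4 x 40 + 20 = 210 minutes.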
20. Pipelining Lessons
- Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
- Multiple tasks operate simultaneously using different resources
- Potential speedup = number of pipe stages
- Pipeline rate is limited by the slowest pipeline stage
- Unbalanced lengths of pipe stages reduce speedup
- Time to fill the pipeline and time to drain it reduce speedup
- Stall for dependences
[Figure: laundry pipeline timeline. Horizontal axis: Time (6 PM, 7, 8, 9); vertical axis: Task Order.]
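The lessons about fill/drain time and unbalanced stages can be made concrete with a small calculation. It is a sketch that assumes identical tasks, one task per stage at a time, and no stalls; the function name `pipeline_speedup` is invented for this example. Under those assumptions the pipelined time is the time for the first task plus one slowest-stage interval for each remaining task.

```python
def pipeline_speedup(stage_times, tasks):
    """Speedup of a pipeline over running the tasks one after another.

    Assumes identical tasks and no stalls (an idealisation).
    """
    unpipelined = tasks * sum(stage_times)
    # First task takes the full latency; each later task completes one
    # slowest-stage interval after its predecessor.
    pipelined = sum(stage_times) + (tasks - 1) * max(stage_times)
    return unpipelined / pipelined


balanced = [1, 1, 1, 1]
print(pipeline_speedup(balanced, 4))       # fill/drain dominates: well below 4
print(pipeline_speedup(balanced, 1000))    # approaches the stage count, 4
print(pipeline_speedup([30, 40, 20], 4))   # laundry: unbalanced stages
```

With many tasks the balanced pipeline approaches a speedup equal to its number of stages, while the unbalanced laundry pipeline can never exceed 90 / 40 = 2.25, the total time divided by the slowest stage.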
21. How does it work?
[Figure: instructions i, i+1, and i+2 overlapped in a four-stage pipeline.]
IF = Instruction Fetch, OF = Operand Fetch, E = Execute, OS = Operand Store
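The overlap of the four stages can be drawn as a cycle-by-cycle chart. The sketch below assumes the stage order IF, OF, E, OS from the legend, one cycle per stage, and no stalls; `pipeline_chart` is a name invented for this example.

```python
STAGES = ["IF", "OF", "E", "OS"]


def pipeline_chart(num_instructions, stages=STAGES):
    """Return one row per instruction; row[c] is the stage that the
    instruction occupies in cycle c, or "--" if it is not in the pipe."""
    total_cycles = num_instructions + len(stages) - 1
    rows = []
    for i in range(num_instructions):
        cells = ["--"] * total_cycles
        for s, name in enumerate(stages):
            # Instruction i starts one cycle after instruction i-1.
            cells[i + s] = name.ljust(2)
        rows.append(cells)
    return rows


for i, row in enumerate(pipeline_chart(3)):
    print(f"i+{i}: " + " ".join(row))
```

Three instructions finish in 6 cycles instead of the 12 they would need without overlap, and in any given cycle each stage is used by at most one instruction.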
22. Pipeline Bubble
[Figure: pipeline diagram for instructions i-1, i (BRA N), i+1, i+2, N, and N+1, with a bubble following the branch.]
Solution: Always put an instruction after a branch, even if it is a NOOP.
23. Data Dependency
- Consider
  - ADD A, B, Temp1
  - SUB Temp1, C, Temp2
  - AND Temp1, Temp2, X
- Generates bubbles
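The dependences in this three-instruction sequence can be found mechanically. The sketch below assumes the operand order used on the slide (two sources, then the destination) and reports read-after-write dependences, where an instruction reads a value that an earlier instruction writes. The function name is hypothetical.

```python
program = [
    ("ADD", "A", "B", "Temp1"),
    ("SUB", "Temp1", "C", "Temp2"),
    ("AND", "Temp1", "Temp2", "X"),
]


def raw_dependencies(instructions):
    """Return (producer_index, consumer_index, operand) triples for
    read-after-write dependences. The last operand is the destination."""
    deps = []
    last_writer = {}  # operand name -> index of latest instruction writing it
    for idx, (_, src_a, src_b, dest) in enumerate(instructions):
        for src in (src_a, src_b):
            if src in last_writer:
                deps.append((last_writer[src], idx, src))
        last_writer[dest] = idx
    return deps


for producer, consumer, operand in raw_dependencies(program):
    print(f"instruction {consumer} needs {operand} from instruction {producer}")
```

SUB must wait for ADD to produce Temp1, and AND must wait for both Temp1 and Temp2. In a simple pipeline without forwarding, those waits are the bubbles the slide refers to.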
24. Branch Prediction
- Predicting the outcome of a branch
  - Conditional/Unconditional
  - Direction
    - Taken / Not Taken
    - Direction predictors
  - Target Address
    - PC+offset (Taken) / PC+4 (Not Taken)
- Why do we need branch prediction?
  - Increases the number of instructions available for the scheduler to issue. Increases instruction level parallelism (ILP)
  - Allows useful work to be completed while waiting for the branch to resolve
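The two possible next addresses can be written as a one-line rule. This sketch follows the slide's convention of PC+offset for a taken branch and PC+4 for a not-taken branch, assuming 4-byte instructions; real ISAs differ in exactly which address the offset is relative to. `next_pc` is a name invented for this example.

```python
INSTRUCTION_BYTES = 4  # assumed fixed instruction size


def next_pc(pc, offset, taken):
    """Address the fetch unit should use after a branch at `pc`."""
    return pc + offset if taken else pc + INSTRUCTION_BYTES


# A backward branch at 0x1000 with offset -16 (for example, a loop).
print(hex(next_pc(0x1000, -16, taken=True)))
print(hex(next_pc(0x1000, -16, taken=False)))
```

A predictor supplies the `taken` guess before the branch resolves, so fetch can continue from the predicted address instead of stalling.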
25. Branch Prediction Strategies
- Static
  - Decided before runtime
  - Examples
    - Always-Not Taken
    - Always-Taken
    - Backwards Taken, Forward Not Taken (BTFNT)
    - Profile-driven prediction
- Dynamic
  - Prediction decisions may change during the execution of the program
- AMD Athlon K7
  - 10-stage integer, 15-stage fp pipeline, predictor accessed in fetch
- Branch Penalties
  - Correct Predict Taken: 1 cycle
  - Mispredict penalty: at least 10 cycles
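The difference between static and dynamic strategies can be seen by scoring them on a branch trace. The sketch compares Always-Taken and BTFNT with a 2-bit saturating counter. The counter is a common textbook dynamic scheme chosen here as an assumption; the slides do not name a specific dynamic predictor, and it is not claimed to be the Athlon's design. The branch addresses and traces are invented.

```python
def always_taken(_pc, _target):
    return True


def btfnt(pc, target):
    """Backwards Taken, Forward Not Taken."""
    return target < pc


class TwoBitCounter:
    """One 2-bit saturating counter: states 0..3, predict taken if >= 2."""

    def __init__(self, state=2):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def static_accuracy(predictor, pc, target, outcomes):
    hits = sum(predictor(pc, target) == taken for taken in outcomes)
    return hits / len(outcomes)


def dynamic_accuracy(counter, outcomes):
    hits = 0
    for taken in outcomes:
        hits += counter.predict() == taken
        counter.update(taken)  # learn from the actual outcome
    return hits / len(outcomes)


# Forward branch (0x20 -> 0x40) that is usually taken.
mostly_taken = ([True] * 9 + [False]) * 3
# Forward branch (0x20 -> 0x40) that is rarely taken.
rarely_taken = ([False] * 9 + [True]) * 3

for name, trace in (("mostly taken", mostly_taken), ("rarely taken", rarely_taken)):
    print(name,
          static_accuracy(always_taken, 0x20, 0x40, trace),
          static_accuracy(btfnt, 0x20, 0x40, trace),
          round(dynamic_accuracy(TwoBitCounter(), trace), 3))
```

Each static rule does well on one trace and badly on the other, because its decision is fixed before runtime. The counter adapts to whichever behaviour the branch actually shows.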
26. Speculative Execution
- Speculative execution increases parallelism by fetching, issuing, and completing instructions even in the presence of unresolved conditional branches and possible exceptions.
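The commit-or-discard idea behind speculative execution can be sketched in a few lines. This is a deliberately simplified model with invented names: instructions on the predicted path run against a shadow copy of the registers, and their results become visible only if the branch prediction turns out to be correct. Real processors track speculative results in dedicated hardware structures; only the principle is modelled here.

```python
def run_speculatively(registers, predicted_path, predicted_taken, actual_taken):
    """Execute `predicted_path` on a shadow copy of `registers`.

    predicted_path: list of (dest_register, function_of_registers).
    Returns True if the speculative results were committed.
    """
    shadow = dict(registers)  # speculative state, invisible until commit
    for dest, compute in predicted_path:
        shadow[dest] = compute(shadow)
    if predicted_taken == actual_taken:
        registers.update(shadow)  # prediction correct: commit results
        return True
    return False  # misprediction: discard the shadow state


path = [
    ("r2", lambda r: r["r1"] * 2),
    ("r3", lambda r: r["r2"] + 1),
]

regs = {"r1": 5, "r2": 0, "r3": 0}
print(run_speculatively(regs, path, predicted_taken=True, actual_taken=True), regs)

regs = {"r1": 5, "r2": 0, "r3": 0}
print(run_speculatively(regs, path, predicted_taken=True, actual_taken=False), regs)
```

In the first call the work done ahead of the branch is kept; in the second the architectural registers are left exactly as they were, as if the speculative instructions had never run.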
27. Multi Processors
- MIMD
  - Multiple Instruction stream, Multiple Data stream
  - MIMD is often SPMD (Single Program Multiple Data)
  - Processors independently execute programs
  - Interactions between processors are costly
  - Flexible, can do this here, that there
  - Debugging is complicated by races and lack of repeatability
- SMP is Symmetric MultiProcessor
  - Multiple processors as interchangeable peers
- SMP usually implies
  - MIMD execution mode
  - Shared memory
- Some problems are inherently sequential
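The SPMD style mentioned above, one program applied to different slices of the data, can be sketched with Python's standard `concurrent.futures` module. The function names and the sum-of-squares workload are invented for this example. Note that in standard CPython, threads do not execute Python bytecode in parallel, so this shows the program structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_of_squares(chunk):
    """The single program: every worker runs this on its own data."""
    return sum(x * x for x in chunk)


def spmd_sum_of_squares(data, workers=4):
    # Split the data so that worker i gets elements i, i+workers, ...
    chunks = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    # The only interaction between workers is this final combination.
    return sum(partials)


data = list(range(1000))
print(spmd_sum_of_squares(data))
```

The workers share no mutable state, so there are no races to debug. The final `sum(partials)` is the sequential part that cannot be spread across processors.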
28. Summary
- RISC
- CISC
- Limitations of Von Neumann
- Pipelining
- Branch prediction
- Speculative Execution
- Multi-processor systems