RISC, CISC, Limitations - PowerPoint PPT Presentation

Provided by: bryand1

Transcript and Presenter's Notes

Title: RISC, CISC, Limitations

1
RISC, CISC, Limitations & Solutions

Bryan Duggan

2
Overview

RISC vs CISC
Characteristics of RISC and CISC
Advantages and Disadvantages of RISC and CISC
Limitations of the von Neumann Architecture
Solutions
Pipelining
Speculative Execution
Branch Prediction
Multi-processor Systems

3
Evolution of Instruction Sets
Single Accumulator (EDSAC 1950)
Accumulator + Index Registers
(Manchester Mark I, IBM 700 series 1953)
Separation of Programming Model from Implementation
High-level Language Based (B5000 1963)
Concept of a Family (IBM 360 1964)
General Purpose Register Machines
Complex Instruction Sets (Vax, Intel 432 1977-80)
Load/Store Architecture (CDC 6600, Cray 1 1963-76)
RISC
(Mips, Sparc, HP-PA, IBM RS6000, ... 1987)
4
RISC/CISC

Complex Instruction Set Computer
Intel x86
DEC VAX, PDP11
Motorola 68k
IBM 360, 370
Complex instructions bring the hardware closer to
high-level languages
Memory was expensive
Fewer, more powerful instructions
Smaller programs
More space for data

5
CISC - ISA

Instruction Set Architecture
Addressing Modes
Additional Instructions
Procedure and function call
Procedure call overhead is significant
Registers (state of the processor) must be saved
and restored (Motorola 68k MOVEM; x86 PUSH, POP)
Array Indexing
y = x[i][j][k] (VAX)
Math functions
sqrt, sin, log, ... (Intel x86 8087; Motorola 68k?)
Yet more instructions!
Graphics support
MMX

6
CISC - ISA

Instruction count
Usually almost 256
Maximum number of 8-bit opcodes!
Powerful instructions
Many microcode steps
Multiple cycle latency
Faster in microcode than in the user's program
Added some complexity to interrupt handling,
page faulting, etc.
Instructions too long to be uninterruptible!
Variable length, multiple formats
1 to 17 bytes

7
CISC - ISA critique

Studies of compilers showed
Many instructions unused
DEC even dropped an indexed memory access with
post-decrement, y = x[i--], from the ISA in going
from the PDP to the VAX
Compiler writers were sometimes simply not using
complex instructions when they were appropriate
because they could write faster sequences of
simple instructions for the most common cases
Operand Constants
-15 to 15: 56%
-511 to 511: 98%
12 words of storage for subroutines: 95%

8
CISC

Irrespective of its performance ...
Complex hardware is expensive
Speed improvements
Irregular (long design times)
Long lead times to market
Instruction set and chip hardware become more
complex with each generation of computers.
The number of control words and the number of clock
cycles vary between instructions, which makes
instruction pipelining difficult to implement.

9
80x86

1978 The Intel 8086 is announced (16 bit
architecture)
1980 The 8087 floating point coprocessor is
added
1982 The 80286 increases address space to 24
bits, adds instructions
1985 The 80386 extends to 32 bits, new
addressing modes
1989-1995 The 80486, Pentium, Pentium Pro add a
few instructions (mostly designed for higher
performance)
1997 MMX is added
This history illustrates the impact of the
"golden handcuffs" of compatibility

10
RISC Characteristics

No universally accepted definition
Most of the following
Instructions are conceptually simple
Instructions are of a uniform length
Instructions use one (or very few) instruction
formats
Instruction set is orthogonal
Little overlapping of instruction functionality
Instructions use very few addressing modes
Architecture is a load-and-store architecture
Only LOAD and STORE instructions reference memory
All operate instructions are register-to-register
The ISA supports few data types

11
RISC Characteristics (Cont'd)

Other possible attributes
Almost all instructions execute in one clock
cycle
Implementation detail
Architecture takes advantage of strengths of
software
All reasonable architectures do
Architecture should have many registers
Not part of RISC
Useful, however, for speeding up CPU

12
Reduced Instruction Set Computer

No memory-memory instructions
Data loaded to registers
lw $3, 0($2)
Data stored from registers
st $4, 40($5)
Arithmetic, logical, etc. operations are all
Register -> Register
Mostly 3-operand type
op dest_reg, src_regA, src_regB
Mostly 1-cycle in ALU
Throughput 1 instruction/cycle
Register Windows

13
RISC

Simplicity of RISC instructions
permits high clock rates
long-latency ALU instructions are divided further
as necessary
MIPS R4000 8-stage pipeline
All instructions are 32 bits
Simplifies fetch

14
RISC - Simple Hardware, Complex Compiler

Basic hardware is simple
and hard-wired
i.e. no microcode
but
Pipeline stalls can reduce throughput
Optimising Compiler needed
Fully exploit capabilities
Dependence Analysis
Instruction re-ordering
Avoid pipeline stalls

15
RISC Disadvantages

A more sophisticated compiler is required.
A sequence of RISC instructions is needed to
implement complex instructions.
RISC processors require very fast memory systems
to feed them instructions.
Performance of a RISC application depends
critically on the quality of the code generated
by the compiler.

16
Von Neumann Limitation
17
Pipelining

Laundry Example
Ann, Brian, Cathy, Dave each have one load of
clothes to wash, dry, and fold
Washer takes 30 minutes
Dryer takes 40 minutes
Folder takes 20 minutes
How long to do laundry?
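
The laundry question can be worked out with a small sketch. It assumes, beyond what the slide states, that each machine handles one load at a time and that a finished load may wait until the next machine is free. The function names are illustrative only.

```python
def sequential_time(stage_minutes, loads):
    """Total minutes if each load finishes every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_time(stage_minutes, loads):
    """Total minutes if a load may start a stage as soon as that stage is free.

    Assumption: loads can wait between stages (unlimited buffering).
    """
    # finish[s] is the time at which stage s finished its most recent load
    finish = [0] * len(stage_minutes)
    for _ in range(loads):
        done_previous_stage = 0
        for s, minutes in enumerate(stage_minutes):
            # A load starts stage s once the stage is free AND the load has
            # left the previous stage.
            start = max(finish[s], done_previous_stage)
            finish[s] = start + minutes
            done_previous_stage = finish[s]
    return finish[-1]


stages = [30, 40, 20]  # washer, dryer, folder (minutes, from the slide)
print(sequential_time(stages, 4))
print(pipelined_time(stages, 4))
```

Under these assumptions the four loads take 360 minutes one after another and 210 minutes when pipelined, with the 40-minute dryer setting the pace.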

20
Pipelining Lessons

Pipelining doesn't help the latency of a single task;
it helps the throughput of the entire workload
Multiple tasks operate simultaneously using
different resources
Potential speedup = number of pipe stages
Pipeline rate is limited by the slowest pipeline stage
Unbalanced lengths of pipe stages reduce speedup
Time to fill the pipeline and time to drain it
reduce speedup
Stall for dependences
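
The lessons above can be checked numerically with a rough sketch. It assumes a clocked pipeline in which every stage advances in lockstep at the pace of the slowest stage, and it ignores stalls. The function name and the sample stage times are made up for illustration.

```python
def pipeline_speedup(stage_times, tasks):
    """Speedup of a lockstep pipeline over running each task start to finish.

    Assumptions: one task enters per cycle, the cycle time equals the slowest
    stage, and there are no stalls.
    """
    cycle = max(stage_times)                 # rate limited by the slowest stage
    unpipelined = tasks * sum(stage_times)
    # The first task needs len(stage_times) cycles (fill); each later task
    # completes one cycle after the one before it.
    pipelined = (len(stage_times) + tasks - 1) * cycle
    return unpipelined / pipelined


print(pipeline_speedup([1, 1, 1, 1], 1000))  # balanced stages, many tasks
print(pipeline_speedup([1, 1, 1, 1], 4))     # fill and drain time dominate
print(pipeline_speedup([1, 1, 1, 3], 1000))  # one slow stage
```

With four balanced stages and many tasks the speedup approaches 4, the number of stages. With only four tasks, or with one stage three times slower than the others, it stays close to 2 in this model.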

[Figure: timeline from 6 PM to 9 PM; axes: Time and Task Order]
21
How does it work?
i:    IF  OF  E   OS
i+1:      IF  OF  E   OS
i+2:          IF  OF  E   OS
IF = Instruction Fetch, OF = Operand Fetch, E = Execute, OS = Operand Store
22
Pipeline Bubble
Solution: Always put an instruction after a
branch, even if it is a NOOP
[Figure: instructions i-1, i (BRA N), i+1, i+2, N and N+1 passing through IF, OF, E and OS, with a BUBBLE marked in the pipeline]
23
Data Dependency

Consider
ADD A, B, Temp1
SUB Temp1, C, Temp2
AND Temp1, Temp2, X
Generates bubbles
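
A minimal sketch of why this sequence generates bubbles, using the four stages named earlier in the slides (IF, OF, E, OS). Several things here are assumptions rather than statements from the slides: the operand order is read as `op src, src, dest`, there is no forwarding, and a result can be fetched only in the cycle after its producer's OS stage. The function name is hypothetical.

```python
def count_bubbles(program):
    """Return the stall cycles inserted before each instruction's OF stage.

    program: list of (op, src_a, src_b, dest) tuples, issued in order.
    Assumed model: stages IF, OF, E, OS; no forwarding; a written register is
    readable in the cycle after the writer's OS stage.
    """
    ready = {}        # register -> first cycle in which its new value is readable
    previous_of = 1   # so the first instruction does IF in cycle 1, OF in cycle 2
    bubbles = []
    for op, src_a, src_b, dest in program:
        earliest_of = previous_of + 1
        # Operand fetch must wait until both sources have been written back.
        of_cycle = max(earliest_of, ready.get(src_a, 0), ready.get(src_b, 0))
        bubbles.append(of_cycle - earliest_of)
        # OF, then E (OF + 1), then OS (OF + 2); value readable at OF + 3.
        ready[dest] = of_cycle + 3
        previous_of = of_cycle
    return bubbles


program = [
    ("ADD", "A", "B", "Temp1"),
    ("SUB", "Temp1", "C", "Temp2"),
    ("AND", "Temp1", "Temp2", "X"),
]
print(count_bubbles(program))
```

In this model SUB waits two cycles for Temp1 and AND waits two more for Temp2. Reordering independent instructions between them, as an optimising compiler would, is one way to fill those slots.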

24
Branch Prediction

Predicting the outcome of a branch
Conditional/Unconditional
Direction
Taken / Not Taken
Direction predictors
Target Address
PC + offset (Taken) / PC + 4 (Not Taken)
Why do we need branch prediction?
Increases the number of instructions available
for the scheduler to issue. Increases
instruction level parallelism (ILP)
Allows useful work to be completed while waiting
for the branch to resolve

25
Branch Prediction Strategies

Static
Decided before runtime
Examples
Always-Not Taken
Always-Taken
Backwards Taken, Forward Not Taken (BTFNT)
Profile-driven prediction
Dynamic
Prediction decisions may change during the
execution of the program
AMD Athlon K7
10-stage integer, 15-stage fp pipeline, predictor
accessed in fetch
Branch Penalties
Correctly predicted taken: 1 cycle
Mispredict penalty: at least 10 cycles
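
The static and dynamic strategies listed on this slide can be compared with a short sketch. The slides do not name a particular dynamic scheme; the 2-bit saturating counter used here is one common textbook choice, picked as an assumption. The class names, the addresses and the loop trace are likewise invented for illustration.

```python
class StaticBTFNT:
    """Backwards Taken, Forward Not Taken: decided from the addresses alone."""

    def predict(self, pc, target):
        return target < pc          # a backward branch is predicted taken

    def update(self, pc, taken):
        pass                        # static: never learns


class TwoBitPredictor:
    """Dynamic predictor with one 2-bit saturating counter per branch address."""

    def __init__(self):
        self.counters = {}          # pc -> 0..3, values >= 2 mean "predict taken"

    def predict(self, pc, target):
        return self.counters.get(pc, 1) >= 2   # assumed start: weakly not taken

    def update(self, pc, taken):
        count = self.counters.get(pc, 1)
        self.counters[pc] = min(3, count + 1) if taken else max(0, count - 1)


def count_correct(trace, predictor):
    """Count correct predictions over a trace of (pc, target, taken) tuples."""
    correct = 0
    for pc, target, taken in trace:
        if predictor.predict(pc, target) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct


# Hypothetical loop branch at 0x40 jumping back to 0x20:
# taken 9 times, then falls through, and the whole loop runs 3 times.
loop_trace = [(0x40, 0x20, taken) for taken in [True] * 9 + [False]] * 3
print(count_correct(loop_trace, StaticBTFNT()), "of", len(loop_trace))
print(count_correct(loop_trace, TwoBitPredictor()), "of", len(loop_trace))
```

On a simple loop like this both approaches mispredict mainly at loop exit. The dynamic predictor earns its keep on branches whose behaviour cannot be read off the branch direction.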

26
Speculative Execution

Speculative execution increases parallelism by
fetching, issuing, and completing instructions
even in the presence of unresolved conditional
branches and possible exceptions.
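
As a loose software analogy only, the idea can be sketched as running the predicted path on a scratch copy of the register state and keeping the results only if the prediction turns out to be right. Real processors do this bookkeeping in hardware; the function name and the data layout below are assumptions made for illustration.

```python
def speculate(registers, predicted_path, predicted_taken, actually_taken):
    """Execute predicted_path speculatively, then commit or squash.

    registers: dict mapping register name to value (the committed state)
    predicted_path: list of (dest, operation) pairs, where operation takes the
        current speculative register dict and returns the new value for dest
    """
    scratch = dict(registers)       # speculative state, not yet visible
    for dest, operation in predicted_path:
        scratch[dest] = operation(scratch)
    if predicted_taken == actually_taken:
        return scratch              # prediction correct: commit the results
    return dict(registers)          # misprediction: discard speculative work


committed = {"r1": 5, "r2": 0}
path = [("r2", lambda regs: regs["r1"] + 1)]
print(speculate(committed, path, predicted_taken=True, actually_taken=True))
print(speculate(committed, path, predicted_taken=True, actually_taken=False))
```

The useful work done while the branch was unresolved is kept in the first call and thrown away in the second, which is where the misprediction penalty comes from.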

27
Multi Processors