Title: Chapter 13 Reduced Instruction Set Computers (RISC)
1Chapter 13 Reduced Instruction Set Computers (RISC)
- CISC: Complex Instruction Set Computer
- RISC: Reduced Instruction Set Computer
2Some Major Advances in Computers in 50 years
- VLSI
- The family concept
- Microprogrammed control unit
- Cache memory
- MiniComputers
- Microprocessors
- Pipelining
- PCs
- Multiple processors
- RISC processors
3RISC
- Reduced Instruction Set Computer
- Key features
- Large number of general purpose registers (or use of compiler technology to optimize register use)
- Limited and simple instruction set
- Emphasis on optimising the instruction pipeline and memory management
4Comparison of processors
5Driving force for CISC
- Software costs far exceed hardware costs
- Increasingly complex high level languages
- A semantic gap between HLL and ML (machine language)
- This Leads to
- Large instruction sets
- More addressing modes
- Hardware implementations of HLL statements
- e.g. CASE (switch) on VAX (long, complex structure)
6Intention of CISC
- Ease compiler writing
- Improve execution efficiency
- Complex operations in microcode
- Support more complex HLLs
7Execution Characteristics Studied
- What was studied?
- Operations performed
- Operands used
- Execution sequencing
- How was it Studied?
- Studies were based on programs written in HLLs
- Dynamic studies: measured during the execution of the program
8Operations
- Assignments
- Movement of data
- Conditional statements (IF, LOOP)
- Sequence control
- Observations?
- Procedure call-return is very time consuming
- Some HLL instructions lead to very many machine code operations
9Weighted Relative Dynamic Frequency of HLL Operations (Patterson)
10Operands
- Observations?
- Predominately local scalar variables
- Implications?
- Optimization should concentrate on accessing local variables
11Procedure Calls
- Observations?
- Context switching is quite time consuming
- Depends on number of parameters passed
- Depends on level of nesting
- Most programs do not do a lot of calls followed by lots of returns
- Most variables used are local
12Implications: Characterize RISC
- Best support is provided by optimising
- most utilized features and
- most time consuming features
- Conclusions
- Large number of registers
- Used for operand referencing
- Careful design of pipelines
- Address branching
- Branch prediction etc.
- Simplified instruction set
- Reduced length
- Reduced number
13Register File
- Software solution
- Require compiler to allocate registers
- Allocate based on most used variables in a given time
- Requires sophisticated program analysis
- Hardware solution
- Have more registers
- Thus more variables will be in registers
14Using Registers for Local Variables
- Store local scalar variables in registers
- Reduces memory accesses
- Every procedure (function) call changes locality
- Parameters must be passed
- Partial context switch
- Results must be returned
- Variables from calling program must be restored
- Partial Context switch
15Using Register Windows
- Observations
- Typically only a few local variables and passed parameters
- Typically limited range of depth of calls
- Implications
- Use multiple small sets of registers
- Calls switch to a different set of registers
- Returns switch back to a previously used set of registers
- Partition register set
16Using Register Windows cont.
- Partition register set into
- Local registers
- Parameter registers (Passed Parameters)
- Temporary registers (Passing Parameters)
- Then
- Temporary registers from one set overlap parameter registers from the next
- This provides parameter passing without moving data (just move one pointer)
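The overlap described above can be sketched as a mapping from (window, logical register) to a physical register. This is only an illustration: the sizes (8 parameter, 8 local, 8 temporary registers per window) and the name `physical_index` are assumptions, not values taken from the slides, and the wrap-around of the circular buffer is left out.

```python
# Hypothetical sizes, chosen only for illustration.
N_PARAM = 8                   # parameter registers (values passed in)
N_LOCAL = 8                   # local registers
N_TEMP = 8                    # temporary registers (values passed out)
STRIDE = N_PARAM + N_LOCAL    # physical registers skipped per call

def physical_index(window, logical):
    """Map a logical register inside a window to a physical register.

    Logical layout: parameters first, then locals, then temporaries.
    Because each window starts STRIDE registers after the previous one,
    the temporaries of window w are the parameters of window w + 1.
    """
    if not 0 <= logical < N_PARAM + N_LOCAL + N_TEMP:
        raise ValueError("logical register out of range")
    return window * STRIDE + logical

# The caller (window 0) writes an argument into its first temporary register.
caller_temp0 = physical_index(0, N_PARAM + N_LOCAL)
# The callee (window 1) reads it as its first parameter register.
callee_param0 = physical_index(1, 0)
same_register = caller_temp0 == callee_param0
```

Here `same_register` is true: the call only changes the window number, and no register contents are copied.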
17Overlapping Register Windows
(Picture of calls and returns)
18Circular Buffer diagram
19Operation of Circular Buffer
- When a call is made, a current window pointer is moved to show the currently active register window
- If all windows are in use, an interrupt is generated and the oldest window (the one furthest back in the call nesting) is saved to memory
- A saved window pointer indicates where the next saved window should be restored
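The behaviour of the two pointers can be sketched with a small model. It tracks window indices only, not register contents, and follows the slide's wording ("if all windows are in use"); the class name `WindowFile` and its fields are hypothetical, and real hardware differs in detail.

```python
class WindowFile:
    """Toy model of a circular buffer of register windows."""

    def __init__(self, n_windows):
        self.n = n_windows
        self.cwp = 0        # current window pointer
        self.swp = 0        # saved window pointer: oldest window still in registers
        self.resident = 1   # number of windows currently held in registers
        self.saved = []     # windows spilled to memory, most recent last
        self.overflows = 0
        self.underflows = 0

    def call(self):
        if self.resident == self.n:
            # All windows in use: the "interrupt" saves the oldest window.
            self.saved.append(self.swp)
            self.swp = (self.swp + 1) % self.n
            self.resident -= 1
            self.overflows += 1
        self.cwp = (self.cwp + 1) % self.n
        self.resident += 1

    def ret(self):
        if self.resident == 1:
            # The caller's window was spilled earlier: restore it first.
            if not self.saved:
                raise RuntimeError("return without a matching call")
            self.swp = (self.swp - 1) % self.n
            self.saved.pop()
            self.resident += 1
            self.underflows += 1
        self.cwp = (self.cwp - 1) % self.n
        self.resident -= 1

wf = WindowFile(4)
for _ in range(5):      # nest five calls deep with only four windows
    wf.call()
overflows_after_calls = wf.overflows
for _ in range(5):      # unwind completely
    wf.ret()
```

With four windows and a nesting depth of five, two windows are spilled on the way down and two are restored on the way back. A program whose call depth stays within the buffer never touches memory for window management.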
20Global Variables
- How should we accommodate Global Variables?
- Allocated by the compiler to memory
- Have a static set of registers for global variables
- Put them in cache
21Registers v Cache which is better?
22Referencing a Scalar - Window Based Register File
23Referencing a Scalar - Cache
24Compiler Based Register Optimization
- Basis
- Assuming relatively small number of registers (16-32)
- Optimizing the use is up to compiler
- HLL programs have no explicit references to registers
- Process
- Assign symbolic or virtual register to each candidate variable
- Map (unlimited) symbolic registers to (limited) real registers
- Symbolic registers that do not overlap can share real registers
- If you run out of real registers some variables use memory
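The "do not overlap" test in the process above can be sketched with live ranges. The ranges below are made-up (start, end) instruction indices, with the end exclusive; in a real compiler they would come from liveness analysis. The function names are hypothetical.

```python
def overlaps(a, b):
    """True if two live ranges (start, end), end exclusive, share a program point."""
    return a[0] < b[1] and b[0] < a[1]

def interference_edges(live_ranges):
    """Return the pairs of symbolic registers that are live at the same time."""
    names = sorted(live_ranges)
    return {
        (u, v)
        for i, u in enumerate(names)
        for v in names[i + 1:]
        if overlaps(live_ranges[u], live_ranges[v])
    }

# Hypothetical live ranges for four symbolic registers.
live = {"A": (0, 4), "B": (2, 6), "C": (4, 8), "D": (6, 9)}
edges = interference_edges(live)
# edges == {("A", "B"), ("B", "C"), ("C", "D")}
```

A and C never overlap, so they may share one real register; the same holds for B and D. Two real registers are therefore enough for these four symbolic registers.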
25Graph Coloring Algorithm for Reg Assign
- Given
- A graph of nodes and edges
- Nodes are symbolic registers
- Two symbolic registers that are live in the same program fragment are joined by an edge
- Then
- Assign a color to each node
- Adjacent nodes must have different colors
- Assign minimum number of colors
- And then
- Try to color the graph with n colors, where n is the number of real registers
- Nodes that cannot be colored are placed in memory
26Graph Coloring Algorithm Example
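A minimal sketch of the idea follows. It uses a simple greedy heuristic (highest-degree nodes first), which is an assumption made for brevity: it is not guaranteed to find a minimum coloring, and it is not presented as the algorithm of any particular compiler. The graph and the name `color_registers` are hypothetical.

```python
def color_registers(nodes, edges, n_regs):
    """Greedily assign one of n_regs real registers to each symbolic register.

    Returns (assignment, spilled), where spilled lists the symbolic
    registers that could not be colored and must live in memory.
    """
    neighbors = {v: set() for v in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    assignment = {}
    spilled = []
    # Visit the most constrained (highest-degree) nodes first; ties by name.
    for v in sorted(nodes, key=lambda x: (-len(neighbors[x]), x)):
        used = {assignment[u] for u in neighbors[v] if u in assignment}
        free = [r for r in range(n_regs) if r not in used]
        if free:
            assignment[v] = free[0]
        else:
            spilled.append(v)
    return assignment, spilled

# Hypothetical interference graph: A, B and C are all live together; D only with C.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

two_regs, spilled_two = color_registers(nodes, edges, 2)
three_regs, spilled_three = color_registers(nodes, edges, 3)
```

With two real registers, B cannot be colored and is placed in memory, while A and D share a register. With three real registers every node is colored and nothing is spilled.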
27The debate Why CISC (1 of 2)?
- Compiler simplification?
- Dispute
- - Complex machine instructions are harder to exploit
- - Optimization actually may be more difficult
- Smaller programs? (Memory is now cheap)
- Programs may take up fewer instructions, but
- May not occupy less memory,
- just look shorter in symbolic form
- More instructions require longer op-codes, more memory references
- Register references require fewer bits
28The Debate Why CISC (2 of 2)?
- Faster programs?
- More complex control unit
- Microprogram control store larger
- Thus instructions take longer to execute
- Bias towards use of simpler instructions
- It is far from clear that CISC is the appropriate solution
29Early RISC Computers
- MIPS: Microprocessor without Interlocked Pipeline Stages
- Stanford (John Hennessy)
- MIPS Technologies
- SPARC: Scalable Processor Architecture
- Berkeley (David Patterson)
- Sun Microsystems
- 801: IBM Research (George Radin)
30Concentrating on RISC
- Major Characteristics
- One instruction per cycle
- Register to register operations
- Few, simple addressing modes
- Few, simple instruction formats
- Also
- Hardwired design (no microcode)
- Fixed instruction format
- But
- More compile time/effort
31Breadth of RISC Characteristics
32Characteristics of Example Processors
33Memory to memory vs Register to memory Operations
Lab Project 1
34Controversy CISC vs RISC
- Challenges of comparison
- There is no pair of RISC and CISC machines that are directly comparable
- There is no definitive set of test programs
- It is difficult to separate hardware effects from compiler effects
- Most comparisons are done on toy rather than production machines
- Most commercial machines are a mixture
35Controversy RISC v CISC
- Not clear cut
- Today's designs borrow from both philosophies
36RISC Pipelining basics
- Two phases of execution for register based instructions
- I: Instruction fetch
- E: Execute
- ALU operation with register input and output
- For load and store there need to be three
- I: Instruction fetch
- E: Execute
- Calculate memory address
- D: Memory
- Register to memory or memory to register operation
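The benefit of overlapping these phases can be estimated with an idealized count. The sketch below assumes every phase takes one cycle, one instruction is issued per cycle, and there are no stalls (no memory port conflicts between I and D, no branches, no data dependencies). Those are simplifying assumptions, and the function names are hypothetical.

```python
# Phases per instruction class, as listed above.
PHASES = {
    "alu": ["I", "E"],
    "load": ["I", "E", "D"],
    "store": ["I", "E", "D"],
}

def sequential_cycles(program):
    """Each instruction finishes all its phases before the next one starts."""
    return sum(len(PHASES[op]) for op in program)

def ideal_pipelined_cycles(program):
    """Instruction i starts in cycle i; the total is the latest finishing time."""
    if not program:
        return 0
    return max(i + len(PHASES[op]) for i, op in enumerate(program))

program = ["load", "load", "alu", "store"]
seq = sequential_cycles(program)        # 3 + 3 + 2 + 3 = 11
pipe = ideal_pipelined_cycles(program)  # the store starts in cycle 3 and takes 3 cycles: 6
```

Real pipelines fall short of this ideal figure, which is what motivates optimizations that keep every stage busy.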
37Effects of RISC Pipelining
(Allows 2 memory accesses per stage)
(E1: register read; E2: execute and register write. Particularly beneficial if E phase can be longer)
38Optimization of RISC Pipelining
- Delayed branch
- Leverages a branch that does not take effect until after execution of the following instruction
- This following instruction occupies the delay slot
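A toy interpreter can illustrate the delay slot. The instruction names, the two programs and the `run` function are all hypothetical and only model the ordering of execution, not pipeline timing. With `delayed=True`, a jump takes effect only after the instruction that follows it has executed.

```python
def run(program, delayed):
    """Execute a toy program; return (memory, number of instructions executed)."""
    regs = {"rA": 0, "rB": 0}
    mem = {"X": 10}
    pc, executed = 0, 0
    pending = None  # target of a delayed jump, applied after the delay slot
    while pc < len(program):
        op, *args = program[pc]
        next_pc = pc + 1
        if pending is not None:       # this instruction sits in the delay slot
            next_pc, pending = pending, None
        if op == "load":              # load mem[name] into a register
            regs[args[1]] = mem[args[0]]
        elif op == "addi":            # add an immediate to a register
            regs[args[1]] += args[0]
        elif op == "add":             # add source register into destination
            regs[args[1]] += regs[args[0]]
        elif op == "store":           # store a register to mem[name]
            mem[args[1]] = regs[args[0]]
        elif op == "jump":
            if delayed:
                pending = args[0]     # takes effect after the next instruction
            else:
                next_pc = args[0]
        elif op != "noop":
            raise ValueError(op)
        pc = next_pc
        executed += 1
    return mem, executed

# Delayed branch with the delay slot filled by a NOOP.
with_noop = [("load", "X", "rA"), ("addi", 1, "rA"), ("jump", 5),
             ("noop",), ("add", "rA", "rB"), ("store", "rA", "Z")]
# Same program with the ADD-immediate moved into the delay slot.
reordered = [("load", "X", "rA"), ("jump", 4), ("addi", 1, "rA"),
             ("add", "rA", "rB"), ("store", "rA", "Z")]

mem_noop, count_noop = run(with_noop, delayed=True)
mem_reordered, count_reordered = run(reordered, delayed=True)
```

Both versions store 11 in `Z`, but the reordered program executes four instructions instead of five, because useful work replaces the NOOP in the delay slot.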
39Normal vs Delayed Branch
40Example of Delayed Branch (clever!)
What is wrong with this example? Why is there a write back?
41More Options for RISC Architectures
- RISC Pipelining
- Superpipelined: more fine grained pipeline (more stages in pipeline)
- Superscalar: replicates stages of pipeline (multiple pipelines)
42MIPS 4000 RISC Machine
- 64 bit architecture (4 Gig address space)
- (1 Terabyte of file mapping)
- Partitioned into CPU and MMU
- 32 registers (R0 = 0), but
- 128K cache: ½ instructions, ½ data
- One 32 bit word for each instruction (94 instructions)
- All operations are register to register
- No condition codes! Flags are stored in general registers for explicit use; this simplifies branch optimization
- Only one load/store format: base, offset
- Extended addressing synthesized with multiple instructions
- Uses branch prediction
- Especially designed for embedded computing
- Has multiple FPUs; FP likely stalls pipeline
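As an illustration of synthesizing extended addressing, the sketch below splits an address into an upper 16-bit part (as loaded by a MIPS `lui`) and a signed 16-bit offset (as used by the base + offset load/store format). It assumes a 32-bit address for simplicity; the function names are hypothetical.

```python
def split_address(addr):
    """Split a 32-bit address into (hi, lo) for a lui + base/offset pair.

    lo is the signed 16-bit offset. Because the offset is sign-extended,
    hi is adjusted upward by one whenever lo comes out negative.
    """
    if not 0 <= addr < 2**32:
        raise ValueError("address must fit in 32 bits")
    lo = addr & 0xFFFF
    if lo >= 0x8000:
        lo -= 0x10000                  # interpret as a signed 16-bit value
    hi = ((addr - lo) >> 16) & 0xFFFF  # upper half, compensated for the sign
    return hi, lo

def effective_address(hi, lo):
    """Address formed by (hi << 16) as base plus the signed offset lo."""
    return ((hi << 16) + lo) & 0xFFFFFFFF

hi, lo = split_address(0x12348000)
rebuilt = effective_address(hi, lo)
```

For `0x12348000` the offset is `-0x8000`, so the upper part becomes `0x1235` rather than `0x1234`; adding the two reproduces the original address.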
43MIPS 4000 RISC Machine
- See MIPS instructions - page 485
- See Formats page 486