William Stallings Computer Organization and Architecture - PowerPoint PPT Presentation



Slides: 51

Provided by: adrianjpul6


1
William Stallings Computer Organization and
Architecture

Chapter 13
Reduced Instruction
Set Computers

2
Topics

Major Advances in Computers
Instruction Execution Characteristics
Use of Large Register File
Compiler-Based Register Optimization
Reduced Instruction Set Architecture
RISC Pipelining
RISC vs. CISC Controversy

3
Major Advances in Computers (1)

The family concept
IBM System/360, 1964
DEC PDP-8
Separates architecture from implementation
Microprogrammed control unit
Idea proposed by Wilkes, 1951
Used in the IBM S/360, 1964
Eases the task of designing and implementing the
control unit

4
Major Advances in Computers (2)

Cache memory
IBM S/360 Model 85, 1969
Solid-state RAM
(See memory notes)
Microprocessors
Intel 4004, 1971
Pipelining
Introduces parallelism into the fetch-execute cycle
Multiple processors

5
The Next Step - RISC

Reduced Instruction Set Computer
Key features
Large number of general-purpose registers
and/or
Use of compiler technology to optimize register
usage
Limited and simple instruction set
Emphasis on optimizing the instruction pipeline

6
History of RISC

IBM 801 project, late 1970s to early 1980s
David Patterson, UC Berkeley
RISC I and RISC II
Large register sets
Forerunner of SPARC architecture
John Hennessy, Stanford U.
MIPS system
Optimizing compiler and pipelines
Hennessy and Patterson wrote a series of papers
that defined the RISC movement and set the stage
for the ongoing RISC vs. CISC debate

7
Comparison of processors
8
Driving force for CISC

Software costs far exceed hardware costs
Increasingly complex high-level languages (HLLs)
Semantic gap: the difference between the operations
provided in HLLs and those provided in the computer
architecture
Leads to the following attempts to close the gap:
Large instruction sets
More addressing modes
Hardware implementations of HLL statements,
e.g. CASE (switch) on VAX

9
Intention of CISC

Ease compiler writing
Improve execution efficiency
Because complex operations can be implemented in
microcode
Support more complex HLLs
A totally different approach (RISC):
simpler architecture

10
Execution Characteristics

Development of RISC was based on the study of
instruction execution characteristics
Operations performed
determine the functions to be performed and the
interaction with memory
Operands used (types and frequencies)
determine the memory organization and addressing
modes
Execution sequencing
determines the control and pipeline organization

11
Execution Characteristics

Studies have been based on programs written
in HLLs
Dynamic studies take their measurements during the
execution of the program

12
Operations

Assignments
Movement of data
Conditional statements (IF, LOOP)
Sequence control
Procedure call-return is very time-consuming
Some HLL statements lead to many machine-code
operations

13
Relative Dynamic Frequency (all values in %)

        Dynamic     | Machine-    | Memory-
        Occurrence  | Instruction | Reference
                    | Weighted    | Weighted
        Pascal    C | Pascal    C | Pascal    C
Assign      45   38 |     13   13 |     14   15
Loop         5    3 |     42   32 |     33   26
Call        15   12 |     31   33 |     44   45
If          29   43 |     11   21 |      7   13
GoTo         -    3 |      -    - |      -    -
Other        6    1 |      3    1 |      2    1
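As a rough reading aid, dividing a statement type's weighted share by its dynamic share shows whether it costs more (ratio above 1) or less (below 1) than its raw frequency suggests. The Python sketch below uses the percentages from the table; the ratio is an illustrative derived metric, not a figure from the underlying studies, the table's "-" entries are entered as 0, and Pascal's GoTo row is omitted because its dynamic share is "-".

```python
# Percentages from the table: (dynamic occurrence, machine-instruction
# weighted, memory-reference weighted). A "-" in the table is entered as 0;
# Pascal GoTo is omitted because its dynamic share is "-".
FREQ = {
    "Pascal": {
        "Assign": (45, 13, 14),
        "Loop": (5, 42, 33),
        "Call": (15, 31, 44),
        "If": (29, 11, 7),
        "Other": (6, 3, 2),
    },
    "C": {
        "Assign": (38, 13, 15),
        "Loop": (3, 32, 26),
        "Call": (12, 33, 45),
        "If": (43, 21, 13),
        "GoTo": (3, 0, 0),
        "Other": (1, 1, 1),
    },
}


def cost_ratios(lang: str) -> dict[str, tuple[float, float]]:
    """Weighted share / dynamic share per statement type, rounded to 2 places.

    A ratio above 1 means the statement costs more instructions (or memory
    references) than its raw frequency alone would suggest.
    """
    return {
        stmt: (round(instr / dyn, 2), round(mem / dyn, 2))
        for stmt, (dyn, instr, mem) in FREQ[lang].items()
    }


for lang in FREQ:
    for stmt, (instr_ratio, mem_ratio) in cost_ratios(lang).items():
        print(f"{lang:<7}{stmt:<7}instructions x{instr_ratio:<6}memory refs x{mem_ratio}")
```

For calls the ratios come out between about 2.1 and 3.75, in line with procedure call-return being very time-consuming; loop ratios are larger still.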

14
Operands

Mainly local scalar variables
Optimization should concentrate on accessing
local variables
Operand type (%)  Pascal    C   Average
Integer constant      16   23        20
Scalar variable       58   53        55
Array/structure       26   24        25

15
Procedure Calls (1)

Very time-consuming
Depends on the number of parameters passed
Depends on the level of nesting
→ depth of nesting is typically low
Most programs do not make a long run of calls
followed by a long run of returns
Most variables are local
(cf. locality of reference)

16
Procedure Calls (2)

Tanenbaum's study
98% of calls pass fewer than 6 arguments
92% use fewer than 6 local scalar variables
Berkeley RISC team's study

Percentage of Executed        Compiler, Interpreter,  Small Nonnumeric
Procedure Calls With          and Typesetter          Programs
> 3 arguments                 0-7%                    0-5%
> 5 arguments                 0-3%                    0%
> 8 words of arguments and    1-20%                   0-6%
  local scalars
> 12 words of arguments and   1-6%                    0-3%
  local scalars

17
Implications

Making the instruction set architecture close to HLLs
→ not the most effective approach
Best support is given by optimizing the most-used and
most time-consuming features
Large number of registers
Operand referencing optimization (locality of
reference) → fewer memory references
Careful design of pipelines
Branch prediction etc.
Simplified (reduced) instruction set

18
Approaches

Hardware solution
Have more registers
Thus more variables will be in registers
e.g., Berkeley RISC, Sun SPARC
Software solution
Requires the compiler to allocate registers
Allocates them to the most-used variables in a given
period of time
Requires sophisticated program analysis
e.g., Stanford MIPS

19
Use of Large Register File

From the analysis
Large number of assignment statements
Most accesses to local scalars
→ Heavy reliance on register storage
→ Minimizing memory access

20
Registers for Local Variables

Store local scalar variables in registers
→ Reduces memory access
Every procedure (function) call changes locality
Parameters must be passed
Results must be returned
Variables from calling programs must be restored
Solution: register windows

21
Register Windows (1)

Register windows
Organization of registers to realize the goal
From the analysis
Only a few parameters
Limited range of call depth
→
Use multiple small sets of registers
Calls switch to a different set of registers
Returns switch back to a previously used set of
registers

22
Register Windows (2)

Three areas within a register set
Parameter registers
Local registers
Temporary registers
Temporary registers from one set overlap
parameter registers from the next
This allows parameter passing without moving data
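A minimal Python sketch of how overlapping windows map the registers a procedure names onto one physical register file. The split used here (10 global registers, and per window 6 parameter, 10 local and 6 temporary registers, giving 8 windows of 16 unique registers each) is an assumption modeled on the Berkeley RISC figures quoted later; `physical` is a hypothetical helper name.

```python
# Assumed layout (for illustration): 10 global registers, 8 windows, and per
# window 6 parameter ("in"), 10 local and 6 temporary ("out") registers.
N_GLOBAL = 10
N_IN, N_LOCAL, N_OUT = 6, 10, 6
N_WINDOWS = 8
STRIDE = N_IN + N_LOCAL                      # registers unique to each window
N_PHYSICAL = N_GLOBAL + N_WINDOWS * STRIDE   # size of the physical file


def physical(window: int, reg: int) -> int:
    """Map logical register `reg` (0..31) seen by `window` to a physical index."""
    if not 0 <= reg < N_GLOBAL + N_IN + N_LOCAL + N_OUT:
        raise ValueError(f"no logical register r{reg}")
    if reg < N_GLOBAL:              # globals are shared by every window
        return reg
    offset = reg - N_GLOBAL         # 0..21 within the window
    # Windows advance by STRIDE, so the caller's temporaries (offsets 16..21)
    # land on the callee's parameter registers (offsets 0..5); the modulo
    # wraps the last window around to the first, forming a circular buffer.
    return N_GLOBAL + (window * STRIDE + offset) % (N_WINDOWS * STRIDE)


# The caller in window 2 writes an argument into its first temporary (r26);
# the callee in window 3 reads the same physical register as r10.
caller_out = physical(2, N_GLOBAL + N_IN + N_LOCAL)
callee_in = physical(3, N_GLOBAL)
print(N_PHYSICAL, caller_out, callee_in, caller_out == callee_in)
```

Because each window advances by 16 registers while it sees 22 windowed ones, the temporaries of one window and the parameters of the next are the same physical registers, so nothing is copied on a call.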

23
Overlapping Register Windows
24
Circular Buffer Diagram
Actual Organization
25
Operation of Circular Buffer (1)

When a call is made, the current window pointer
(CWP) is moved to point to the newly active
register window
If all windows are in use, an interrupt is
generated and the oldest window (the one furthest
back in the call nesting) is saved to memory
(only its parameter and local registers need to be saved)
A saved window pointer (SWP) identifies the window most
recently saved to memory, i.e., where the next restore comes from
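The overflow and underflow behaviour described above can be modelled at the level of procedure activations. This Python sketch assumes that an N-window file holds at most N - 1 activations (because each window's temporaries overlap the next window's parameters) and that each overflow saves, and each underflow restores, exactly one window; `WindowFile` and `run` are hypothetical names and the call traces are made up.

```python
class WindowFile:
    """Activation-level model of a circular buffer of register windows.

    Assumes an N-window file holds at most N - 1 procedure activations and
    that each overflow saves, and each underflow restores, one window.
    """

    def __init__(self, n_windows: int = 8) -> None:
        self.capacity = n_windows - 1
        self.depth = 1      # call nesting depth; the main program is active
        self.resident = 1   # activations whose windows are in registers
        self.saves = 0
        self.restores = 0

    def call(self) -> None:
        if self.resident == self.capacity:
            # Overflow: the trap handler saves the oldest resident window
            # (its parameter and local registers) to memory.
            self.saves += 1
            self.resident -= 1
        self.resident += 1
        self.depth += 1

    def ret(self) -> None:
        if self.depth == 1:
            raise RuntimeError("return from the outermost procedure")
        self.depth -= 1
        self.resident -= 1
        if self.resident == 0:
            # Underflow: the caller's window was saved earlier, so it is
            # restored from memory before execution continues.
            self.restores += 1
            self.resident = 1


def run(trace: str, n_windows: int = 8) -> tuple[int, int]:
    """Replay a trace of 'c' (call) and 'r' (return); return (saves, restores)."""
    wf = WindowFile(n_windows)
    for event in trace:
        if event == "c":
            wf.call()
        elif event == "r":
            wf.ret()
        else:
            raise ValueError(f"unknown event {event!r}")
    return wf.saves, wf.restores


print(run("c" * 9 + "r" * 9))              # nest 9 deep, then unwind
print(run("c" * 6 + "rc" * 10 + "r" * 6))  # oscillate near a fixed depth
```

With 8 windows the first trace needs 3 saves and 3 restores, while the second, which stays within the resident depth, needs none; this is why a modest number of windows suffices when nesting depth stays low.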

26
Operation of Circular Buffer (2)

Studies show that 8 windows are enough for about 99% of
calls/returns to proceed without a save/restore
E.g., Berkeley RISC uses 8 windows of 16
registers each

27
Global Variables - 2 Options

Allocated by the compiler to memory
Straightforward
Inefficient for frequently accessed variables
Have a set of registers for global variables
e.g., registers 0-7 global,
8-31 local to the current window
Increased hardware burden
Compiler must decide which global variables
should be assigned to registers

28
SPARC Register Windows
29
SPARC Register Windows
30
Registers vs. Cache

Large Register File                  Cache
- All local scalars                  - Recently used local scalars
- Individual variables               - Blocks of memory
- Compiler-assigned global variables - Recently used global variables
- Save/restore based on procedure    - Save/restore based on caching
  nesting                              algorithm
- Register addressing                - Memory addressing

31
Referencing a Scalar - Window-Based Register File
[Figure: register selected using the window number and the virtual register number]
32
Referencing a Scalar - Cache
33
Compiler Based Register Optimization

Assume a small number of registers (16-32)
→ Optimizing their use is up to the compiler
HLL programs have no explicit references to
registers
Assign a symbolic (virtual) register to each
candidate variable
Map the (unlimited) symbolic registers to real
registers
Symbolic registers whose uses do not overlap in time
can share a real register
If the real registers run out, some variables
are kept in memory
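The rule that symbolic registers which do not overlap in time can share a real register amounts to an interference test on live ranges. In this Python sketch the live ranges are invented for illustration and are represented as inclusive (first, last) instruction indices; `interference_edges` is a hypothetical helper name.

```python
from itertools import combinations

# Hypothetical live ranges of symbolic registers: (first, last) instruction
# index where each value is live, both inclusive.
LIVE = {"A": (0, 4), "B": (1, 3), "C": (2, 7), "D": (5, 8), "E": (6, 9), "F": (8, 9)}


def interference_edges(live: dict[str, tuple[int, int]]) -> list[tuple[str, str]]:
    """Return sorted pairs of symbolic registers whose live ranges overlap."""
    edges = []
    for (u, (s1, e1)), (v, (s2, e2)) in combinations(sorted(live.items()), 2):
        if s1 <= e2 and s2 <= e1:   # the two closed intervals intersect
            edges.append((u, v))
    return edges


edges = interference_edges(LIVE)
print(edges)
# Pairs that never interfere may be mapped to the same real register.
sharable = [pair for pair in combinations(sorted(LIVE), 2) if pair not in edges]
print(sharable)
```

For example, A (live 0-4) and D (live 5-8) never overlap, so they form no edge and could share one real register.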

34
Graph Coloring (1)

Given a graph of nodes and edges
Assign a color to each node
Adjacent nodes have different colors
Use the minimum number of colors
Nodes are symbolic registers
Two registers that are live in the same program
fragment are joined by an edge
Try to color the graph with n colors, where n is
the number of real registers

35
Graph Coloring (2)

Nodes that cannot be colored are placed in
memory
Formally, the register interference graph is $G = (V, E)$,
where
$V$ = the set of symbolic registers
$E = \{ v_i v_j \mid v_i, v_j \in V \text{ and } v_i, v_j \text{ active at the same time} \}$
Studies show
64 registers are enough with simple register
optimization
32 registers are enough with sophisticated
register optimization
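One common way to attempt the n-coloring is the simplify/select heuristic associated with Chaitin: repeatedly remove a node with fewer than n neighbours (it can always be colored later), remove a node to keep in memory when none exists, then color the removed nodes in reverse order. This Python sketch shows that heuristic, not necessarily the method behind the studies cited; the graph over symbolic registers A to F is made up, `color_graph` is a hypothetical name, and picking the node with the most neighbours as the spill candidate is an assumed simplification (real allocators weigh spill cost).

```python
# Hypothetical interference graph over symbolic registers A..F: an edge
# joins two registers that are active at the same time.
NODES = ["A", "B", "C", "D", "E", "F"]
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("C", "E"), ("D", "E"), ("D", "F"), ("E", "F")]


def color_graph(nodes, edges, k):
    """Simplify/select coloring with k colors (= k real registers).

    Returns (colors, spilled): colors maps each colored node to a register
    number 0..k-1; spilled lists the nodes that stay in memory.
    """
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    remaining = set(nodes)
    stack, spilled = [], []
    while remaining:
        # Simplify: a node with fewer than k neighbours still in the graph
        # is guaranteed a free color once its neighbours are colored.
        low = [n for n in sorted(remaining) if len(adj[n] & remaining) < k]
        if low:
            node = low[0]
            stack.append(node)
        else:
            # Assumed spill heuristic: the node with the most neighbours.
            node = max(sorted(remaining), key=lambda n: len(adj[n] & remaining))
            spilled.append(node)
        remaining.remove(node)

    # Select: color in reverse removal order, taking the lowest free color.
    colors = {}
    while stack:
        node = stack.pop()
        taken = {colors[m] for m in adj[node] if m in colors}
        colors[node] = min(c for c in range(k) if c not in taken)
    return colors, spilled


for k in (3, 2):
    colors, spilled = color_graph(NODES, EDGES, k)
    print(k, dict(sorted(colors.items())), "spilled:", spilled)
```

With three registers every node gets a color; with two, the heuristic leaves C and D in memory.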

36
Graph Coloring Approach
[Figure omitted; axis label: Time]
37
Reduced Instruction Set Architecture (1)

Why CISC?
Compiler simplification?
Disputed
Complex machine instructions are harder to exploit
Optimization is more difficult
Smaller programs?
Programs take up less memory, but
memory is now cheap
Programs may not occupy fewer bits; they just look shorter in
symbolic form
More instructions require longer opcodes
Register references require fewer bits

38
Reduced Instruction Set Architecture (2)

Why CISC (cont'd)
Faster programs?
More complex control unit
→ Larger microprogram control store
→ Simple instructions take longer to execute
But there is a bias towards the use of simpler instructions
It is far from clear that CISC is the appropriate
solution

39
Reduced Instruction Set Architecture (3)

RISC Characteristics
One instruction per cycle
Register to register operations
Few, simple addressing modes
Few, simple instruction formats
Hardwired design (no microcode)
Fixed instruction format, fixed length, aligned
on word boundary → instruction fetch optimized
More compile time/effort
List on Page 480
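Because every instruction is one aligned 32-bit word with fields at fixed bit positions, fetch and decode reduce to a word load plus shifts and masks. This Python sketch uses the MIPS R-type field layout (op 6, rs 5, rt 5, rd 5, shamt 5, funct 6 bits) as a concrete example of such a format; the helper names `encode`, `decode` and `fetch`, and the big-endian byte order, are assumptions for illustration.

```python
# MIPS R-type layout, most significant field first (32 bits in total).
FIELDS = (("op", 6), ("rs", 5), ("rt", 5), ("rd", 5), ("shamt", 5), ("funct", 6))


def encode(**values: int) -> int:
    """Pack named fields into one 32-bit instruction word."""
    unknown = set(values) - {name for name, _ in FIELDS}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode(word: int) -> dict[str, int]:
    """Every field sits at a fixed position, so decoding is shift-and-mask."""
    fields, shift = {}, 32
    for name, width in FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


def fetch(memory: bytes, pc: int) -> int:
    """Load one big-endian instruction word; the PC must be word aligned."""
    if pc % 4:
        raise ValueError(f"unaligned PC {pc:#x}")
    return int.from_bytes(memory[pc:pc + 4], "big")


# add $t0, $t1, $t2  ->  rd = 8, rs = 9, rt = 10, funct = 0x20
word = encode(rs=9, rt=10, rd=8, funct=0x20)
program = word.to_bytes(4, "big") * 2
print(hex(fetch(program, 4)), decode(word))
```

A variable-length CISC format would instead require decoding part of one instruction before the start of the next is even known.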

40
Reduced Instruction Set Architecture (4)