PROCESSOR ARCHITECTURE

About This Presentation

Title:

PROCESSOR ARCHITECTURE

Description:

PROCESSOR ARCHITECTURE Jehan-François Pâris jparis_at_uh.edu – PowerPoint PPT presentation

Slides: 141

Provided by: Jehan73

Learn more at: https://www2.cs.uh.edu

Transcript and Presenter's Notes

Title: PROCESSOR ARCHITECTURE

1
PROCESSOR ARCHITECTURE

Jehan-François Pâris
jparis_at_uh.edu

2
Chapter Organization

Logic design conventions
Implementation of a "toy" CPU
Pipelining
Pipelining hazards
Data hazards
Control hazards
Exceptions
Parallelism

IMPORTANT
3
LOGIC DESIGN CONVENTIONS
4
Combinational/state elements

Combinational elements
Outputs only depend on current inputs
Stateless
Adders and, more generally, arithmetic logic unit
(ALU)

5
Combinational/state elements

State elements
Have a memory holding a state
Output depends on current inputs and state of
element
State reflects past inputs
Flip-flops,

6
Judicial analogy

In our legal system
Guilty/not guilty decision is stateless
Good reasons
Sentencing decision is not
"Three strikes and you are out" laws
Good reasons

7
Clocking methodology

We will assume an edge-triggered clocking
technology
Edge is short enough to prevent data propagation
in state elements
Can read current state of a memory element at the
same time we update it

8
Clocking convention

Omit write control signal if state element is
updated at every active clock edge

9
A "TOY" CPU
10
Motivation

"Toy" CPU will implement a subset of MIPS
instruction set
Subset will be
Self-sufficient
Simpler to implement
Complex enough to allow a serious discussion of
CPU architecture

11
The subset

Will include
Load and store instructions: lw (load word) and
sw (store word)
Arithmetic-logic instructions: add, sub, and, or
and slt (set less than)
Branch instructions: beq (branch if equal) and j
(jump)

12
Load and store instructions

Format I
Three operands
Two registers r1 and r2
One displacement d
lw r1, d(r2) loads into register r1 main
memory word at address contents(r2) + d
sw r1, d(r2) stores contents of register r1
into main memory word at address contents(r2) + d

13
Arithmetic-logic instructions

Format R
Three operands
Three registers r1, r2 and r3
Store into register r1 result of r2 <op> r3,
where <op> can be add, subtract, and, or, as
well as set if less than

14
Branch instruction

Format I
Three operands
Two registers r1 and r2
One displacement d
beq r1, r2, d: set value of PC to PC + 4 + 4×d iff r1 = r2

15
The simplest data path

Assume CPU will do nothing but
Increment its program counter and
Deliver the next instruction

16
The simplest data path
[Figure: PC, instruction memory (Read address in, Instruction out) and an adder labeled Add with constant input 4]
17
Implementing R2R instructions

The ALU takes two 32-bit inputs
Returns
A 32-bit output
A 1-bit signal if the result is zero

18
The register file

Two read outputs that are always available
One write input activated by a RegWrite signal
Three register selectors

19
The register file
[Figure: register file with Read select 1, Read select 2 and Write select inputs (5 bits each), Read data 1 and Read data 2 outputs, and a Write data input]
RegWrite enables register writes
20
Implementing R2R instructions
[Figure: register file and ALU; the ALU Result is fed back to the register file]
RegWrite is enabled
21
Implementing load and store

Require
An address calculation
contents(r2) + d
An access to data memory
Before doing the address calculation, we must
transform 16-bit displacement d into a 32-bit
value using sign extension

22
The data memory

One address selector
One write data input
One read data output
Two controls
MemWrite
MemRead

23
Sign extension (I)

If 16-bit number has a zero as MSB
It is positive
Must add 16 zero bits

0110 1010 1010 0100
24
Sign extension (II)

If 16-bit number has a one as MSB
It is negative
Must add 16 one bits

1110 1010 1010 0100
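The two cases above can be sketched in a few lines of Python. This is only an illustration: the function name sign_extend_16 is made up for this example, and the two inputs are the 16-bit patterns shown on the slides.

```python
def sign_extend_16(value: int) -> int:
    """Extend a 16-bit two's-complement field to a 32-bit pattern."""
    value &= 0xFFFF                  # keep only the low 16 bits
    if value & 0x8000:               # MSB is one: the number is negative
        return value | 0xFFFF0000    # prepend 16 one bits
    return value                     # MSB is zero: the 16 upper bits stay zero


positive = sign_extend_16(0b0110_1010_1010_0100)
negative = sign_extend_16(0b1110_1010_1010_0100)
print(f"{positive:032b}")
print(f"{negative:032b}")
```

The printed 32-bit strings start with sixteen zeros and sixteen ones respectively, followed by the original 16 bits.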
25
The data memory
MemWrite enables memory writes
Memory address Read data Write data
MemRead enables memory reads
26
Implementing the store instruction
[Figure: store datapath with register file, ALU, data memory (Address, Read, Write) and sign extension (SE) of the d field]
27
Implementing the load instruction
[Figure: load datapath with register file, ALU, data memory (Address, Read, Write) and sign extension (SE) of the d field]
28
Implementing conditional branch

Target Address
Sign-extend 16-bit immediate part of instruction
Shift left 2
Add to PC
Branch Control Logic
Perform test operation on two registers
Check result
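The target address computation (sign-extend, shift left 2, add to PC+4) might be sketched as follows. The helper names are hypothetical, and the PC value 0x1000 is an arbitrary example.

```python
def sign_extend_16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc: int, d: int) -> int:
    """Return PC + 4 + 4*d, wrapped to 32 bits."""
    offset = sign_extend_16(d) << 2      # shifting left by 2 multiplies by 4
    return (pc + 4 + offset) & 0xFFFFFFFF


forward = branch_target(0x1000, 3)        # skips three instructions
backward = branch_target(0x1000, 0xFFFF)  # d = -1 branches back to itself
print(hex(forward), hex(backward))
```

A displacement of -1 brings the PC back to the branch itself because the displacement is counted from PC+4, not from PC.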

29
Implementing conditional branch
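[Figure: branch datapath; the d field of the instruction is sign-extended, shifted left 2 and added to PC+4 to give the branch destination, while the register file feeds the ALU, whose output goes to the branch control logic]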
30
Note

Arithmetic-logic operations only use
Register file and ALU
Load and store use
ALU for computing memory address
Data memory

31
Implementing other instructions
32
Combining everything
33
Left to be done

All control signals
Two multiplexers: ALUSrc and MemtoReg
RegWrite, MemRead and MemWrite switches
ALU controls (4 bits)

34
ALU control signals
ALU control lines Function
0000 and
0001 or
0010 add
0110 subtract
0111 set on less than
1100 nor (not in "toy" subset)
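As a rough software model of this table, the function below maps each control-line value to its operation and also returns the 1-bit zero signal. It is a sketch only: the names alu and to_signed are invented here, and operands are treated as 32-bit two's-complement values.

```python
MASK = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Read a 32-bit pattern as a signed two's-complement number."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control: int, a: int, b: int):
    """Return (result, zero) for the given 4-bit ALU control value."""
    a &= MASK
    b &= MASK
    if control == 0b0000:
        result = a & b
    elif control == 0b0001:
        result = a | b
    elif control == 0b0010:
        result = (a + b) & MASK
    elif control == 0b0110:
        result = (a - b) & MASK
    elif control == 0b0111:
        result = 1 if to_signed(a) < to_signed(b) else 0
    elif control == 0b1100:
        result = ~(a | b) & MASK
    else:
        raise ValueError(f"unsupported ALU control value {control:04b}")
    return result, int(result == 0)


print(alu(0b0010, 2, 3))
print(alu(0b0110, 7, 7))
```

Subtracting two equal values asserts the zero signal, which is what beq relies on.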
35
Controlling the ALU

Recall that all R-format instructions have same
opcode
Operation performed by ALU is specified in the
function field (bits <0:5>)

36
Controlling the ALU

ALU control inputs generated by two-step process
Construct two ALUOp control bits from opcode
Construct four ALU control bits using
Two ALUOp bits
Six bits from function field when they are needed

37
Dependence table
Opcode ALUOp Operation Function Action ALU Ctl
lw 00 lw - add 0010
sw 00 sw - add 0010
beq 01 beq - subtract 0110
R-type 10 add 100000 add 0010
R-type 10 subtract 100010 subtract 0110
R-type 10 and 100100 and 0000
R-type 10 or 100101 or 0001
R-type 10 slt 101010 slt 0111
38
Notes

Two-step process simplifies combinational logic
Many don't care conditions in truth table

39
Truth table
ALUOp1 ALUOp0 F5 F4 F3 F2 F1 F0 ALU Control bits
0 0 X X X X X X 0010
0 1 X X X X X X 0110
1 X X X 0 0 0 0 0010
1 X X X 0 0 1 0 0110
1 X X X 0 1 0 0 0000
1 X X X 0 1 0 1 0001
1 X X X 1 0 1 0 0111
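The two-step decoding can be mimicked in software as shown below. This is a sketch under the assumption that only the low four function bits matter, as in the truth table; the names alu_control and FUNCT_LOW_BITS_TO_CTL are made up for the example.

```python
# Low four bits of the function field -> 4-bit ALU control value.
FUNCT_LOW_BITS_TO_CTL = {
    0b0000: 0b0010,  # add
    0b0010: 0b0110,  # subtract
    0b0100: 0b0000,  # and
    0b0101: 0b0001,  # or
    0b1010: 0b0111,  # set on less than
}


def alu_control(alu_op: int, funct: int = 0) -> int:
    """Derive the ALU control bits from the ALUOp bits and the function field."""
    if alu_op == 0b00:            # lw and sw: add to compute the address
        return 0b0010
    if alu_op == 0b01:            # beq: subtract to compare
        return 0b0110
    if alu_op == 0b10:            # R-type: look at the function field
        return FUNCT_LOW_BITS_TO_CTL[funct & 0x0F]
    raise ValueError("ALUOp value 11 is never generated")


print(f"{alu_control(0b10, 0b101010):04b}")
```

Because the function field is ignored for lw, sw and beq, most truth-table entries are don't-care values, which keeps the real combinational logic small.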
40
Note

Bits 4 and 5 of function field are not used
ALUOp bits only have three possible values: 00,
01 and 10
Introduces don't care conditions
All R instructions use same data paths
Other control bits depend only on opcode

41
Control signal effects
Signal | When deasserted | When asserted
RegDst | Destination register comes from rt field (bits 20:16) | Destination register comes from rd field (bits 15:11)
RegWrite | None | Enables write into destination register
ALUSrc | Second ALU operand comes from second register output | Second ALU operand comes from sign-extended displacement (bits 15:0)
42
Control signal effects
Signal | When deasserted | When asserted
PCSrc | PC is incremented by 4 | PC set to branch target value
MemRead | None | Enables memory read output
MemWrite | None | Enables memory write
MemtoReg | Value fed to destination register comes from ALU | Value fed to destination register comes from memory
43
Note

PCSrc is asserted when
Instruction is a branch
and
ALU Zero result bit is asserted
We will introduce a Branch control line

44
Control line settings
Instruction RegDst ALUSrc MemtoReg RegWrite
R-format 1 0 0 1
lw 0 1 1 1
sw X 1 X 0
beq X 0 X 0
45
Control line settings
Instruction MemRead MemWrite Branch ALUOp1 ALUOp0
R-format 0 0 0 1 0
lw 1 0 0 0 0
sw 0 1 0 0 0
beq 0 0 1 0 1
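The two control-line tables can be gathered into one lookup structure, sketched below. The dictionary layout and the function name pc_src are illustrative choices, None stands for a don't-care (X) entry, and the destination-register select signal is called RegDst here.

```python
CONTROL = {
    "R-format": dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                     MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    "lw": dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
               MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    "sw": dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
               MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    "beq": dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}


def pc_src(instruction: str, alu_zero: int) -> bool:
    """PCSrc is asserted when the instruction is a branch and the ALU result is zero."""
    return bool(CONTROL[instruction]["Branch"] and alu_zero)


print(pc_src("beq", 1), pc_src("beq", 0), pc_src("lw", 1))
```

Only beq sets Branch, so a zero ALU result from any other instruction never changes the PC.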
46
Active datapaths for an R instruction
47
Active datapaths for a load instruction
48
Active datapaths for a beq instruction
49
The "weird" jump instruction

Uses J format
Single 26 bit operand
Implements an unconditional jump
New value of PC is obtained as follows
Bits 1:0 are zero (address is multiple of 4)
Bits 27:2 come from jump operand
Bits 31:28 come from PC+4
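The construction of the new PC can be written out as a small function. This sketch follows the standard MIPS layout (two zero bits, the 26-bit operand in bits 27:2, and the four upper bits taken from PC+4); the function name and the sample addresses are invented for illustration.

```python
def jump_target(pc: int, operand: int) -> int:
    """Build the 32-bit jump address from PC+4 and the 26-bit jump operand."""
    upper = (pc + 4) & 0xF0000000             # bits 31:28 come from PC+4
    middle = (operand & 0x03FFFFFF) << 2      # bits 27:2; bits 1:0 end up zero
    return upper | middle


near = jump_target(0x10000000, 0x0000100)
far = jump_target(0x00400000, 0x3FFFFFF)
print(hex(near), hex(far))
```

Since only 26 bits are supplied by the instruction, a jump cannot leave the 256 MB region selected by the upper four bits of PC+4.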

50
Implementing the jump instruction
51
Limitations of single-cycle design

If we want all instructions to be executed in one
cycle
Clock cycle must be long enough to accommodate
instruction taking the most time
Floating-point multiply or divide
Does not work for CPUs that have a rich
instruction set

52
PIPELINING
53
An analogy (I)

Washing your clothes
Four steps
Putting in the washer
Putting in the dryer
Folding/ironing
Putting them away

54
An analogy (II)

Most people
Start second wash load as soon as first wash
load is in dryer
Put second wash load in dryer and start a third
wash load while they are folding/ironing the first
wash load

55
Purely sequential approach
Time (pm)  6:00   6:30   7:00   7:30   8:00   8:30   9:00   9:30
Load 1     Wash   Dry    Fold   Store
Load 2                                 Wash   Dry    Fold   Store
56
Smart approach
Time (pm)  6:00   6:30   7:00   7:30   8:00   8:30   9:00   9:30
Load 1     Wash   Dry    Fold   Store
Load 2            Wash   Dry    Fold   Store
Load 3                   Wash   Dry    Fold   Store
Load 4                          Wash   Dry    Fold   Store
Solution assumes that a housemate puts
folded/ironed clothes away for us
57
Main advantage

Can do much more in much less time

58
Limitation

Slowed down by time taken by longest step
Could be washing/drying/ironing

59
Instruction steps (I)

Good candidates for pipelining steps
Fetch instruction from memory
Decode instruction
Read registers
Execute register to register operation or
calculate address
Access operand in memory
Write results into a register

60
Instruction steps (II)

Since MIPS instruction set has fixed fields, we
can combine steps 2 and 3
Fetch instruction from memory
Read registers while decoding instruction
Execute register to register operation or
calculate address
Access operand in memory
Write results into a register

61
Sample step timings
Instruction class | Instruction fetch | Register read | ALU operation | Data access | Register write | Total time
Load word (lw) | 200 ps | 100 ps | 200 ps | 200 ps | 100 ps | 800 ps
Store word (sw) | 200 ps | 100 ps | 200 ps | 200 ps | --- | 700 ps
R format instruction | 200 ps | 100 ps | 200 ps | --- | 100 ps | 600 ps
Branch (beq) | 200 ps | 100 ps | 200 ps | --- | --- | 500 ps
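A quick calculation with these timings shows what pipelining buys. The sketch assumes an ideal pipeline with no hazards, a single-cycle clock set by the slowest instruction (lw), and a pipelined clock set by the slowest step; the function names are made up for the example.

```python
STEP_PS = {"IF": 200, "RR": 100, "ALU": 200, "MEM": 200, "RW": 100}

SINGLE_CYCLE_CLOCK = sum(STEP_PS.values())   # lw uses every step
PIPELINED_CLOCK = max(STEP_PS.values())      # the slowest step sets the pace


def single_cycle_time(n: int) -> int:
    """Time in ps to run n instructions one after the other."""
    return n * SINGLE_CYCLE_CLOCK


def pipelined_time(n: int) -> int:
    """Time in ps to run n instructions in an ideal five-step pipeline."""
    return (len(STEP_PS) + n - 1) * PIPELINED_CLOCK


for n in (3, 1000):
    print(n, single_cycle_time(n), pipelined_time(n))
```

For long instruction sequences the ratio approaches 800/200 = 4 rather than 5, because the 100 ps steps must be stretched to the 200 ps clock.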
62
Step 1 Fetch and decode
63
Step 2 Read registers
64
Step 3 Use the ALU
65
Step 4 Access operand in memory
66
Step 5 Store result in register
67
Observations

Most R format instructions operate on three
registers and skip step 4
Same for most I format instructions with an
immediate operand
Store operations skip step 5
Load register instructions go through all five
steps

68
Pipelining limitations

Some instructions that skip a step will still
have to wait until preceding instruction is done.
Hazards
An instruction cannot proceed because
Hardware cannot support the combination of
instructions (structural hazards)
Data are not ready (data hazards)
Control/branch hazards

69
Structural hazards

Combinations of instructions that prevent
pipelining

70
A bad MIPS instruction (I)

Recall that the IBM instruction set had instructions
that add to a register the contents of a
memory location
RX format

71
A bad MIPS instruction (II)

We could think of a MIPS instruction with three
registers operands
ADDX r1, r2, r3
adding to r1 the contents of the word at address
contents of r2 + contents of r3
We would have r1 = r1 + Mem[r2 + r3]

72
A bad MIPS instruction (III)

It would be great for accessing arrays
r2 will have starting address of array (fixed value)
r3 would contain the array index multiplied by 4
(incremented after each step)
73
A bad MIPS instruction (IV)

Adding this instruction would be a very bad idea
Why?

74
Answer

Instruction would require two steps using the ALU
Adding r2 and r3 to compute the address of the
memory operand (step 4)
Adding the memory operand to r1
New step would introduce a structural hazard by
preventing any other instruction from accessing the
ALU

75
My comment

Careful design of the MIPS CPU and instruction
set should be noted
Not true for older instruction sets
IBM 360, DEC VAX,
Not true for X86 instruction sets
CPU is designed to be compatible with an existing
instruction set

76
Designing instruction sets for pipelining (I)

All instructions should have the same length
Can fetch future instructions before the current
one is decoded
Have few instruction formats with register fields
always in the same position
Can combine instruction decode and register read
steps

77
Designing instruction sets for pipelining (II)

Memory operands should only appear in load and
store instructions
No instruction can use the ALU twice!
Operands must be properly aligned in memory
Can always access them in a single memory cycle

78
Data hazards (I)

Assume we have
add $s0, $t0, $t1
sub $t2, $s0, $t3
or
$s0 = $t0 + $t1
$t2 = $s0 - $t3
Need result of add before proceeding with sub
instruction

79
Detail of steps
Cycle  1      2      3      4      5      6
add    IF     ID/RR  ALU    RW
sub           IF     stall  stall  ID/RR  ALU

Second instruction must wait until first
instruction has updated s0 in cycle 4 before reading
its value in cycle 5

80
Data hazards (II)

New value of s0 computed by the add instruction
is not stored in s0 until its step 5 has
completed
New instruction must wait until add instruction
has performed its step 5 before performing its
step 2

81
Data hazards (III)
[Figure: pipeline diagram for the add and sub instructions]
82
Data hazards (IV)

We lose two cycles during which nothing can be
done
Cannot trust compiler to remove all data hazards
Observe that new value of s0 becomes available
at the end of step 3 of add instruction
Add special circuitry to provide this value at
the end of step 2 of sub instruction
Forwarding or bypassing

83
After forwarding
84
Detail of steps
Cycle  1      2      3      4      5      6
add    IF     ID/RR  ALU    RW
sub           IF     ID/RR  ALU    RW

Second instruction now gets updated value at the
end of cycle 3 just in time to use it in cycle 4
No stall cycles

85
Limitations (I)

Forwarding worked very well because output of
step 3 of add was forwarded to be input of step 3
of sub
Would not work as well if the output of a later
step of an instruction is needed as input of an
earlier step of the next instruction
Will still have one or more pipeline stalls
(bubbles)

86
Limitations (II)

Assume we have
lw $s0, 20($t1)
sub $t2, $s0, $t3
or
$s0 = Mem[$t1 + 20]
$t2 = $s0 - $t3
Need new value of s0 before proceeding with sub
instruction

87
Limitations (III)
88
Detail of steps
Cycle  1      2      3      4      5      6
lw     IF     ID/RR  ALU    MEM    RW
sub           IF     ID/RR  stall  ALU    RW

Even with forwarding second instruction must wait
until completion of memory access of first
instruction in cycle 4 before performing its ALU
step in cycle 5
One stall cycle
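The three step tables (add then sub without forwarding, add then sub with forwarding, lw then sub with forwarding) can be reproduced with a tiny timing model. It is a simplification built only for this example: the producer is fetched in cycle 1, the dependent instruction in cycle 2, and all names and constants are assumptions read off the tables.

```python
# Cycle at whose end the produced value exists (ALU step for add, MEM step for lw).
RESULT_READY = {"add": 3, "lw": 4}
# Cycle in which the producer performs its register-write (RW) step.
REGISTER_WRITTEN = {"add": 4, "lw": 5}

CONSUMER_READ_CYCLE = 3   # unstalled ID/RR step of the dependent instruction
CONSUMER_ALU_CYCLE = 4    # unstalled ALU step of the dependent instruction


def stall_cycles(producer: str, forwarding: bool) -> int:
    """Stall cycles suffered by an instruction that directly follows its producer."""
    if forwarding:
        # The value only has to reach the consumer's ALU step.
        earliest = RESULT_READY[producer] + 1
        return max(0, earliest - CONSUMER_ALU_CYCLE)
    # Without forwarding the consumer must read the register after RW.
    earliest = REGISTER_WRITTEN[producer] + 1
    return max(0, earliest - CONSUMER_READ_CYCLE)


print(stall_cycles("add", forwarding=False))
print(stall_cycles("add", forwarding=True))
print(stall_cycles("lw", forwarding=True))
```

The load still costs one bubble because its value does not exist until the end of its memory access.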

89
A last word

In many architectures, the floating point unit is
a significant source of structural hazards
Less well adapted to pipelining
The MIPS architecture assumes that we have
separate memories for instructions and data
Having a single memory for both would result in
many more hazards

90
Control / jump hazards

Happen whenever we have a conditional jump
Consider the instructions
add $4, $5, $6
beq $1, $2, 40
or $7, $8, $9
Need result of conditional branch (beq) before
deciding whether to execute next instruction (or)

91
Control hazards (II)
92
Pipelined datapath
93
Datapaths for pipelined organization

Define five steps
Fetch instruction from memory (IF)
Instruction decode and register reads (ID)
Execute AL operation on ALU (EX)
Access operand in memory (MEM)
Write back results into a register (WB)

94
Datapaths for pipelined organization

Insert registers to save outputs of each step
before they get updated by the next step
IF/ID registers
ID/EX registers
EX/MEM registers
MEM/WB registers

95
A first try
[Figure: first-try pipelined datapath; the four added pipeline registers are each marked "New"]
96
Comments

This first try is not correct
Load instruction will not be implemented
correctly
Address of destination register will be lost as
soon as a new instruction is fetched
Must save it at each step

97
The almost correct datapaths
Register address follows instruction
98
The almost correct datapaths
99
More problems

Address of destination register is not always at
the same place in all instructions
Could be instruction bits (20-16)
For all I-format instructions that write into a
register
Could be instruction bits (15-11)
In R format instructions

100
Why?

In R format instructions
[Figure: R format field layout]
In I format instructions
[Figure: I format field layout, ending with the constant/address field]
101
The solution

Add a multiplexer at stage EX
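What the multiplexer selects can be sketched as follows. The function name and the way the two sample instruction words are assembled are illustrative only; the bit positions are the ones given on the "More problems" slide, and the select input plays the role of the destination-register control signal.

```python
def destination_register(instruction: int, reg_dst: int) -> int:
    """Pick the destination register number, as the EX-stage multiplexer would."""
    rt = (instruction >> 16) & 0x1F   # bits 20-16, used by I-format instructions
    rd = (instruction >> 11) & 0x1F   # bits 15-11, used by R-format instructions
    return rd if reg_dst else rt


# add $8, $9, $10: opcode 0, rs = 9, rt = 10, rd = 8, function field 100000
add_word = (0 << 26) | (9 << 21) | (10 << 16) | (8 << 11) | 0b100000
# lw $8, 20($9): opcode 100011, rs = 9, rt = 8, displacement 20
lw_word = (0b100011 << 26) | (9 << 21) | (8 << 16) | 20

print(destination_register(add_word, reg_dst=1))
print(destination_register(lw_word, reg_dst=0))
```

Both instructions write register 8, but the register number sits in a different field of each instruction word.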

102
More about data hazards

Consider
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
Last four instructions depend on result of sub

103
More about data hazards

$2 is updated at the end of last cycle of sub
First instruction that would get the correct
value of $2 would be the add

104
More about data hazards
sub    IF     IDReg  EX     MEM    WB
and           IF     IDReg  EX     MEM    WB
or                   IF     IDReg  EX     MEM
add                         IF     IDReg  EX
sw                                 IF     IDReg
105
Adding a forwarding unit
106
More data hazards

We can forward the results of sub instruction at
the end of its EX step
In time for all four following instructions
To do that we need special forwarding unit
Not all data hazards can be avoided
lw followed by any instruction accessing the
loaded word

107
Why?

lw loads word from RAM into a register
Goes through IF, IDReg, EX, MEM and WB steps
Register value is updated at the end of WB step
Must delay any following instruction that wants
to access the contents of the register

108
Data hazard detection unit

Detects hazards that cannot be avoided
Inserts no operation instructions (nop)
They do nothing!
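A software analogue of such a unit is sketched below for the load-use case. It assumes forwarding is present, so a single nop after the lw is enough; the Instr record and the function name are inventions for this example, not part of any real toolchain.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]
    sources: Tuple[str, ...]


NOP = Instr("nop", None, ())


def insert_load_use_nops(program: List[Instr]) -> List[Instr]:
    """Insert one nop after each lw whose result is read by the next instruction."""
    result: List[Instr] = []
    for instr in program:
        previous = result[-1] if result else None
        if previous is not None and previous.op == "lw" and previous.dest in instr.sources:
            result.append(NOP)   # bubble: this instruction does nothing
        result.append(instr)
    return result


program = [
    Instr("lw", "$s0", ("$t1",)),
    Instr("sub", "$t2", ("$s0", "$t3")),
]
for instr in insert_load_use_nops(program):
    print(instr.op)
```

Real hardware stalls the pipeline instead of rewriting the program, but the effect on timing is the same as an inserted nop.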

109
More about control hazards

Outcome of conditional branch is not known until
end of step EX
beq and bne use arithmetic unit to evaluate the
branch condition
If branch is taken, we must abort the two
following instructions
Easy because they have not yet updated anything

110
More about control hazards
beq    IF     IDReg  EX     MEM    WB
next          IF     IDReg  ABORT
next                 IF     ABORT
dest                        IF     IDReg  EX
111
More about control hazards
beq    IF     IDReg  EX     MEM    WB
next          IF     ABORT
dest                 IF     IDReg  EX     MEM
112
Better implementation of beq/bne
113
MIPS Optimization

Move comparison ahead to reduce the number of
aborted instructions
Add a simple EQUAL/NOT EQUAL comparison hardware
that tests outputs of register file
Bitwise XOR then ORing the results
Will return zero if the register contents are
identical
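The comparator idea (bitwise XOR, then OR of all result bits) can be imitated in software. The function name is hypothetical and the loop stands in for a 32-input OR gate.

```python
def registers_equal(a: int, b: int) -> bool:
    """Compare two 32-bit register values the way the simple comparator does."""
    diff = (a ^ b) & 0xFFFFFFFF        # a bit is one wherever the inputs differ
    any_bit_set = 0
    for i in range(32):                # OR together all 32 bits of the XOR
        any_bit_set |= (diff >> i) & 1
    return any_bit_set == 0            # zero means the contents are identical


print(registers_equal(5, 5), registers_equal(5, 4))
```

No subtraction is needed, which is why this test is cheap enough to sit next to the register file outputs.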

114
Explanations

Moving the jump address calculation one step
ahead means that we will always do the
calculation even when it is not needed.
Simple comparator duplicates one ALU function

115
New problem

We now need the correct values of the input
registers in step ID
More data hazards
add $t0, $t2, $t3
beq $t0, $s0, 400
Data forwarding can reduce the number of nops but
not eliminate them

116
New data hazards
add    IF     IDReg  EX     MEM    WB
nop
nop
beq                         IF     IDReg  EX     MEM
117
EXCEPTIONS AND INTERRUPTS
118
Interrupts (I)

Request to interrupt the flow of execution of the
CPU
Detected by the CPU hardware
After it has executed the current instruction
Before it starts the next instruction.

119
Interrupts (II)

When an interrupt occurs
The current state of the CPU (program counter,
program status word, contents of registers, and
so forth) is saved, normally on the top of a
stack
A new CPU state is fetched

120
Interrupts (III)

New state includes a new hardware-defined value
for the program counter
Cannot hijack interrupts
Process is totally transparent to the task being
interrupted
A process never knows whether it has been
interrupted or not

121
Types of interrupts (I)

I/O completion interrupts
Notify the OS that an I/O operation has
completed,
Timer interrupts
Notify the OS that a task has exceeded its
quantum of CPU time,

122
Types of interrupts (II)

Traps
Notify the OS of a program error (division by
zero, illegal op code, illegal operand address,
...) or a hardware failure
System calls
Notify OS that the running task wants to submit a
request to the OS
Notification of another event

123
A surprising discovery

Programs do interrupt themselves!

124
Context switches

Each interrupt will result in two context
switches
One when the running task is interrupted
Another when it regains the CPU
Context switches are not cheap
The overhead of any simple system call is two
context switches

Remember that for 4330!
125
Prioritizing interrupts (I)

Interrupt requests may occur while the system is
processing another interrupt
Not all interrupts are equally urgent
Some are more urgent than others
Also true in real life

126
Prioritizing interrupts (II)

The best solution is to prioritize interrupts
and assign to each source of interrupts a
priority level
New interrupt requests will be allowed to
interrupt lower-priority interrupts but will have
to wait for the completion of all other
interrupts
Solution is known as vectorized interrupts.

127
Example from real life

Let us try to prioritize
Phone is ringing
Washer signals end of cycle
Dark smoke is coming out of the kitchen
With vectorized interrupts, a phone call will
never interrupt another phone call
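The rule can be expressed as a one-line comparison of priority levels. The numeric levels below are an assumed answer to the prioritization exercise, chosen only for illustration.

```python
# Assumed priority levels: a higher number means more urgent.
PRIORITY = {"washer": 1, "phone": 2, "smoke": 3}


def can_interrupt(new_request: str, current: str) -> bool:
    """A new request preempts only a strictly lower-priority interrupt."""
    return PRIORITY[new_request] > PRIORITY[current]


print(can_interrupt("smoke", "phone"))
print(can_interrupt("phone", "phone"))
print(can_interrupt("washer", "phone"))
```

The strict comparison is what keeps one phone call from interrupting another: requests at the same level wait their turn.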

128
The solution
129
MIPS Implementation (I)

Interrupts are a special case of a branch
Use same techniques for handling control hazards
Almost all MIPS interrupts jump to the same
hardware address (0x80000180)
MIPS uses a special register to pass along the
type of interrupt to the interrupt handler
The Cause register

130
MIPS Implementation (II)

MIPS also saves the address + 4 of the affected
instruction in a special register
EPC register
A STATUS register allows selective disabling of
interrupts
Useful for handling short critical sections in
single-threaded kernel

131
Issues (I)

Interrupted instruction may have to be restarted
Typical for I/O completion interrupts
Must then maintain precise exceptions that
accurately identify the instruction being
interrupted
Not true for hardware interrupts

132
Issues (II)

Must be able to restart instruction at the exact
point it was interrupted
Not always easy on many architectures
MIPS solution is to roll back everything and
restart instruction as if nothing had happened
Easier on MIPS since register/memory update is
always the last step of any instruction
Must still ensure that we can restore the
original values of all registers

133
Branch prediction

CPU will try to predict whether a branch will be
taken or not
Important for loops
Branch is taken at every iteration but last one

See speculative execution
134
Parallelism

Instruction-level parallelism (ILP)
Two ways
Increasing the depth of the pipeline
More steps can be executed in parallel
Multiple issue
We duplicate some units (ALU)
Two or more units can be at the same pipeline
stage

135
An example

Could modify the toy MIPS architecture by adding
a second ALU
Would allow RR instructions to be executed in
parallel with load and store instructions
Would also need extra ports in the register bank
Faster but much more complex

136
Hazards