Title: EEL-4713 Computer Architecture Designing a Multiple-Cycle Processor
1 EEL-4713 Computer Architecture: Designing a Multiple-Cycle Processor
2 Outline of today's lecture
- Recap and Introduction
- Introduction to the Concept of Multiple Cycle Processor
- Multiple Cycle Implementation of R-type Instructions
- What is a Multiple Cycle Delay Path and Why is it Bad?
- Multiple Cycle Implementation of Or Immediate
- Multiple Cycle Implementation of Load and Store
- Putting it all Together
3 Abstract view of our single cycle processor
[Figure: single cycle datapath stages (Instruction Fetch, Register Fetch, Ext, ALU, Mem Access, Data Mem, Result Store, Reg. Wrt, Next PC, PC) driven by Main Control (input op) and ALU control (input fun). Control signals: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, Equal]
- Looks like an FSM with PC as state
4 What's wrong with our CPI = 1 processor?
[Figure: components on the delay path of each instruction class]
- Arithmetic & Logical: PC, Inst Memory, Reg File, mux, ALU, mux, Reg File
- Load: PC, Inst Memory, Reg File, mux, ALU, Data Mem, mux, Reg File (Critical Path)
- Store: PC, Inst Memory, Reg File, mux, ALU, Data Mem
- Branch: PC, Inst Memory, Reg File, cmp, mux
- Long Cycle Time
- All instructions take as much time as the slowest
- Real memory is not as nice as our idealized memory
  - Cannot always get the job done in one (short) cycle
5 Drawbacks of this single cycle processor
- Long cycle time
  - Cycle time is much longer than needed for all instructions other than the slowest one. Examples:
    - R-type instructions do not require data memory access
    - Jump requires neither an ALU operation nor data memory access
- Need for multiple functional units
  - Can't share functional units for multiple operations in the same instruction
  - E.g., instruction/data memory, adders (PC, ALU, branch target, etc.)
6 Overview of a multiple cycle implementation
- The root of the single cycle processor's problems
  - The cycle time has to be long enough for the slowest instruction
- Solution
  - Break the instruction into smaller steps
  - Execute each step (instead of the entire instruction) in one cycle
    - Cycle time: the time it takes to execute the longest step
    - All the steps have similar length
  - This is the essence of the multiple cycle processor
- The advantages of the multiple cycle processor
  - Cycle time is much shorter
  - Different instructions take different numbers of cycles to complete (for now)
    - Load takes five cycles
    - Jump only takes three cycles
  - Allows a functional unit to be used more than once per instruction
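As a rough illustration of the trade-off, the sketch below computes the average number of cycles per instruction for a multiple cycle machine. The cycle counts for load (5) and jump (3) come from the slide, and the other counts follow the state sequences used later in this lecture. The instruction mix is purely hypothetical and is there only to make the arithmetic concrete.

```python
# Cycles per instruction class in the multiple cycle design.
CYCLES = {"load": 5, "store": 4, "rtype": 4, "ori": 4, "branch": 3, "jump": 3}

# Hypothetical instruction mix (fractions sum to 1); NOT from the lecture.
MIX = {"load": 0.25, "store": 0.10, "rtype": 0.40,
       "ori": 0.05, "branch": 0.15, "jump": 0.05}


def average_cpi(cycles, mix):
    """Weighted average of the cycle counts over the instruction mix."""
    return sum(cycles[name] * fraction for name, fraction in mix.items())


cpi = average_cpi(CYCLES, MIX)
print(f"average CPI = {cpi:.2f}")
```

Under these assumptions the multiple cycle design is faster only if its clock period is shorter than the single cycle period divided by this average CPI.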
7 What and when
- When designing multi-cycle implementations, you must think about
  - What to do on each cycle
  - When results are ready for next cycle
- What to do on each cycle
  - Always need to fetch instruction
  - Always need to decode instruction (know what to do next)
  - Next need to perform actual operation (varies from instruction to instruction)
- E.g.
  - Load will require address calculation, memory read, reg write
  - Branch will require comparison and PC update
  - R-type will require ALU operation, reg write
8 Overview: Control State Diagram
[Figure: control state diagram]
- Ifetch: fetch the instruction and store it in IR; PC = PC + 4
- Rfetch/Decode: decode the operation; index the registers; register operands appear on busA, busB
- beq -> BrComplete: finish the branch; calculate the target address; select mux, write PC
- lw or sw -> AdrCal: ALU computes rs + imm
  - lw -> LWmem: read mem. with the address calculated above
    - LWwr: latch data read from memory in rt
  - sw -> SWMem: write rt to the memory address calculated above
- Rtype -> RExec: ALU has inputs and control set, can compute
  - Rfinish: latch ALU computed result in rd
- Ori -> OriExec: ALU has inputs and control set, can compute
  - OriFinish: latch ALU computed result in rt
9 Example: the five steps of a Load instruction
- Steps: Instruction Fetch; Instr Decode / Reg. Fetch; Address; Data Memory; Reg Wr
[Figure: timing diagram against Clk showing each signal changing from Old Value to New Value: PC (after Clk-to-Q); Rs, Rt, Rd, Op, Func (after Instruction Memory Access Time); ALUctr, ExtOp, ALUSrc, RegWr (after Delay through Control Logic); busA and busB (after Register File Access Time, then Delay through Extender & Mux); Address (after ALU Delay); busW (after Data Memory Access Time), followed by Register File Write Time]
10 Multicycle datapath
- Similar to the single-cycle datapath; use latches for instruction and branch target address
- Control signals generated for multiple clock cycles per instruction: FSM
[Figure: multicycle datapath with PC, Ideal Memory (RAdr, WrAdr, Din, Dout), Instruction Reg (Rs, Rt, Rd, Imm, Op, Func), Reg File (Ra, Rb, Rw, busA, busB, busW), ALU (Zero), muxes, and Control (inputs Op and Func; instruction classes Beq, Rtype, Ori, Memory). Control settings shown: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=1, ALUSelA=0, ALUSelB=10, ALUOp=Add, ExtOp=1, MemWr=0, IRWr=0, RegWr=0, RegDst=x, IorD=x]
11 Overview: Control State Diagram
[Figure: control state diagram, repeated from slide 8]
12 Cycle 1: Fetch
[Figure: control state diagram annotated with control signals]
13 1: Instruction fetch cycle, beginning
- Every cycle begins right AFTER the clock tick
- mem[PC]; PC<31:0> + 4 (use the main ALU)
[Figure: one logic clock cycle on Clk, with "You are here!" marking the start of the cycle; PC, Ideal Memory (RAdr, WrAdr, Din, Dout), Instruction Reg, and the ALU adding 4. Control signals still to be decided: PCWr=?, MemWr=?, IRWr=?, ALUop=?]
14 1: Instruction fetch cycle, end
- Every cycle ends AT the next clock tick (storage element updates)
- IR <-- mem[PC]; PC<31:0> <-- PC<31:0> + 4
[Figure: one logic clock cycle on Clk, with "You are here!" marking the end of the cycle; PC, Ideal Memory (RAdr, WrAdr, Din, Dout), Instruction Reg, and the ALU adding 4. Control settings: PCWr=1, MemWr=0, IRWr=1, ALUOp=Add]
15 1: Instruction Fetch Cycle, Overall Picture
- Latch IR = mem[PC]
- Set PC = PC + 4
- State Ifetch: fetch and store in IR; PC = PC + 4
[Figure: multicycle datapath. Control settings: PCWr=1, PCWrCond=x, PCSrc=0, BrWr=0, ALUSelA=0, ALUSelB=00, ALUOp=Add, MemWr=0, IRWr=1, IorD=0]
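The fetch cycle can be sketched as a single function that returns the values latched at the closing clock tick. This is only an illustrative model: the dictionary keyed by byte address stands in for the ideal memory, and the instruction words stored in it are arbitrary placeholders.

```python
def fetch_cycle(pc, memory):
    """Return (ir, next_pc) as latched at the clock tick that ends cycle 1."""
    ir = memory[pc]                    # IR <- mem[PC]   (IRWr = 1, IorD = 0)
    next_pc = (pc + 4) & 0xFFFFFFFF    # PC <- PC + 4    (PCWr = 1, ALUOp = Add)
    return ir, next_pc


# Hypothetical memory contents, keyed by byte address.
memory = {0x00: 0x8C010004, 0x04: 0x00221820}
ir, pc = fetch_cycle(0x00, memory)
print(hex(ir), hex(pc))
```

Both results are computed from the old PC during the cycle; neither storage element changes until the tick.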
16 Cycle 2: Register fetch, decode
[Figure: control state diagram annotated with control signals]
17 2: Register Fetch / Instruction Decode
- busA <- RegFile[rs]; busB <- RegFile[rt]
- ALU can also be used to compute the branch target address (next slide) in this cycle (latch target); the register compare is done on the next cycle
[Figure: multicycle datapath, with Op and Func going to the Control. Control settings: PCWr=0, PCWrCond=0, PCSrc=x, ALUSelA=x, ALUSelB=xx, ALUOp=xx, MemWr=0, IRWr=0, RegWr=0, RegDst=x, IorD=x]
18 2: Register Fetch / Instruction Decode (Continued)
- busA <- Reg[rs]; busB <- Reg[rt]
- Target <- PC + SignExt(Imm16)*4
- Generate control signals
- State Rfetch/Decode: decode operation; index registers; register operands in busA, busB
[Figure: multicycle datapath with Control (inputs Op and Func; instruction classes Beq, Rtype, Ori, Memory). Control settings: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=1, ALUSelA=0, ALUSelB=10, ALUOp=Add, ExtOp=1, MemWr=0, IRWr=0, RegWr=0, RegDst=x, IorD=x]
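The target computation in this cycle can be sketched as follows. The helper names are illustrative only; note that PC here is the value already incremented by 4 at the end of the fetch cycle.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement signed value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc, imm16):
    """Target <- PC + SignExt(Imm16)*4, wrapped to 32 bits."""
    return (pc + sign_extend16(imm16) * 4) & 0xFFFFFFFF


print(hex(branch_target(0x100, 0x0003)))   # forward branch by 3 words
print(hex(branch_target(0x100, 0xFFFF)))   # backward branch by 1 word
```

The target is latched (BrWr = 1) for every instruction, since the opcode is not yet known to be a branch; it is simply ignored if the instruction turns out to be something else.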
19 Cycle 3: Branch completion
[Figure: control state diagram annotated with control signals]
20 3: Branch Completion
- if (busA == busB)
  - PC <- Target
- State BrComplete: finish branch; calculate the target address; select mux, write PC
[Figure: multicycle datapath. Control settings: PCWr=0, PCWrCond=1, PCSrc=1, BrWr=0, ALUSelA=1, ALUSelB=01, ALUOp=Sub, ExtOp=x, MemWr=0, IRWr=0, RegWr=0, RegDst=x, IorD=x]
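A minimal sketch of the branch completion step is given below. The gating of the PC write enable as PCWr OR (PCWrCond AND Zero) is an assumption inferred from the signal names on the slide, not something the slide states explicitly.

```python
def pc_write_enable(pc_wr, pc_wr_cond, zero):
    """Assumed gating: unconditional write, or conditional write when Zero is set."""
    return bool(pc_wr or (pc_wr_cond and zero))


def branch_complete(pc, target, bus_a, bus_b):
    """Return the PC value after the tick that ends the BrComplete cycle."""
    zero = ((bus_a - bus_b) & 0xFFFFFFFF) == 0   # ALUOp = Sub: Zero when busA == busB
    if pc_write_enable(pc_wr=0, pc_wr_cond=1, zero=zero):
        return target                             # PCSrc = 1 selects Target
    return pc                                     # not taken: PC keeps PC + 4


print(hex(branch_complete(0x104, 0x200, 5, 5)))   # taken
print(hex(branch_complete(0x104, 0x200, 5, 6)))   # not taken
```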
21 Cycle 3: Rtype execution
[Figure: control state diagram annotated with control signals]
22 3: R-type Execution
- ALU Output <- busA op busB
- State RExec: ALU has inputs and control set, can compute
[Figure: multicycle datapath; the labels Jump and JumpAddr also appear, and Funct feeds the ALU control. Control settings: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, ALUSelA=1, ALUSelB=01, ALUOp=Rtype, ExtOp=x, MemtoReg=x, MemWr=0, IRWr=0, RegWr=0, RegDst=1, IorD=x]
23 Cycle 4: Rtype completion
[Figure: control state diagram annotated with control signals]
24 4: R-type Completion
- State Rfinish: latch ALU computed result in rd
[Figure: multicycle datapath. Control settings: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, ALUSelA=1, ALUSelB=01, ALUOp=Rtype, ExtOp=x, MemtoReg=0, MemWr=0, IRWr=0, RegWr=1, RegDst=1, IorD=x]
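The two R-type cycles can be sketched together as below. The register file is modeled as a plain list, and the four ALU operations and their string names are a hypothetical subset chosen for illustration; the lecture does not enumerate the funct encodings.

```python
import operator

# Hypothetical subset of ALU operations selected by the funct field.
ALU_OPS = {"add": operator.add, "sub": operator.sub,
           "and": operator.and_, "or": operator.or_}


def rtype(regs, rs, rt, rd, funct):
    """Execute and complete an R-type instruction on a list-based register file."""
    bus_a, bus_b = regs[rs], regs[rt]                      # cycle 2: register fetch
    alu_out = ALU_OPS[funct](bus_a, bus_b) & 0xFFFFFFFF    # cycle 3: ALU output <- busA op busB
    regs[rd] = alu_out                                     # cycle 4: RegWr = 1, RegDst = 1 (rd)
    return alu_out


regs = [0] * 32
regs[1], regs[2] = 7, 5
rtype(regs, 1, 2, 3, "sub")    # regs[3] <- 7 - 5
rtype(regs, 2, 1, 4, "sub")    # regs[4] <- 5 - 7, wrapped to 32 bits
print(regs[3], hex(regs[4]))
```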
25 Cycle 3: Ori execution
[Figure: control state diagram annotated with control signals]
26 3: Ori Execution
- ALU output <- busA or ZeroExt[Imm16]
[Figure: multicycle datapath. Control settings: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, ALUSelA=1, ALUSelB=11, ALUOp=Or, ExtOp=0, MemtoReg=x, MemWr=0, IRWr=0, RegWr=0, RegDst=0, IorD=x]
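The Ori execution step differs from the address calculation mainly in the extender setting (ExtOp = 0, zero extension). A minimal sketch, with illustrative function names:

```python
def zero_extend16(imm16):
    """ExtOp = 0: the upper 16 bits of the extended immediate are zero."""
    return imm16 & 0xFFFF


def ori_execute(bus_a, imm16):
    """ALU output <- busA or ZeroExt[Imm16]."""
    return (bus_a | zero_extend16(imm16)) & 0xFFFFFFFF


print(hex(ori_execute(0xFFFF0000, 0x8001)))
print(hex(ori_execute(0x00000000, 0x8000)))   # bit 15 is NOT propagated upward
```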
27 Cycle 4: Ori completion
[Figure: control state diagram annotated with control signals]
28 4: Ori Completion
- State OriFinish: latch ALU computed result in rt
[Figure: multicycle datapath. Control settings: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, ALUSelA=1, ALUSelB=11, ALUOp=Or, ExtOp=0, MemtoReg=0, MemWr=0, IRWr=0, RegWr=1, RegDst=0, IorD=x]
29 Cycle 3: Address calculation
[Figure: control state diagram annotated with control signals]
30 3: Memory Address Calculation
- ALU output <- busA + SignExt[Imm16]
- State AdrCal: ALU computes rs + imm. Asserted (1): ExtOp, ALUSelA; ALUSelB=11; ALUOp=Add; don't care (x): MemtoReg, PCSrc
[Figure: multicycle datapath. Control settings: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, ALUSelA=1, ALUSelB=11, ALUOp=Add, ExtOp=1, MemtoReg=x, MemWr=0, IRWr=0, RegWr=0, RegDst=x, IorD=x]
31 Cycle 4: Memory access, store
[Figure: control state diagram annotated with control signals]
32 4: Memory Access for Store
- State SWmem: write rt to the memory address calculated above. Asserted (1): ExtOp, MemWr, ALUSelA; ALUSelB=11; ALUOp=Add; don't care (x): PCSrc, RegDst, MemtoReg
[Figure: multicycle datapath. Control settings: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, ALUSelA=1, ALUSelB=11, ALUOp=Add, ExtOp=1, MemtoReg=x, MemWr=1, IRWr=0, RegWr=0, RegDst=x, IorD=x]
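The store path (address calculation followed by the memory write) can be sketched as follows. The list-based register file, the dictionary memory, and the register numbers are illustrative assumptions.

```python
def sign_extend16(imm16):
    """ExtOp = 1: interpret the low 16 bits as a signed value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def store_word(regs, memory, rs, rt, imm16):
    """Cycles 3 and 4 of sw on a list register file and a dict memory."""
    address = (regs[rs] + sign_extend16(imm16)) & 0xFFFFFFFF   # cycle 3: AdrCal
    memory[address] = regs[rt]                                  # cycle 4: SWMem, MemWr = 1
    return address


regs = [0] * 32
regs[2], regs[7] = 0x2000, 0x12345678
memory = {}
addr = store_word(regs, memory, rs=2, rt=7, imm16=8)
print(hex(addr), hex(memory[addr]))
```

The ALU control settings are held unchanged from the address calculation cycle, so the address stays stable while memory is written.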
33 Cycle 4: Memory access, load
[Figure: control state diagram annotated with control signals]
34 4: Memory Access for Load
- Mem Dout <- mem[ALU output]
- State LWmem: read mem. with the address calculated above
[Figure: multicycle datapath. Control settings: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, ALUSelA=1, ALUSelB=11, ALUOp=Add, ExtOp=1, MemtoReg=x, MemWr=0, IRWr=0, RegWr=0, RegDst=0, IorD=1]
35 Cycle 5: Write back, load
[Figure: control state diagram annotated with control signals]
36 5: Write Back for Load
- State LWwr: latch data read from memory in rt
[Figure: multicycle datapath. Control settings: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, ALUSelA=1, ALUSelB=11, ALUOp=Add, ExtOp=1, MemtoReg=1, MemWr=0, IRWr=0, RegWr=1, RegDst=0, IorD=x]
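Cycles 3 to 5 of the load can be sketched end to end as below. The list-based register file, the dictionary memory, and the example values are illustrative assumptions rather than part of the lecture.

```python
def sign_extend16(imm16):
    """ExtOp = 1: interpret the low 16 bits as a signed value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def load_word(regs, memory, rs, rt, imm16):
    """Cycles 3 to 5 of lw on a list register file and a dict memory."""
    address = (regs[rs] + sign_extend16(imm16)) & 0xFFFFFFFF   # cycle 3: AdrCal
    dout = memory[address]                                      # cycle 4: LWmem, IorD = 1
    regs[rt] = dout                                             # cycle 5: LWwr, RegWr = 1, MemtoReg = 1
    return address


regs = [0] * 32
regs[2] = 0x1000
memory = {0x0FFC: 0xDEADBEEF}
addr = load_word(regs, memory, rs=2, rt=5, imm16=0xFFFC)   # offset of -4
print(hex(addr), hex(regs[5]))
```

RegDst = 0 selects rt as the destination, which is why the result lands in `regs[rt]` rather than `regs[rd]`.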
37 Putting it all together: Multiple Cycle Datapath
[Figure: complete multicycle datapath with PC, Ideal Memory (RAdr, WrAdr, Din, Dout), Instruction Reg (Rs, Rt, Rd, Imm), Reg File (Ra, Rb, Rw, busA, busB, busW), ALU (Zero), and muxes. Control signals: PCWr, PCWrCond, PCSrc, BrWr, ALUSelA, ALUSelB, ALUOp, ExtOp, MemtoReg, MemWr, IRWr, RegWr, RegDst, IorD]
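38 Putting it all together: Control State Diagram
[Figure: control state diagram annotated with control signals; transitions are labeled beq, lw or sw, Ori, Rtype, lw, sw. Annotations recoverable from the diagram:]
- AdrCal: asserted (1): ExtOp, ALUSelA; ALUSelB=11; ALUOp=Add; don't care (x): MemtoReg, PCSrc
- LWmem: asserted (1): ExtOp, ALUSelA, IorD; ALUSelB=11; ALUOp=Add; don't care (x): MemtoReg, PCSrc
- SWMem: asserted (1): ExtOp, MemWr, ALUSelA; ALUSelB=11; ALUOp=Add; don't care (x): PCSrc, RegDst, MemtoReg
- Other states shown: OriExec, OriFinish, LWwr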
39 Note: there is a multiple-cycle delay path
- There is no register to save the results between
  - 2) Register Fetch: busA <- Reg[rs]; busB <- Reg[rt]
  - 3) R-type Execution: ALU output <- busA op busB
  - 4) R-type Completion: Reg[rd] <- ALU output
[Figure: datapath excerpt (Instruction Reg, Reg File, ALU with ALUselA, ALUselB, ALUOp, IRWr) asking "Register here to save outputs of Rfetch?" after the Reg File and "Register here to save outputs of RExec?" after the ALU]
40 A Multiple Cycle Delay Path (Continued)
- Register is NOT needed to save the outputs of Register Fetch
  - IRWr = 0: busA and busB will not change after Register Fetch
- Register is NOT needed to save the outputs of R-type Execution
  - busA and busB will not change after Register Fetch
  - Control signals ALUSelA, ALUSelB, and ALUOp will not change after R-type Execution
  - Consequently ALU output will not change after R-type Execution
- In theory, you need a register to hold a signal value if
  - (1) The signal is computed in one clock cycle and used in another,
  - (2) AND the inputs to the functional block that computes this signal can change before the signal is written into a state element.
- You can save a register if Cond 1 is true BUT Cond 2 is false
  - But in practice, this will introduce a multiple cycle delay path
  - A logic delay path that takes multiple cycles to propagate from one storage element to the next storage element
41 Pros and Cons of a Multiple Cycle Delay Path
- A 3-cycle path example
  - IR (storage) -> Reg File Read -> ALU -> Reg File Write (storage)
- Advantages
  - Register savings
  - We can share time among cycles
    - If the ALU takes longer than one cycle, still OK as long as the entire path takes less than 3 cycles to finish
42 Pros and Cons of a Multiple Cycle Delay Path (Continued)
- Disadvantage
  - Static timing analyzer, which ONLY looks at delay between two storage elements, will report this as a timing violation
  - You have to ignore the static timing analyzer's warnings
43 Summary
- Disadvantages of the Single Cycle Processor
  - Long cycle time
  - Cycle time is too long for all instructions except the Load
- Multiple Cycle Processor
  - Divide the instructions into smaller steps
  - Execute each step (instead of the entire instruction) in one cycle
- Do NOT confuse Multiple Cycle Processor with multiple cycle delay path
  - Multiple Cycle Processor: executes each instruction in multiple clock cycles
  - Multiple Cycle Delay Path: a combinational logic path between two storage elements that takes more than one clock cycle to complete
- It is possible (desirable) to build a MC Processor without MCDP
  - Use a register to save a signal's value whenever a signal is generated in one clock cycle and used in another cycle later
44 Control logic
- Review of Finite State Machine (FSM) control
- From Finite State Diagrams to Microprogramming
45 Overview
- Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently; the control can then be implemented with one of several methods using a structured logic technique.

                           hardwired control              microprogrammed control
Initial Representation     Finite State Diagram           Microprogram
Sequencing Control         Explicit Next State Function   Microprogram counter + Dispatch ROMs
Logic Representation       Logic Equations                Truth Tables
Implementation Technique   PLA                            ROM
46 Initial Representation: Finite State Diagram
[Figure: control state diagram annotated with control signals and state numbers: 0 Ifetch, 1 Rfetch/Decode, 2 AdrCal, 3 LWmem, 4 LWwr, 5 SWMem, 6 RExec, 7 Rfinish, 8 BrComplete (beq), 10 OriExec, 11 OriFinish; 9 is Jump, see Fig. C.3.1]
47 Sequencing Control: Explicit Next State Function
[Figure: block diagram with Control Logic and Next State Logic, its Inputs and Outputs, the State Reg, and the Multicycle Datapath]
- Next state number is encoded just like datapath controls
48 Interface in detail
49 Logic Representation: Logic Equations
- Next state from current state
  - State 0 -> State 1
  - State 1 -> S2, S6, S8, S9, S10
  - State 2 -> S3, S5
  - State 3 -> State 4
  - State 4 -> State 0
  - State 5 -> State 0
  - State 6 -> State 7
  - State 7 -> State 0
  - State 8 -> State 0
  - State 9 -> State 0
  - State 10 -> State 11
  - State 11 -> State 0
- Alternatively, prior state & condition
  - S4, S5, S7, S8, S9, S11 -> State 0
  - State 0 -> State 1
  - State 1 & op = lw/sw -> State 2
  - State 2 & op = lw -> State 3
  - State 3 -> State 4
  - State 2 & op = sw -> State 5
  - State 1 & op = R-type -> State 6
  - State 6 -> State 7
  - State 1 & op = beq -> State 8
  - State 1 & op = jmp -> State 9
  - State 1 & op = ori -> State 10
  - State 10 -> State 11
See Fig. C.3.3
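The transition lists above can be written down directly as an explicit next-state function. The sketch below is one possible encoding; the opcode mnemonics as dictionary keys are an illustrative choice, and it assumes that jump is dispatched from state 1 like the other opcodes.

```python
# Dispatch tables for the two states whose successor depends on the opcode.
DISPATCH_FROM_1 = {"lw": 2, "sw": 2, "rtype": 6, "beq": 8, "jmp": 9, "ori": 10}
DISPATCH_FROM_2 = {"lw": 3, "sw": 5}

# Every other state has exactly one successor.
UNCONDITIONAL = {0: 1, 3: 4, 6: 7, 10: 11,
                 4: 0, 5: 0, 7: 0, 8: 0, 9: 0, 11: 0}


def next_state(state, op):
    """Explicit next-state function of the current state and the opcode."""
    if state == 1:
        return DISPATCH_FROM_1[op]
    if state == 2:
        return DISPATCH_FROM_2[op]
    return UNCONDITIONAL[state]


def state_sequence(op):
    """States visited by one instruction, starting at fetch (state 0)."""
    states = [0]
    while True:
        following = next_state(states[-1], op)
        if following == 0:
            return states
        states.append(following)


for op in ("lw", "sw", "rtype", "beq", "jmp", "ori"):
    print(op, state_sequence(op))
```

The length of each sequence is the cycle count of that instruction, which matches the earlier claim that load takes five cycles and jump takes three.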
50 Implementation Technique: Programmed Logic Arrays
- Each output line is the logical OR of logical ANDs of input lines or their complements: AND minterms are specified in the top AND plane, OR sums are specified in the bottom OR plane
- Opcodes: R = 000000, beq = 000100, lw = 100011, sw = 101011, ori = 001011, jmp = 000010
- States: 0 = 0000, 1 = 0001, 2 = 0010, 3 = 0011, 4 = 0100, 5 = 0101, 6 = 0110, 7 = 0111, 8 = 1000, 9 = 1001, 10 = 1010, 11 = 1011
[Figure: PLA with opcode inputs Op5 to Op0 and state inputs; the labels "Opany", "S0", and "Next1" also appear]
51 Implementation Technique: Programmed Logic Arrays
- Each output line is the logical OR of logical ANDs of input lines or their complements: AND minterms are specified in the top AND plane, OR sums are specified in the bottom OR plane
- Opcodes: lw = 100011, sw = 101011, R = 000000, ori = 001011, beq = 000100, jmp = 000010
- States: 0 = 0000, 1 = 0001, 2 = 0010, 3 = 0011, 4 = 0100, 5 = 0101, 6 = 0110, 7 = 0111, 8 = 1000, 9 = 1001, 10 = 1010, 11 = 1011
[Figure: PLA with opcode inputs Op5 to Op0 and state inputs]
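To make the AND-plane / OR-plane structure concrete, the sketch below evaluates the next-state bits as a sum of products. It is a behavioral model under several assumptions: the opcode bit patterns are copied as printed on the slide, the rows follow the next-state equations with jump dispatched from state 1, and an equality comparison stands in for the AND of individual input bits or their complements.

```python
# Opcode bit patterns as listed on the slide.
OPCODES = {"rtype": 0b000000, "beq": 0b000100, "lw": 0b100011,
           "sw": 0b101011, "ori": 0b001011, "jmp": 0b000010}

# Product terms: (current state, opcode name or None for don't-care, next state).
# Transitions into state 0 need no terms because every next-state bit is 0.
ROWS = [
    (0, None, 1),
    (1, "lw", 2), (1, "sw", 2), (1, "rtype", 6),
    (1, "beq", 8), (1, "jmp", 9), (1, "ori", 10),
    (2, "lw", 3), (2, "sw", 5),
    (3, None, 4), (6, None, 7), (10, None, 11),
]


def and_plane(state, opcode):
    """One boolean per product term (one AND gate per row)."""
    return [state == s and (op is None or opcode == OPCODES[op])
            for s, op, _ in ROWS]


def or_plane(products):
    """Each next-state bit ORs the product terms whose row sets that bit."""
    next_state = 0
    for bit in range(4):
        if any(p for p, (_, _, nxt) in zip(products, ROWS) if (nxt >> bit) & 1):
            next_state |= 1 << bit
    return next_state


def pla_next_state(state, opcode):
    return or_plane(and_plane(state, opcode))


print(pla_next_state(1, OPCODES["lw"]), pla_next_state(2, OPCODES["sw"]))
```

Adding a state or an opcode means adding rows (AND terms) and possibly input or output lines, which is the growth problem raised on the Multicycle Control slide.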
52 Multicycle Control
- Given numbers assigned to FSM, can in turn determine next state as function of the inputs and current state
- Can turn these into Boolean equations for each bit of the next state lines
- Implement easily using PLA (programmable logic array) or ROM storing truth tables
  - See Figs. C.3.6 and C.3.8 for tables showing outputs and next state as function of current state and opcode
- What if many more states, many more conditions?
  - State machine gets too large: very large ROMs/PLAs
- What if need to add a state?
  - May need to increase address for ROM, number of inputs for PLA gates
- Or just implement FSM in VHDL
53 Next Iteration: Using Sequencer for Next State
- Before: Explicit Next State. Next: try a variation 1 step from the right hand side
- Few sequential states in small FSM: suppose added floating point?
- Still need to go to non-sequential states: e.g., state 1 => 2, 6, 8, 10

                           hardwired control              microprogrammed control
Initial Representation     Finite State Diagram           Microprogram
Sequencing Control         Explicit Next State Function   Microprogram counter + Dispatch ROMs
Logic Representation       Logic Equations                Truth Tables
Implementation Technique   PLA                            ROM
54 Sequencer-based control unit
[Figure: Control Logic with Inputs, and Outputs going to the Multicycle Datapath; the state register is fed by Address Select Logic, with an Adder that adds 1 to the current state]
- A processor on its own
  - Set state to 0
  - Dispatch (state 1 & 2)
  - Incremented state number (microPC)
  - Micro branches specify whether to increment or select another state based on opcode
55 Sequencer-based control unit
[Figure: Control Logic with Inputs, and Outputs going to the Multicycle Datapath; the state register is fed by Address Select Logic, with an Adder that adds 1 to the current state. Fig. C.4.2]
- AddrCtrl: control for select logic
  - 0: reset state register (next instruction)
  - 1: select next using next-state ROM for state 1
  - 2: select next using next-state ROM for state 2
  - 3: increment state
56 Sequencer block diagram
- Before: 6-bit opcode + 4-bit state -> 4-bit NS
- Now: 4-bit state -> 2-bit AddrCtrl

State   AddrCtrl   Step
0000    3          fetch
0001    1          decode
0010    2          lw/sw
0011    3          lw
0100    0          lw
0101    0          sw
0110    3          r-type
0111    0          r-type
1000    0          branch
1010    3          ori
1011    0          ori

- Dispatch ROM 1 (indexed by opcode): lw -> 0010, sw -> 0010, R-type -> 0110, ori -> 1010, branch -> 1000
- Dispatch ROM 2 (indexed by opcode): lw -> 0011, sw -> 0101
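The sequencer tables above can be exercised with a short sketch. The table contents are taken from this slide and the meaning of each AddrCtrl value from the previous one; the opcode mnemonics used as dictionary keys and the function names are illustrative choices. Jump (state 9) is left out because the tables do not list it.

```python
# AddrCtrl per state: 0 = reset to state 0, 1 = dispatch ROM 1,
# 2 = dispatch ROM 2, 3 = increment the state (microPC).
ADDR_CTRL = {0: 3, 1: 1, 2: 2, 3: 3, 4: 0, 5: 0,
             6: 3, 7: 0, 8: 0, 10: 3, 11: 0}

DISPATCH_ROM_1 = {"lw": 0b0010, "sw": 0b0010, "rtype": 0b0110,
                  "ori": 0b1010, "beq": 0b1000}
DISPATCH_ROM_2 = {"lw": 0b0011, "sw": 0b0101}


def next_micro_pc(state, op):
    """Address select logic: choose the next state from the AddrCtrl field."""
    ctrl = ADDR_CTRL[state]
    if ctrl == 0:
        return 0
    if ctrl == 1:
        return DISPATCH_ROM_1[op]
    if ctrl == 2:
        return DISPATCH_ROM_2[op]
    return state + 1


def micro_sequence(op):
    """States visited by one instruction, starting at fetch (state 0)."""
    states = [0]
    while True:
        following = next_micro_pc(states[-1], op)
        if following == 0:
            return states
        states.append(following)


for op in ("lw", "sw", "rtype", "beq", "ori"):
    print(op, micro_sequence(op))
```

Only the two dispatch ROMs are indexed by the opcode; every other transition needs just the 2-bit AddrCtrl field, which is what shrinks the control logic compared with a full next-state table.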
57 Initial Representation: Finite State Diagram
[Figure: control state diagram annotated with control signals and state numbers: 0 Ifetch, 1 Rfetch/Decode, 2 AdrCal, 3 LWmem, 4 LWwr, 5 SWMem, 6 RExec, 7 Rfinish, 8 BrComplete (beq), 10 OriExec, 11 OriFinish; 9 is Jump, see Fig. C.3.1]
58 Next Iteration: Using Microprogram for Representation

                           hardwired control              microprogrammed control
Initial Representation     Finite State Diagram           Microprogram
Sequencing Control         Explicit Next State Function   Microprogram counter + Dispatch ROMs
Logic Representation       Logic Equations                Truth Tables
Implementation Technique   PLA                            ROM

- ROM can be thought of as a sequence of control words
- Control word can be thought of as an instruction: a microinstruction
59 Microprogramming
- Control is the hard part of processor design
  - Datapath is fairly regular and well-organized
  - Memory is highly regular
  - Control is irregular and global
- Microprogramming: a particular strategy for implementing the control unit of a processor by "programming" at the level of register transfer operations
- Microarchitecture: logical structure and functional capabilities of the hardware as seen by the microprogrammer
60 Macroinstruction Interpretation
[Figure: Main Memory holds the user program plus data (this can change!): ADD, SUB, AND, ..., DATA. The CPU contains the control memory and the execution unit. One of the macroinstructions is mapped into one of the microsequences in control memory, e.g. the AND microsequence: Fetch, Calc Operand Addr, Fetch Operand(s), Calculate, Save Answer(s)]
61 Microprogramming Pros and Cons
- Ease of design
- Flexibility
  - Easy to adapt to changes in organization, timing, technology
  - Can make changes late in design cycle, or even in the field
- Can implement very powerful instruction sets (just more control memory)
- Generality
  - Can implement multiple instruction sets on same machine
  - Can tailor instruction set to application
- Compatibility
  - Many organizations, same instruction set
- Costly to implement
- Slow
62 Summary: Multicycle Control
- Microprogramming and hardwired control have many similarities; perhaps the biggest difference is the initial representation and the ease of change of implementation, with ROM generally being easier than PLA

                           hardwired control              microprogrammed control
Initial Representation     Finite State Diagram           Microprogram
Sequencing Control         Explicit Next State Function   Microprogram counter + Dispatch ROMs
Logic Representation       Logic Equations                Truth Tables
Implementation Technique   PLA                            ROM