EEL-4713 Computer Architecture Designing a Multiple-Cycle Processor - PowerPoint PPT Presentation

Slides: 51
Provided by: ufl74
1
EEL-4713 Computer Architecture: Designing a
Multiple-Cycle Processor
2
Outline of today's lecture
  • Recap and Introduction
  • Introduction to the Concept of Multiple Cycle
    Processor
  • Multiple Cycle Implementation of R-type
    Instructions
  • What is a Multiple Cycle Delay Path and Why is it
    Bad?
  • Multiple Cycle Implementation of Or Immediate
  • Multiple Cycle Implementation of Load and Store
  • Putting it all Together

3
Abstract view of our single cycle processor
[Figure: block diagram. Main Control (input: op) and ALU control
(input: fun) with signals ALUSrc, ExtOp, MemWr, MemRd, RegDst,
RegWr, nPC_sel, ALUctr and Equal. Datapath blocks: Next PC, PC,
Instruction Fetch, Register Fetch, Ext, ALU, Mem Access (Data Mem),
Result Store (Reg. Wrt)]
  • looks like an FSM with PC as state

4
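What's wrong with our CPI=1 processor?
[Figure: delay path of each instruction class]
  • Arithmetic & Logical: PC -> Inst Memory -> Reg File -> mux ->
    ALU -> mux -> Reg File
  • Load (critical path): PC -> Inst Memory -> Reg File -> mux ->
    ALU -> Data Mem -> mux -> Reg File
  • Store: PC -> Inst Memory -> Reg File -> mux -> ALU -> Data Mem
  • Branch: PC -> Inst Memory -> Reg File -> cmp -> mux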
  • Long Cycle Time
  • All instructions take as much time as the slowest
  • Real memory is not so nice as our idealized
    memory
  • cannot always get the job done in one (short)
    cycle

5
Drawbacks of this single cycle processor
  • Long cycle time
  • Cycle time is much longer than needed for all
    other instructions. Examples:
  • R-type instructions do not require data memory
    access
  • Jump does not require ALU operation nor data
    memory access
  • Need for multiple functional units
  • Can't share functional units for multiple
    operations in the same instruction
  • E.g., instruction/data memory, adders (PC, ALU,
    branch target, etc.)

6
Overview of a multiple cycle implementation
  • The root of the single cycle processor's
    problems
  • The cycle time has to be long enough for the
    slowest instruction
  • Solution
  • Break the instruction into smaller steps
  • Execute each step (instead of the entire
    instruction) in one cycle
  • Cycle time = time it takes to execute the longest
    step
  • All the steps have similar length
  • This is the essence of the multiple cycle
    processor
  • The advantages of the multiple cycle processor
  • Cycle time is much shorter
  • Different instructions take different number of
    cycles to complete (for now)
  • Load takes five cycles
  • Jump only takes three cycles
  • Allows a functional unit to be used more than
    once per instruction
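
The trade-off on this slide can be sketched numerically. The following is a minimal illustration, not a measurement: the step delays are made-up values chosen only to show the arithmetic. The cycle counts for load (5) and jump (3) come from the slide; the counts for store, R-type and beq are read off the control state diagram used in this lecture.

```python
# Hypothetical step delays in ns (assumed for illustration, not from the lecture).
STEP_DELAY_NS = {"ifetch": 2.0, "decode": 1.0, "alu": 2.0, "mem": 2.0, "regwrite": 1.0}

# Steps on each instruction's path through the single-cycle datapath.
SINGLE_CYCLE_PATH = {
    "load": ["ifetch", "decode", "alu", "mem", "regwrite"],
    "store": ["ifetch", "decode", "alu", "mem"],
    "rtype": ["ifetch", "decode", "alu", "regwrite"],
    "beq": ["ifetch", "decode", "alu"],
    "jump": ["ifetch", "decode"],
}

# Clock cycles per instruction on the multiple-cycle machine.
MULTI_CYCLE_COUNT = {"load": 5, "store": 4, "rtype": 4, "beq": 3, "jump": 3}


def single_cycle_time():
    """Single-cycle clock period: long enough for the slowest instruction."""
    return max(sum(STEP_DELAY_NS[s] for s in steps)
               for steps in SINGLE_CYCLE_PATH.values())


def multi_cycle_time():
    """Multiple-cycle clock period: long enough for the longest step."""
    return max(STEP_DELAY_NS.values())


def program_time_ns(mix):
    """mix maps instruction name -> dynamic count; returns (single, multi) in ns."""
    single = sum(mix.values()) * single_cycle_time()
    cycles = sum(count * MULTI_CYCLE_COUNT[name] for name, count in mix.items())
    return single, cycles * multi_cycle_time()
```

With these assumed delays the steps are not equal in length, so a load costs 5 x 2 ns = 10 ns on the multiple-cycle machine versus 8 ns on the single-cycle one, while a jump costs 6 ns versus 8 ns. This is why the slide stresses that all steps should have similar length.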

7
What and when
  • When designing multi-cycle implementations, you
    must think about
  • What to do on each cycle
  • When results are ready for next cycle
  • What to do on each cycle
  • Always need to fetch instruction
  • Always need to decode instruction (know what to
    do next)
  • Next need to perform actual operation (varies
    from instruction to instruction)
  • E.g.
  • Load will require address calculation, memory
    read, reg write
  • Branch will require comparison and PC update
  • R-type will require ALU operation, reg write

8
Overview: Control State Diagram
  • Ifetch: fetch and store in IR; PC = PC + 4
  • Rfetch/Decode: decode operation; index registers; register
    operands in busA, busB
  • BrComplete (beq): finish branch; calculate the target address;
    select mux, write PC
  • AdrCal (lw or sw): ALU computes rs + imm
  • LWmem (lw): read mem. with address calculated above
  • LWwr: latch data read from memory in rt
  • SWMem (sw): write rt to memory address calc. above
  • RExec (Rtype): ALU has inputs and control set, can compute
  • Rfinish: latch ALU computed result in rd
  • OriExec (Ori): ALU has inputs and control set, can compute
  • OriFinish: latch ALU computed result in rt
9
Example: the five steps of a Load instruction
  • Steps: Instruction Fetch; Instr Decode / Reg. Fetch; Address;
    Data Memory; Reg Wr
[Figure: timing diagram. Each signal goes from Old Value to New Value
in turn: Clk; PC (after Clk-to-Q); Rs, Rt, Rd, Op, Func (after
Instruction Memory Access Time); ALUctr, ExtOp, ALUSrc, RegWr (after
Delay through Control Logic); busA (after Register File Access Time);
busB (after Delay through Extender & Mux); Address (after ALU Delay);
busW (after Data Memory Access Time); then Register File Write Time]
10
Multicycle datapath
  • Similar to the single-cycle datapath; use latches
    for instruction and branch target address
  • Control signals generated for multiple clock
    cycles per instruction - FSM

[Figure: multicycle datapath. PC, Ideal Memory (RAdr, WrAdr, Din,
Dout), Instruction Reg (Rs, Rt, Rd, Imm, Op, Func), Reg File (Ra, Rb,
Rw, busA, busB, busW), extender (16 -> 32), muxes, ALU with Zero
output, and Control dispatching on Op to Beq, Rtype, Ori, Memory.
Control settings shown: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=1,
ALUSelA=0, ALUSelB=10, ALUOp=Add, ExtOp=1, MemWr=0, IRWr=0, RegWr=0,
RegDst=x, IorD=x]
11
Overview: Control State Diagram
[Figure: the control state diagram of slide 8, repeated]
12
Cycle 1: Fetch
[Figure: control state diagram annotated with per-state control
signals]
13
1. Instruction fetch cycle: beginning
  • Every cycle begins right AFTER the clock tick
  • mem[PC]; PC<31:0> + 4

[Figure: clock waveform (one logic clock cycle, "You are here!" at
the start of the cycle) and the fetch part of the datapath: PC, Ideal
Memory (RAdr, WrAdr, Din, Dout), Instruction Reg, and the main ALU
adding 4 to the PC. Control settings still open: PCWr=?, MemWr=?,
IRWr=?, ALUOp=?]
14
1. Instruction fetch cycle: end
  • Every cycle ends AT the next clock tick (storage
    element updates)
  • IR <-- mem[PC]; PC<31:0> <-- PC<31:0> + 4

[Figure: clock waveform (one logic clock cycle, "You are here!" at
the end of the cycle) and the fetch part of the datapath. Control
settings: PCWr=1, MemWr=0, IRWr=1, ALUOp=Add]
15
1. Instruction Fetch Cycle: Overall Picture
  1. Latch IR <- mem[PC]
  2. Set PC <- PC + 4

[Figure: multicycle datapath for state Ifetch (fetch and store in IR;
PC = PC + 4). Control settings: PCWr=1, PCWrCond=x, PCSrc=0, BrWr=0,
ALUSelA=0, ALUSelB=00, ALUOp=Add, MemWr=0, IRWr=1, IorD=0]
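
The fetch cycle can be sketched in software as follows. This is a minimal model under stated assumptions: memory is a hypothetical Python dict from byte address to 32-bit word, and the function name `fetch_cycle` is illustrative. The point it shows is that both next values are computed from the old state during the cycle and latched together at the clock tick.

```python
def fetch_cycle(state, memory):
    """One Ifetch cycle: IR <- mem[PC]; PC <- PC + 4, latched together."""
    pc = state["PC"]
    # During the cycle: read memory at PC and add 4 in the main ALU,
    # both using the OLD value of PC.
    next_ir = memory[pc]
    next_pc = (pc + 4) & 0xFFFFFFFF  # keep PC within 32 bits
    # At the next clock tick: storage elements update (IRWr=1, PCWr=1).
    return {**state, "IR": next_ir, "PC": next_pc}
```

Returning a new dict instead of mutating `state` mirrors the hardware behaviour: nothing visible changes until the end of the cycle.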
16
Cycle 2: Register fetch, decode
[Figure: control state diagram annotated with per-state control
signals]
17
2. Register Fetch / Instruction Decode
  • busA <- RegFile[rs]; busB <- RegFile[rt]
  • The ALU can also be used to compute the branch target
    address in this cycle (next slide) and latch the
    target; the register compare is done on the next cycle

[Figure: multicycle datapath; Op and Func go to the Control. Control
settings: PCWr=0, PCWrCond=0, PCSrc=x, ALUSelA=x, ALUSelB=xx,
ALUOp=xx, MemWr=0, IRWr=0, RegWr=0, RegDst=x, IorD=x]
18
2. Register Fetch / Instruction Decode (Continued)
  • busA <- Reg[rs]; busB <- Reg[rt]
  • Target <- PC + SignExt(Imm16)*4
  • Generate control signals

[Figure: multicycle datapath for state Rfetch/Decode (decode
operation; index registers; register operands in busA, busB). Control
dispatches on Op to Beq, Rtype, Ori, Memory. Control settings:
PCWr=0, PCWrCond=0, PCSrc=x, BrWr=1, ALUSelA=0, ALUSelB=10,
ALUOp=Add, ExtOp=1, MemWr=0, IRWr=0, RegWr=0, RegDst=x, IorD=x]
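
The target computation in this cycle can be written out as a short sketch. The function names are illustrative, and `pc` is assumed to be the value already incremented by the fetch cycle.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits of imm16 as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc, imm16):
    """Target <- PC + SignExt(Imm16)*4, wrapped to 32 bits."""
    return (pc + sign_extend16(imm16) * 4) & 0xFFFFFFFF
```

For example, an immediate of 0xFFFF is -1, so the target lies one word (4 bytes) before `pc`.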
19
Cycle 3: Branch completion
[Figure: control state diagram annotated with per-state control
signals]
20
3. Branch Completion
  • if (busA == busB)
  • PC <- Target

[Figure: multicycle datapath for state BrComplete (finish branch;
calculate the target address; select mux, write PC). Control
settings: PCWr=0, PCWrCond=1, PCSrc=1, BrWr=0, ALUSelA=1, ALUSelB=01,
ALUOp=Sub, ExtOp=x, MemWr=0, IRWr=0, RegWr=0, RegDst=x, IorD=x]
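
A minimal sketch of this step, assuming the PC write enable is formed as PCWr OR (PCWrCond AND Zero). That gating is the conventional arrangement and is an assumption here, since the figure's wiring is not recoverable from the transcript. The function name is illustrative.

```python
def branch_complete(pc, target, bus_a, bus_b, pc_wr=0, pc_wr_cond=1):
    """Return the PC value after the BrComplete cycle."""
    # ALUOp=Sub: the Zero output is set when busA - busB == 0.
    zero = ((bus_a - bus_b) & 0xFFFFFFFF) == 0
    # Assumed gating: write PC unconditionally, or conditionally on Zero.
    write_pc = bool(pc_wr) or (bool(pc_wr_cond) and zero)
    # PCSrc=1 selects the latched Target as the value written.
    return target if write_pc else pc
```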
21
Cycle 3: Rtype execution
[Figure: control state diagram annotated with per-state control
signals]
22
3. R-type Execution
ALU has inputs and control set, can compute
  • ALU Output <- busA op busB

[Figure: multicycle datapath for state RExec, with a Jump input
(JumpAddr) on the PC mux and Funct going to the ALU control. Control
settings: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, ALUSelA=1, ALUSelB=01,
ALUOp=Rtype, ExtOp=x, MemWr=0, IRWr=0, RegWr=0, RegDst=1, IorD=x,
MemtoReg=x]
23
Cycle 4: Rtype completion
[Figure: control state diagram annotated with per-state control
signals]
24
4. R-type Completion
Latch ALU computed result in rd
  • R[rd] <- ALU Output

[Figure: multicycle datapath for state Rfinish. Control settings:
PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, ALUSelA=1, ALUSelB=01,
ALUOp=Rtype, ExtOp=x, MemWr=0, IRWr=0, RegWr=1, RegDst=1, IorD=x,
MemtoReg=0]
25
Cycle 3: Ori execution
[Figure: control state diagram annotated with per-state control
signals]
26
3. Ori Execution
  • ALU output <- busA or ZeroExt[Imm16]

[Figure: multicycle datapath for state OriExec. Control settings:
PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, ALUSelA=1, ALUSelB=11, ALUOp=Or,
ExtOp=0, MemWr=0, IRWr=0, RegWr=0, RegDst=0, IorD=x, MemtoReg=x]
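
The role of ExtOp in this step can be sketched as below. The function names are illustrative; the convention ExtOp=0 for zero extension and ExtOp=1 for sign extension follows the settings shown on the Ori and memory-address slides.

```python
def extend(imm16, ext_op):
    """Extend a 16-bit immediate to 32 bits: ext_op=0 zero, ext_op=1 sign."""
    imm16 &= 0xFFFF
    if ext_op == 1 and imm16 & 0x8000:
        return imm16 | 0xFFFF0000
    return imm16


def ori_exec(bus_a, imm16):
    """ALU output <- busA or ZeroExt[Imm16] (ExtOp=0, ALUOp=Or)."""
    return (bus_a | extend(imm16, ext_op=0)) & 0xFFFFFFFF
```

Zero extension matters here: with sign extension, an immediate such as 0x8000 would set the upper 16 bits of the result.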
27
Cycle 4: Ori completion
[Figure: control state diagram annotated with per-state control
signals]
28
4. Ori Completion
Latch ALU computed result in rt
  • Reg[rt] <- ALU output

[Figure: multicycle datapath for state OriFinish. Control settings:
PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, ALUSelA=1, ALUSelB=11, ALUOp=Or,
ExtOp=0, MemWr=0, IRWr=0, RegWr=1, RegDst=0, IorD=x, MemtoReg=0]
29
Cycle 3: Address calculation
[Figure: control state diagram annotated with per-state control
signals]
30
3. Memory Address Calculation
AdrCal: ALU computes rs + imm
  • ALU output <- busA + SignExt[Imm16]

[Figure: multicycle datapath for state AdrCal (state annotation:
1: ExtOp, ALUSelA; ALUSelB=11; ALUOp=Add; x: MemtoReg, PCSrc).
Control settings: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, ALUSelA=1,
ALUSelB=11, ALUOp=Add, ExtOp=1, MemWr=0, IRWr=0, RegWr=0, RegDst=x,
IorD=x, MemtoReg=x]
31
Cycle 4: Memory access, store
[Figure: control state diagram annotated with per-state control
signals]
32
4. Memory Access for Store
Write rt to memory address calc. above
  • mem[ALU output] <- busB

[Figure: multicycle datapath for state SWmem (state annotation:
1: ExtOp, MemWr, ALUSelA; ALUSelB=11; ALUOp=Add; x: PCSrc, RegDst,
MemtoReg). Control settings: PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0,
ALUSelA=1, ALUSelB=11, ALUOp=Add, ExtOp=1, MemWr=1, IRWr=0, RegWr=0,
RegDst=x, IorD=x, MemtoReg=x]
33
Cycle 4: Memory access, load
[Figure: control state diagram annotated with per-state control
signals]
34
4. Memory Access for Load
Read mem. with address calculated above
  • Mem Dout <- mem[ALU output]

[Figure: multicycle datapath for state LWmem. Control settings:
PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, ALUSelA=1, ALUSelB=11,
ALUOp=Add, ExtOp=1, MemWr=0, IRWr=0, RegWr=0, RegDst=0, IorD=1,
MemtoReg=x]
35
Cycle 5: Write back, load
[Figure: control state diagram annotated with per-state control
signals]
36
5. Write Back for Load
Latch data read from memory in rt
  • Reg[rt] <- Mem Dout

[Figure: multicycle datapath for state LWwr. Control settings:
PCWr=0, PCWrCond=0, PCSrc=x, BrWr=0, ALUSelA=1, ALUSelB=11,
ALUOp=Add, ExtOp=1, MemWr=0, IRWr=0, RegWr=1, RegDst=0, IorD=x,
MemtoReg=1]
37
Putting it all together: Multiple Cycle Datapath
[Figure: complete multicycle datapath. PC, Ideal Memory (RAdr, WrAdr,
Din, Dout), Instruction Reg (Rs, Rt, Rd, Imm), Reg File (Ra, Rb, Rw,
busA, busB, busW), extender, muxes and ALU with Zero output. Control
signals: PCWr, PCWrCond, PCSrc, BrWr, ALUSelA, ALUSelB, ALUOp, ExtOp,
MemWr, IRWr, RegWr, RegDst, IorD, MemtoReg]
38
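Putting it all together: Control State Diagram
[Figure: control state diagram annotated with the control signals of
each state. Transitions: beq, lw or sw, Ori, Rtype out of decode; lw
and sw out of AdrCal. Recoverable state annotations:]
  • AdrCal: 1: ExtOp, ALUSelA; ALUSelB=11; ALUOp=Add;
    x: MemtoReg, PCSrc
  • LWmem: 1: ExtOp, ALUSelA, IorD; ALUSelB=11; ALUOp=Add;
    x: MemtoReg, PCSrc
  • SWMem: 1: ExtOp, MemWr, ALUSelA; ALUSelB=11; ALUOp=Add;
    x: PCSrc, RegDst, MemtoReg
  • Other states shown: OriExec, OriFinish, LWwr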
39
Note: there is a multiple-cycle delay path
  • There is no register to save the results between:
  • 2) Register Fetch: busA <- Reg[rs]; busB <- Reg[rt]
  • 3) R-type Execution: ALU output <- busA op busB
  • 4) R-type Completion: Reg[rd] <- ALU output

[Figure: path from the Instruction Reg (IRWr) through the Reg File
(busA, busB), the ALUSelA and ALUSelB muxes and the ALU (ALUOp, Zero)
back to busW, with two candidate locations marked: "Register here to
save outputs of Rfetch?" and "Register here to save outputs of
RExec?"]
40
A Multiple Cycle Delay Path (Continued)
  • Register is NOT needed to save the outputs of
    Register Fetch
  • IRWr = 0: busA and busB will not change after
    Register Fetch
  • Register is NOT needed to save the outputs of
    R-type Execution
  • busA and busB will not change after Register
    Fetch
  • Control signals ALUSelA, ALUSelB, and ALUOp will
    not change after R-type Execution
  • Consequently ALU output will not change after
    R-type Execution
  • In theory, you need a register to hold a signal
    value if
  • (1) The signal is computed in one clock cycle and
    used in another.
  • (2) AND the inputs to the functional block that
    computes this signal can change before the
    signal is written into a state element.
  • You can save a register if Cond 1 is true BUT
    Cond 2 is false
  • But in practice, this will introduce a multiple
    cycle delay path
  • A logic delay path that takes multiple cycles to
    propagate from one storage element to the next
    storage element

41
Pros and Cons of a Multiple Cycle Delay Path
  • A 3-cycle path example
  • IR (storage) -> Reg File Read -> ALU -> Reg
    File Write (storage)
  • Advantages
  • Register savings
  • We can share time among cycles
  • If ALU takes longer than one cycle, still OK as
    long as the entire path takes less than 3 cycles
    to finish

42
Pros and Cons of a Multiple Cycle Delay Path
(Continued)
  • Disadvantage
  • Static timing analyzer, which ONLY looks at delay
    between two storage elements, will report this as
    a timing violation
  • You have to ignore the static timing analyzer's
    warnings
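
The two timing views can be contrasted in a short sketch. The delays below are hypothetical values for the 3-cycle path IR -> Reg File Read -> ALU -> Reg File Write, chosen so that the ALU alone exceeds one cycle; the function names are illustrative and do not correspond to any real timing tool.

```python
def flagged_by_default_check(path_delay_ns, cycle_ns):
    """A check that gives every storage-to-storage path exactly one cycle."""
    return path_delay_ns > cycle_ns


def meets_multicycle_budget(path_delay_ns, cycle_ns, n_cycles):
    """The path is acceptable if it settles in less than n_cycles periods."""
    return path_delay_ns < n_cycles * cycle_ns


# Hypothetical delays in ns (assumed for illustration).
STAGE_DELAYS_NS = {"reg_read": 0.6, "alu": 1.2, "reg_write": 0.6}
CYCLE_NS = 1.0
PATH_DELAY_NS = sum(STAGE_DELAYS_NS.values())
```

With these numbers the one-cycle check reports a violation, yet the path fits comfortably in its three-cycle budget. This is the warning a designer would have to ignore.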

43
Summary
  • Disadvantages of the Single Cycle Processor
  • Long cycle time
  • Cycle time is too long for all instructions
    except the Load
  • Multiple Cycle Processor
  • Divide the instructions into smaller steps
  • Execute each step (instead of the entire
    instruction) in one cycle
  • Do NOT confuse Multiple Cycle Processor with
    multiple cycle delay path
  • Multiple Cycle Processor: executes
    each instruction in multiple clock cycles
  • Multiple Cycle Delay Path: a combinational logic
    path between two storage elements that takes more
    than one clock cycle to complete
  • It is possible (desirable) to build a MC
    Processor without MCDP
  • Use a register to save a signal's value whenever
    a signal is generated in one clock cycle and used
    in another cycle later

44
Control logic
  • Review of Finite State Machine (FSM) control
  • From Finite State Diagrams to Microprogramming

45
Overview
  • Control may be designed using one of several
    initial representations. The choice of sequence
    control, and how logic is represented, can then
    be determined independently; the control can then
    be implemented with one of several methods using
    a structured logic technique.

                            hardwired control      microprogrammed control
  Initial Representation    Finite State Diagram   Microprogram
  Sequencing Control        Explicit Next State    Microprogram counter
                            Function               + Dispatch ROMs
  Logic Representation      Logic Equations        Truth Tables
  Implementation Technique  PLA                    ROM
46
Initial Representation: Finite State Diagram
[Figure: control state diagram with state numbers assigned]
  • 0: Ifetch
  • 1: Rfetch/Decode
  • 2: AdrCal (lw or sw)
  • 3: LWmem (lw)
  • 4: LWwr
  • 5: SWMem (sw)
  • 6: RExec (Rtype)
  • 7: Rfinish
  • 8: BrComplete (beq)
  • 9: Jump (see Fig. C.3.1)
  • 10: OriExec (Ori)
  • 11: OriFinish
47
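Sequencing Control: Explicit Next State Function
[Figure: Control Logic and Next State Logic block. Its Inputs come
from the Multicycle Datapath and the State Reg; its Outputs drive the
Multicycle Datapath and the next value of the State Reg]
  • Next state number is encoded just like datapath
    controls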

48
Interface in detail
49
Logic Representation: Logic Equations
  • Alternatively, prior state & condition:
  • S4, S5, S7, S8, S9, S11 -> State 0
  • State 0 -> State 1
  • State 1 & op = lw/sw -> State 2
  • State 2 & op = lw -> State 3
  • State 3 -> State 4
  • State 2 & op = sw -> State 5
  • State 1 & op = R-type -> State 6
  • State 6 -> State 7
  • State 1 & op = beq -> State 8
  • State 2 & op = jmp -> State 9
  • State 1 & op = ORi -> State 10
  • State 10 -> State 11
  • Next state from current state:
  • State 0 -> State 1
  • State 1 -> S2, S6, S8, S10
  • State 2 -> S3, S5
  • State 3 -> State 4
  • State 4 -> State 0
  • State 5 -> State 0
  • State 6 -> State 7
  • State 7 -> State 0
  • State 8 -> State 0
  • State 9 -> State 0
  • State 10 -> State 11
  • State 11 -> State 0

See Fig. C.3.3
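
The transitions listed on this slide can be collected into an explicit next-state function. This is a minimal sketch: the opcode names are plain strings chosen for illustration, and the jump state (9) is left out of the dispatch tables because the slide only lists its return to state 0 among the per-state transitions.

```python
# Dispatch out of state 1 (decode) and state 2 (address calculation).
DISPATCH_FROM_1 = {"lw": 2, "sw": 2, "rtype": 6, "beq": 8, "ori": 10}
DISPATCH_FROM_2 = {"lw": 3, "sw": 5}

# States whose successor does not depend on the opcode.
UNCONDITIONAL = {0: 1, 3: 4, 4: 0, 5: 0, 6: 7, 7: 0, 8: 0, 9: 0, 10: 11, 11: 0}


def next_state(state, op):
    """Explicit next-state function: (current state, opcode) -> next state."""
    if state == 1:
        return DISPATCH_FROM_1[op]
    if state == 2:
        return DISPATCH_FROM_2[op]
    return UNCONDITIONAL[state]


def trace(op):
    """States visited by one instruction, starting at fetch (state 0)."""
    states = [0]
    while True:
        nxt = next_state(states[-1], op)
        if nxt == 0:
            return states
        states.append(nxt)
```

For instance, `trace("lw")` visits five states and `trace("beq")` visits three, matching the cycle counts of the step-by-step slides.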
50
Implementation Technique: Programmed Logic Arrays
  • Each output line is the logical OR of logical ANDs
    of input lines or their complements: AND minterms
    are specified in the top AND plane, OR sums in the
    bottom OR plane

[Figure: PLA. Labels shown: inputs Op5, Op4, Op3, Op2, Op1, Op0
(Op = any) and S0; output Next1]
  • Opcodes: R = 000000, beq = 000100, lw = 100011,
    sw = 101011, ori = 001011, jmp = 000010
  • States: 0 = 0000, 1 = 0001, 2 = 0010, 3 = 0011,
    4 = 0100, 5 = 0101, 6 = 0110, 7 = 0111, 8 = 1000,
    9 = 1001, 10 = 1010, 11 = 1011
51
Implementation Technique: Programmed Logic Arrays
  • Each output line is the logical OR of logical ANDs
    of input lines or their complements: AND minterms
    are specified in the top AND plane, OR sums in the
    bottom OR plane

[Figure: PLA with inputs Op5, Op4, Op3, Op2, Op1, Op0]
  • Opcodes: lw = 100011, sw = 101011, R = 000000,
    ori = 001011, beq = 000100, jmp = 000010
  • States: 0 = 0000, 1 = 0001, 2 = 0010, 3 = 0011,
    4 = 0100, 5 = 0101, 6 = 0110, 7 = 0111, 8 = 1000,
    9 = 1001, 10 = 1010, 11 = 1011
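
The AND-plane / OR-plane structure can be sketched in software as below. This is an illustration under assumptions, not the PLA drawn in the figure: product terms are stored as hypothetical (mask, value) pairs over a 10-bit input made of the 4-bit state followed by the 6-bit opcode, and only the next-state bits are generated. The opcode encodings are the ones listed on the slide.

```python
# Opcode encodings as listed on the slide.
OPCODES = {"rtype": 0b000000, "beq": 0b000100, "lw": 0b100011,
           "sw": 0b101011, "ori": 0b001011}

# (current state, opcode name or None for "any opcode", next state).
# Transitions into state 0 need no product term: all output bits stay 0.
TRANSITIONS = [(0, None, 1), (1, "lw", 2), (1, "sw", 2), (1, "rtype", 6),
               (1, "beq", 8), (1, "ori", 10), (2, "lw", 3), (2, "sw", 5),
               (3, None, 4), (6, None, 7), (10, None, 11)]


def build_pla():
    """Return (and_plane, or_plane) for the 4 next-state bits."""
    and_plane = []                     # one (mask, value) pair per product term
    or_plane = [[] for _ in range(4)]  # or_plane[b]: terms feeding NS bit b
    for cur, op, nxt in TRANSITIONS:
        if op is None:                 # opcode inputs are don't-cares
            mask, value = 0b1111 << 6, cur << 6
        else:
            mask, value = 0b1111111111, (cur << 6) | OPCODES[op]
        and_plane.append((mask, value))
        for b in range(4):
            if (nxt >> b) & 1:
                or_plane[b].append(len(and_plane) - 1)
    return and_plane, or_plane


def eval_pla(and_plane, or_plane, state, opcode):
    """Evaluate the PLA: AND plane first, then OR plane per output bit."""
    inputs = (state << 6) | opcode
    fired = [(inputs & mask) == value for mask, value in and_plane]
    return sum(any(fired[i] for i in terms) << b
               for b, terms in enumerate(or_plane))
```

Each product term corresponds to one row of the AND plane, and each list in `or_plane` to one column of the OR plane.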
52
Multicycle Control
  • Given numbers assigned to FSM, can in turn
    determine next state as function of the inputs
    and current state
  • Can turn these into Boolean equations for each
    bit of the next state lines
  • Implement easily using PLA (programmable logic
    array) or ROM storing truth tables
  • See Figs. C.3.6 and C.3.8 for tables showing
    outputs and next state as function of current
    state and opcode
  • What if many more states, many more conditions?
  • State machine gets too large: very large
    ROMs/PLAs
  • What if need to add a state?
  • May need to increase address for ROM, number of
    inputs for PLA gates
  • Or just implement FSM in VHDL

53
Next Iteration: Using Sequencer for Next State
  • Before: Explicit Next State. Next: try a variation
    one step from the right-hand side
  • Few sequential states in small FSM; suppose we added
    floating point?
  • Still need to go to non-sequential states, e.g.,
    state 1 -> 2, 6, 8, 10

                            hardwired control      microprogrammed control
  Initial Representation    Finite State Diagram   Microprogram
  Sequencing Control        Explicit Next State    Microprogram counter
                            Function               + Dispatch ROMs
  Logic Representation      Logic Equations        Truth Tables
  Implementation Technique  PLA                    ROM
54
Sequencer-based control unit
[Figure: Control Logic with Inputs and Outputs connected to the
Multicycle Datapath; the state register is loaded through Address
Select Logic, and an Adder adds 1 to the current state]
  • A processor on its own
  • Next state choices: set state to 0; dispatch (states 1 and 2);
    incremented state number
  • (microPC) and micro branches specify whether
    to increment or select another state based
    on opcode
55
Sequencer-based control unit
[Figure: Control Logic with Inputs and Outputs connected to the
Multicycle Datapath; Address Select Logic and an Adder (+1) feed the
state register]
  • AddrCtrl: control for select logic
  • 0: reset state register (next instruction)
  • 1: select next using next-state ROM for state 1
  • 2: select next using next-state ROM for state 2
  • 3: increment state

Fig. C.4.2
56
Sequencer block diagram
  • Before: 6-bit opcode + 4-bit state -> 4-bit NS
  • Now: 4-bit state -> 2-bit AddrCtrl

  State   AddrCtrl   Step
  0000    3          fetch
  0001    1          decode
  0010    2          lw/sw
  0011    3          lw
  0100    0          lw
  0101    0          sw
  0110    3          r-type
  0111    0          r-type
  1000    0          branch
  1010    3          ori
  1011    0          ori

  • Dispatch ROM 1 (indexed by opcode): lw -> 0010, sw -> 0010,
    R-type -> 0110, ori -> 1010, branch -> 1000
  • ROM 2: lw -> 0011, sw -> 0101
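
The sequencer tables on this slide translate directly into a small model. This is a minimal sketch: the table contents are taken from the slide, while the function names and the use of strings for opcodes are illustrative choices.

```python
# AddrCtrl per state: 0 = back to state 0, 1 = dispatch ROM 1,
# 2 = dispatch ROM 2, 3 = increment the state number.
ADDR_CTRL = {0b0000: 3, 0b0001: 1, 0b0010: 2, 0b0011: 3, 0b0100: 0,
             0b0101: 0, 0b0110: 3, 0b0111: 0, 0b1000: 0, 0b1010: 3,
             0b1011: 0}

DISPATCH_ROM_1 = {"lw": 0b0010, "sw": 0b0010, "rtype": 0b0110,
                  "ori": 0b1010, "beq": 0b1000}
DISPATCH_ROM_2 = {"lw": 0b0011, "sw": 0b0101}


def next_state(state, op):
    """Address select logic: pick the next state from AddrCtrl."""
    ctrl = ADDR_CTRL[state]
    if ctrl == 0:
        return 0                   # start the next instruction
    if ctrl == 1:
        return DISPATCH_ROM_1[op]  # dispatch on opcode out of decode
    if ctrl == 2:
        return DISPATCH_ROM_2[op]  # dispatch on opcode out of lw/sw
    return state + 1               # sequential step through the adder


def run(op):
    """States visited by one instruction, starting at fetch (state 0)."""
    states = [0]
    while True:
        nxt = next_state(states[-1], op)
        if nxt == 0:
            return states
        states.append(nxt)
```

The opcode is now consulted only in the two dispatch states; every other state needs just its 2-bit AddrCtrl field.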
57
Initial Representation: Finite State Diagram
[Figure: the numbered control state diagram of slide 46, repeated.
State numbers: 0 fetch, 1 decode, 2 AdrCal, 3 LWmem, 4 LWwr, 5 SWMem,
6 RExec, 7 Rfinish, 8 beq, 9 Jump (see Fig. C.3.1), 10 OriExec,
11 OriFinish]
58
Next Iteration: Using Microprogram for
Representation

                            hardwired control      microprogrammed control
  Initial Representation    Finite State Diagram   Microprogram
  Sequencing Control        Explicit Next State    Microprogram counter
                            Function               + Dispatch ROMs
  Logic Representation      Logic Equations        Truth Tables
  Implementation Technique  PLA                    ROM

  • ROM can be thought of as a sequence of control
    words
  • Control word can be thought of as an instruction:
    a microinstruction
59
Microprogramming
  • Control is the hard part of processor design
  • Datapath is fairly regular and well-organized
  • Memory is highly regular
  • Control is irregular and global

  • Microprogramming: a particular strategy for
    implementing the control unit of a processor
    by "programming" at the level of register
    transfer operations
  • Microarchitecture: logical structure and functional
    capabilities of the hardware as seen by the
    microprogrammer
60
Macroinstruction Interpretation
[Figure: Main Memory holds the user program plus data (ADD, SUB, AND,
..., DATA); this can change! The CPU contains the execution unit and
the control memory. One of the macroinstructions in main memory is
mapped into one of the microsequences in control memory, e.g. the AND
microsequence: Fetch; Calc Operand Addr; Fetch Operand(s); Calculate;
Save Answer(s)]
61
Microprogramming Pros and Cons
  • Ease of design
  • Flexibility
  • Easy to adapt to changes in organization, timing,
    technology
  • Can make changes late in design cycle, or even in
    the field
  • Can implement very powerful instruction sets
    (just more control memory)
  • Generality
  • Can implement multiple instruction sets on same
    machine.
  • Can tailor instruction set to application.
  • Compatibility
  • Many organizations, same instruction set
  • Costly to implement
  • Slow

62
Summary Multicycle Control
  • Microprogramming and hardwired control have many
    similarities; perhaps the biggest difference is
    initial representation and ease of change of
    implementation, with ROM generally being easier
    than PLA

                            hardwired control      microprogrammed control
  Initial Representation    Finite State Diagram   Microprogram
  Sequencing Control        Explicit Next State    Microprogram counter
                            Function               + Dispatch ROMs
  Logic Representation      Logic Equations        Truth Tables
  Implementation Technique  PLA                    ROM