Title: Computer Organization Multi-cycle Approach
1 Computer Organization Multi-cycle Approach
- Dr. Iyad Jafar
- Adapted from Dr. Gheith Abandah slides
- http://www.abandah.com/gheith/Courses/CPE335_S08/index.html
2 Multicycle Datapath Approach
- Let an instruction take more than one clock cycle to complete.
- Break instructions into steps, where each step takes one cycle:
  - balance the amount of work done in each step
  - restrict each cycle to use only one major functional unit, unless units are used in parallel
- Not every instruction takes the same number of clock cycles.
- In addition to a faster clock rate, the multicycle approach lets a functional unit be used more than once per instruction, as long as each use falls in a different clock cycle. As a result:
  - only one memory is needed, but only one memory access is allowed per cycle
  - only one ALU/adder is needed, but only one ALU operation is allowed per cycle
3 Multicycle Datapath Approach, cont.
- At the end of a cycle:
  - Values needed in a later cycle by the current instruction are stored in internal registers (A, B, IR, MDR, and ALUOut). These registers are invisible to the programmer.
  - All of these registers except IR hold data only between a pair of adjacent clock cycles, so they don't need a write control signal.
  - IR: Instruction Register; MDR: Memory Data Register
  - A, B: register file read data registers; ALUOut: ALU output register
  - Data used by subsequent instructions is stored in programmer-visible state (i.e., the register file, PC, or memory).
4 Multicycle Datapath Approach, cont.
- As in the single-cycle design, shared functional units need multiplexers at their inputs.
- A single ALU (adder) is used to update the PC, perform ALU operations, do the comparison for beq, compute memory addresses, and compute branch addresses.
5 Multicycle Datapath Approach - Control Signals
6 The Multicycle Datapath with Control Signals
[Figure: the multicycle datapath with control signals. Blocks shown: PC, Memory, IR, Register File, A, B, ALU, ALUOut, Sign Extend, Shift left 2, ALU control, and Control. Control signals shown: PCWriteCond, PCWrite, PCSource, ALUOp, IorD, MemRead, MemWrite, MemtoReg, IRWrite, RegDst, RegWrite, ALUSrcA, ALUSrcB.]
7 Multicycle Machine 1-bit Control Signals
Signal | Effect when deasserted | Effect when asserted
RegDst | The destination register number comes from the rt field | The destination register number comes from the rd field
RegWrite | None | Write is enabled to the selected destination register
ALUSrcA | The first ALU operand is the PC | The first ALU operand is register A
MemRead | None | The content of the memory address is placed on Memory data out
MemWrite | None | The memory location specified by the address is replaced by the value on the Write data input
MemtoReg | The value fed to the register file comes from ALUOut | The value fed to the register file comes from memory
IorD | The PC supplies the address to the memory unit | ALUOut supplies the address to the memory unit
IRWrite | None | The output of memory is written into IR
PCWrite | None | The PC is written; the source is controlled by PCSource
PCWriteCond | None | The PC is written if the Zero output of the ALU is also active
8 Multicycle Machine 2-bit Control Signals
Signal | Value | Effect
ALUOp | 00 | The ALU performs an add operation
ALUOp | 01 | The ALU performs a subtract operation
ALUOp | 10 | The funct field of the instruction determines the ALU operation
ALUSrcB | 00 | The second ALU input comes from register B
ALUSrcB | 01 | The second ALU input is the constant 4 (to increment the PC)
ALUSrcB | 10 | The second ALU input is the sign-extended offset (the lower 16 bits of IR)
ALUSrcB | 11 | The second ALU input is the sign-extended lower 16 bits of IR, shifted left by two bits
PCSource | 00 | The output of the ALU (PC + 4) is sent to the PC for writing
PCSource | 01 | The content of ALUOut (the branch address) is sent to the PC for writing
PCSource | 10 | The jump address is sent to the PC for writing
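The two multiplexers controlled by ALUSrcB and PCSource can be sketched in Python as simple lookups. This is only an illustration of the table above, not part of the original slides; the function names `sign_extend16`, `alu_operand_b`, and `next_pc` are hypothetical.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def alu_operand_b(alusrcb, b, ir):
    """Select the second ALU operand according to the 2-bit ALUSrcB signal."""
    offset = sign_extend16(ir)  # lower 16 bits of IR, sign-extended
    choices = {
        0b00: b,            # register B
        0b01: 4,            # constant 4, used for PC + 4
        0b10: offset,       # sign-extended offset
        0b11: offset << 2,  # sign-extended offset shifted left by two bits
    }
    return choices[alusrcb]


def next_pc(pcsource, alu_result, aluout, jump_address):
    """Select the value written to the PC according to the 2-bit PCSource signal."""
    choices = {
        0b00: alu_result,    # ALU output (PC + 4)
        0b01: aluout,        # ALUOut register (branch address)
        0b10: jump_address,  # jump address
    }
    return choices[pcsource]
```

For example, `alu_operand_b(0b11, 0, 0xFFFC)` selects an offset of -4 words, which the shift turns into -16 bytes.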
9 Breaking Instruction Execution into Clock Cycles
[Figure: instruction execution timeline, Cycle 1 through Cycle 5]
- 1. IFetch: instruction fetch and PC update (same for all instructions)
  - Operations
    - 1.1 Instruction fetch: IR <= Memory[PC]
    - 1.2 Update PC: PC <= PC + 4
  - Control signal values
    - IorD = 0, MemRead = 1, IRWrite = 1
    - ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCWrite = 1, PCSource = 00
10 Breaking Instruction Execution into Clock Cycles
- 2. Decode: instruction decode and register fetch (same for all instructions)
  - We don't know the instruction yet, so perform only harmless operations
  - Operations
    - 2.1 Read the two source registers rs and rt and place them in registers A and B, respectively
      - A <= Reg[IR[25-21]]
      - B <= Reg[IR[20-16]]
    - 2.2 Compute the branch address
      - ALUOut <= PC + (sign-extend(IR[15-0]) << 2)
  - Control signal values
    - ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
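The register transfers of steps 1 and 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: memory is modeled as a dictionary keyed by byte address, the internal registers live in a dictionary called `state`, and the function names are hypothetical rather than taken from the slides.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def fetch(state, memory):
    """Step 1: IR <= Memory[PC]; PC <= PC + 4."""
    state["IR"] = memory[state["PC"]]
    state["PC"] = (state["PC"] + 4) & 0xFFFFFFFF


def decode(state, regs):
    """Step 2: read rs and rt into A and B, and precompute the branch address."""
    ir = state["IR"]
    state["A"] = regs[(ir >> 21) & 0x1F]  # rs field, IR[25-21]
    state["B"] = regs[(ir >> 16) & 0x1F]  # rt field, IR[20-16]
    # The PC was already incremented in step 1, so the target is relative to PC + 4.
    state["ALUOut"] = (state["PC"] + (sign_extend16(ir) << 2)) & 0xFFFFFFFF


# Example: beq $1, $2, 3 stored at address 0x100 (encoding 0x10220003).
memory = {0x100: 0x10220003}
regs = [0] * 32
regs[1], regs[2] = 7, 7
state = {"PC": 0x100}
fetch(state, memory)
decode(state, regs)
```

The branch address is computed here even though the instruction might not be a branch; the result is simply ignored if it is not needed, which is why the operation is harmless.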
11 Breaking Instruction Execution into Clock Cycles
- 3. Execution, memory address computation, or branch completion
  - The operation in this cycle depends on the instruction type
  - Operations
    - If memory reference, compute the address
      - ALUOut <= A + sign-extend(IR[15-0])
      - ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
    - If arithmetic-logic instruction, perform the operation
      - ALUOut <= A op B
      - ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
12 Breaking Instruction Execution into Clock Cycles
- 3. Execution, memory address computation, or branch completion (continued)
  - The operation depends on the instruction type
  - Operations
    - If branch instruction
      - if (A == B) PC <= ALUOut
      - ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond = 1, PCSource = 01
    - If jump instruction
      - PC <= {PC[31-28], IR[25-0], 2'b00}
      - PCSource = 10, PCWrite = 1
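The branch and jump completions in step 3 can be illustrated with a short Python sketch. The function names are hypothetical; the branch comparison is modeled, as in the datapath, by subtracting B from A and testing the result for zero.

```python
def jump_target(pc, ir):
    """Form the jump address {PC[31-28], IR[25-0], 2'b00}."""
    upper = pc & 0xF0000000          # top four bits of the (already incremented) PC
    lower = (ir & 0x03FFFFFF) << 2   # 26-bit target field followed by two zero bits
    return upper | lower


def branch_complete(pc, a, b, aluout):
    """Return the new PC for beq: ALUOut if A equals B, otherwise the PC is unchanged."""
    zero = ((a - b) & 0xFFFFFFFF) == 0  # ALU subtract, Zero output
    return aluout if zero else pc
```

With PC = 0x40000004 and a jump instruction whose target field is 0x10, `jump_target` yields 0x40000040.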
13 Breaking Instruction Execution into Clock Cycles
- 4. Memory access or R-type completion
  - The operation in this cycle depends on the instruction type
  - Operations
    - If load instruction, read the value from memory into MDR
      - MDR <= Memory[ALUOut]
      - MemRead = 1, IorD = 1
    - If store instruction, store rt into memory
      - Memory[ALUOut] <= B
      - MemWrite = 1, IorD = 1
    - If arithmetic-logical instruction, write the ALU result into rd
      - Reg[IR[15-11]] <= ALUOut
      - MemtoReg = 0, RegDst = 1, RegWrite = 1
14 Breaking Instruction Execution into Clock Cycles
- 5. Memory read completion
  - Needed for the load instruction only
  - Operations
    - 5.1 Store the loaded value in MDR into rt
      - Reg[IR[20-16]] <= MDR
      - RegWrite = 1, MemtoReg = 1, RegDst = 0
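As an illustration of steps 3 to 5, the sketch below walks a load word instruction through address computation, memory access, and register write-back. It assumes steps 1 and 2 have already filled IR and A; memory is modeled as a dictionary keyed by byte address, and the name `run_load` is hypothetical.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_load(state, regs, memory):
    """Carry out cycles 3, 4, and 5 of a load instruction."""
    ir = state["IR"]
    # Step 3: ALUOut <= A + sign-extend(IR[15-0])
    state["ALUOut"] = (state["A"] + sign_extend16(ir)) & 0xFFFFFFFF
    # Step 4: MDR <= Memory[ALUOut]
    state["MDR"] = memory[state["ALUOut"]]
    # Step 5: Reg[IR[20-16]] <= MDR
    regs[(ir >> 16) & 0x1F] = state["MDR"]


# Example: lw $3, 8($1) (encoding 0x8C230008) with $1 = 0x1000.
regs = [0] * 32
regs[1] = 0x1000
memory = {0x1008: 42}
state = {"IR": 0x8C230008, "A": regs[1]}
run_load(state, regs, memory)
```

Each assignment to `state` corresponds to one clock cycle; MDR and ALUOut carry the value from one cycle to the next, which is the role the slides give to the internal registers.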
15 Breaking Instruction Execution into Clock Cycles
- In this implementation, not all instructions take 5 cycles
Instruction Class | Clock Cycles Required
Load | 5
Store | 4
Branch | 3
Arithmetic-logical | 4
Jump | 3
16 Multicycle Performance
- Compute the average CPI of the multicycle implementation for a SPECINT2000 program with the following instruction mix: 25% loads, 10% stores, 11% branches, 2% jumps, 52% ALU. Assume the CPI for each instruction class as given in the previous table.
  - CPI = Σ (CPI_i x IC_i) / IC
  - = 0.25 x 5 + 0.10 x 4 + 0.11 x 3 + 0.02 x 3 + 0.52 x 4
  - = 4.12
- Compare to CPI = 1 for single cycle!
- Assume CC_M = (1/5) CC_S, where CC_M and CC_S are the multicycle and single-cycle clock cycle times.
- Then
  - Performance_M / Performance_S = (IC x 1 x CC_S) / (IC x 4.12 x (1/5) CC_S) = 5 / 4.12 = 1.21
- Multicycle is also cost-effective in terms of hardware.
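The CPI and performance-ratio arithmetic above can be checked with a few lines of Python. The dictionary keys are illustrative labels; the numbers are the instruction mix and cycle counts from the slides, and the single-cycle clock period is normalized to 1.

```python
# Cycles per instruction class (from the table on the previous slide).
cycles = {"load": 5, "store": 4, "branch": 3, "jump": 3, "alu": 4}

# Instruction mix as fractions of the total instruction count.
mix = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "alu": 0.52}

# Weighted average: CPI = sum(CPI_i x IC_i) / IC
cpi_multi = sum(mix[c] * cycles[c] for c in mix)

# Execution time = IC x CPI x clock cycle time (IC cancels in the ratio).
cc_single = 1.0
cc_multi = cc_single / 5
time_single = 1 * cc_single
time_multi = cpi_multi * cc_multi

# Performance is the inverse of execution time.
ratio = time_single / time_multi

print(round(cpi_multi, 2), round(ratio, 2))  # prints: 4.12 1.21
```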
17 Multicycle Control Unit
- Multicycle datapath control signals are not determined solely by the bits in the instruction
  - e.g., the opcode bits tell what operation the ALU should be doing, but not which instruction cycle is to be done next
- Since the instruction is broken into multiple cycles, we need to know what we did in the previous cycle(s) in order to determine the current action
- Must use a finite state machine (FSM) for control
  - a set of states (the current state is stored in the State Register)
  - a next-state function (determined by the current state and the input)
  - an output function (determined by the current state and the input)
18 The States of the Control Unit
- 10 states are required in the FSM control
- The sequence of states is determined by the five steps of execution and by the instruction
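A next-state function with 10 states can be sketched in Python as below. The state names and the grouping of instructions into classes are assumptions made for illustration (the state diagram itself is not reproduced in this text); the sequencing follows the five execution steps described earlier.

```python
# Hypothetical state names; 10 states in total.
STATES = [
    "fetch", "decode", "mem_addr", "mem_read", "load_writeback",
    "mem_write", "execute", "alu_writeback", "branch_done", "jump_done",
]


def next_state(state, instr_class):
    """Next-state function: depends on the current state and the instruction class."""
    if state == "fetch":
        return "decode"
    if state == "decode":
        return {
            "load": "mem_addr",
            "store": "mem_addr",
            "alu": "execute",
            "branch": "branch_done",
            "jump": "jump_done",
        }[instr_class]
    if state == "mem_addr":
        return "mem_read" if instr_class == "load" else "mem_write"
    if state == "mem_read":
        return "load_writeback"
    if state == "execute":
        return "alu_writeback"
    # Every completion state returns to fetch for the next instruction.
    return "fetch"


def cycles_for(instr_class):
    """Count the states visited by one instruction, one state per clock cycle."""
    state, count = "fetch", 0
    while True:
        count += 1
        state = next_state(state, instr_class)
        if state == "fetch":
            return count
```

Walking each instruction class through this function reproduces the cycle counts in the table: 5 for loads, 4 for stores and arithmetic-logical instructions, and 3 for branches and jumps.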
19 The Control Unit
- Logic gates
  - inputs: present state + opcode, 10 bits in total (4 state bits + 6 opcode bits)
  - outputs: control signals + next state, 20 bits in total (16 control bits + 4 next-state bits)
  - truth table size: 2^10 rows x 20 columns
- ROM
  - Can be used to implement the truth table above (2^10 x 20 bits = 20 Kbit)
  - Each location stores the control signal values and the next state
  - Each location is addressed by the opcode and the current state value
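The ROM size follows from the bit counts, as the short Python calculation below shows. The 16 control bits are counted from the control signal tables earlier in the slides (10 one-bit signals plus 3 two-bit signals); the variable names are illustrative.

```python
num_states = 10
state_bits = (num_states - 1).bit_length()  # 4 bits are enough to encode 10 states
opcode_bits = 6                             # MIPS opcode field, Instr[31-26]
control_bits = 10 * 1 + 3 * 2               # ten 1-bit signals and three 2-bit signals

address_bits = state_bits + opcode_bits     # ROM address: current state + opcode
word_bits = control_bits + state_bits       # ROM word: control signals + next state

rom_bits = (2 ** address_bits) * word_bits
rom_kbit = rom_bits / 1024
```

This gives a 10-bit address, a 20-bit word, and 20480 bits (20 Kbit) of storage.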
20 Micro-programmed Control Unit
- A ROM implementation is vulnerable to bugs and is expensive, especially for a complex CPU. Its size increases as the number and complexity of instructions (states) increase.
- Use microprogramming
  - The next state value may not be sequential
  - Generate the next state outside the storage element
  - Each state is a microinstruction, and the signals are specified symbolically
  - Use labels for sequencing
21 Sequencer
22 Microprogram
- The microassembler converts the microcode into actual signal values
- The sequencing field is used along with the opcode to determine the next state
23 Multicycle Advantages & Disadvantages
- Uses the clock cycle efficiently: the clock cycle is timed to accommodate the slowest instruction step
- Multicycle implementations allow functional units to be used more than once per instruction, as long as they are used on different clock cycles
- but
  - Requires additional internal state registers, more muxes, and more complicated (FSM) control
24 Single Cycle vs. Multiple Cycle Timing