Title: Chapter Five Part 4: Implementing Multicycle Control
1 Chapter Five Part 4: Implementing Multicycle Control
2 Implementing the Control
- The value of each control signal depends on
  - what instruction is being executed
  - which step is being performed
- Use the information we've accumulated to specify a finite state machine
  - specify the finite state machine graphically, or
  - use microprogramming
- The implementation can be derived from the specification
3 Defining control
- Finite state machines consist of
  - a set of states
  - a next-state function that maps the current state and inputs to a new state
- Each state specifies the set of outputs that are asserted in that state
- Assume that any signal that is not asserted is deasserted
- Must always specify the control signal going to a mux, because a mux selects one of its inputs whether the select is 0 or 1
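A minimal sketch of this definition in Python, using a made-up three-state machine. The state names, the `go` input, and the signal names are hypothetical and are not taken from the slides. Each state lists only the signals it asserts; anything unlisted reads as deasserted, while the mux select `SrcSel` gets an explicit value wherever it matters.

```python
# Hypothetical toy Moore machine: all names are illustrative only.
ALL_SIGNALS = ("MemRead", "RegWrite")

# Outputs asserted in each state; mux selects are given explicit values.
OUTPUTS = {
    "FETCH": {"MemRead": 1, "SrcSel": 0},
    "EXEC":  {"SrcSel": 1},
    "WRITE": {"RegWrite": 1},
}

def next_state(state, go):
    """Next-state function: maps (current state, input) to a new state."""
    if state == "FETCH":
        return "EXEC" if go else "FETCH"
    if state == "EXEC":
        return "WRITE"
    return "FETCH"  # WRITE returns to the initial state

def outputs(state):
    """Signals not listed for a state are treated as deasserted (0)."""
    out = {name: 0 for name in ALL_SIGNALS}
    out.update(OUTPUTS[state])
    return out
```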
4 Defining control
- Finite state control corresponds to the 5 steps
- Each state in the FSM takes 1 clock cycle
- Since the first two steps are common to all instructions, the first two states are the same for every instruction
- After executing the last step for an instruction class, the FSM returns to the initial state to begin fetching the next instruction
- High-level view of the FSM control: see next slide
5 Defining control
6 Defining control
- First two states of the FSM for all instructions: see Figure 5.37
- The first state is state 0
- Signals asserted in each state are shown within the circle representing the state
- Arcs between states are labeled with the conditions that select a specific next state
- After state 1, the next state depends on the instruction type
- 4 arcs exit from state 1, representing the 4 instruction types (memory reference, R-type, branch, jump)
- This branching based on instruction type is called decoding
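The decoding step can be sketched as a next-state function. The state numbers and opcode values below assume the usual ten-state multicycle MIPS controller (lw = 35, sw = 43, R-type = 0, beq = 4, j = 2). Apart from states 0 and 1, they are not spelled out on this slide, so treat them as assumptions.

```python
# Assumed MIPS opcode values (Op[5-0]); not given on this slide.
LW, SW, RTYPE, BEQ, J = 0b100011, 0b101011, 0b000000, 0b000100, 0b000010

def next_state(state, opcode):
    """Next state of the assumed ten-state controller (states 0..9)."""
    if state == 0:                      # instruction fetch
        return 1
    if state == 1:                      # decode: four arcs leave this state
        if opcode in (LW, SW):
            return 2                    # memory address computation
        return {RTYPE: 6, BEQ: 8, J: 9}[opcode]
    if state == 2:
        return 3 if opcode == LW else 5
    if state == 3:
        return 4                        # lw: memory read, then write-back
    if state == 6:
        return 7                        # R-type: execute, then completion
    return 0                            # states 4, 5, 7, 8, 9 end the instruction

def states_visited(opcode):
    """Sequence of states for one instruction, one clock cycle per state."""
    path, state = [0], next_state(0, opcode)
    while state != 0:
        path.append(state)
        state = next_state(state, opcode)
    return path
```

Under these assumptions, `states_visited(LW)` has five entries and `states_visited(BEQ)` has three, one per clock cycle.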
7 Defining control
8 Defining control
9 Defining control
10 Defining control
11 Defining control
12 Graphical Specification of FSM
- How many state bits will we need?
13 Simple Questions
- How many cycles will it take to execute this code?
      lw  $t2, 0($t3)
      lw  $t3, 4($t3)
      beq $t2, $t3, Label    # assume not taken
      add $t5, $t2, $t3
      sw  $t5, 8($t3)
  Label: ...
- What is going on during the 8th cycle of execution?
- In what cycle does the actual addition of $t2 and $t3 take place?
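One way to work through these questions is sketched below. It assumes the usual multicycle step counts (5 for lw, 3 for beq, 4 for add and sw) and that instructions run back to back with no overlap. The step labels are paraphrases, not wording from the slides.

```python
# Assumed step names per instruction; one clock cycle per step.
COMMON = ["instruction fetch", "decode / register fetch"]
STEPS = {
    "lw":  COMMON + ["address computation", "memory read", "register write-back"],
    "sw":  COMMON + ["address computation", "memory write"],
    "add": COMMON + ["ALU execution", "register write"],
    "beq": COMMON + ["compare and branch completion"],
}
PROGRAM = ["lw", "lw", "beq", "add", "sw"]  # beq assumed not taken

def timeline(program):
    """Return a list whose entry k-1 describes clock cycle k."""
    cycles = []
    for index, op in enumerate(program):
        for step in STEPS[op]:
            cycles.append((index, op, step))
    return cycles

cycles = timeline(PROGRAM)
total = len(cycles)                 # 5 + 5 + 3 + 4 + 4
eighth = cycles[8 - 1]              # what happens in cycle 8
add_cycle = 1 + cycles.index((3, "add", "ALU execution"))
```

Under these assumptions the sequence takes 21 cycles, cycle 8 is the address computation of the second lw, and the addition of $t2 and $t3 happens in cycle 16.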
14 Multicycle performance
- The number of clock cycles for each instruction class is
  - Loads: 5
  - Stores: 4
  - R-format instructions: 4
  - Branches: 3
  - Jumps: 3
- Assume the instruction mix is
  - 22% loads
  - 11% stores
  - 49% R-format
  - 16% branches
  - 2% jumps
15 Multicycle performance
- Average cycles per instruction (CPI):
  $$\text{CPI} = \frac{\text{CPU clock cycles}}{\text{Instruction count}} = \frac{\sum_i \text{Instruction count}_i \times \text{CPI}_i}{\text{Instruction count}} = \sum_i \frac{\text{Instruction count}_i}{\text{Instruction count}} \times \text{CPI}_i$$
- The ratio $\text{Instruction count}_i / \text{Instruction count}$ is simply the instruction frequency for instruction class $i$
- Thus the answer is just the sum of the frequencies times their corresponding CPI
- $\text{CPI} = 5 \times 0.22 + 4 \times 0.11 + 4 \times 0.49 + 3 \times 0.16 + 3 \times 0.02 = 4.04$
- Worst-case CPI (all instructions take 5 clock cycles) is 5
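The weighted sum can be checked with a few lines of Python. The cycle counts and instruction mix are the ones given on the slides (with the mix read as percentages); the dictionary keys are just illustrative names.

```python
# Cycles per instruction class and instruction mix, as given on the slides.
CYCLES = {"load": 5, "store": 4, "r_format": 4, "branch": 3, "jump": 3}
MIX = {"load": 0.22, "store": 0.11, "r_format": 0.49, "branch": 0.16, "jump": 0.02}

def average_cpi(cycles, mix):
    """Weighted sum of per-class CPI by instruction frequency."""
    return sum(mix[name] * cycles[name] for name in cycles)

cpi = average_cpi(CYCLES, MIX)
```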
16 Finite State Machine for Control
- FSM implementation
  - a temporary register holds the current state
  - a combinational logic block determines the datapath signals and the next state
17 Finite State Machine for Control
18 Finite State Machine for Control
- Expanded view of the FSM implementation: see next slide
- 10 states, so 4 bits are needed to encode the state (S3, S2, S1, S0)
- The current state number is stored in a state register
- Example: state 0110 means S3 = 0, S2 = 1, S1 = 1, S0 = 0
- The control unit has outputs that specify the next state: NS3, NS2, NS1, NS0
19 Finite State Machine for Control
20 Combinational Logic
- Two parts
  - determining the control signals: depends only on the state bits
  - determining the next state: depends on the current state and the opcode
- The control function can be expressed as a logic equation for each output
- Two ways to implement
  - a complete truth table
  - a two-level logic structure that allows a sparse encoding of the truth table
21 Combinational Logic
- Complete truth table implementation on next slide
- Split the control function into two parts
  - next-state outputs depend on all inputs
  - control signal outputs depend only on the current-state bits
- Logic equations: see table on next slide
- Column 2 contains the states in which the control signal is active
  - get this information from the FSM
- The third column is used to help determine the next state
- When a next state is active, the bits NS3-0 are set to the corresponding binary value
- Each of the bits NS3-0 is active in multiple next states, so the equation for a bit is the OR of the next states in which it is active
- Each term must also be ANDed with the appropriate opcode condition
22 Combinational Logic
23 Combinational Logic
24 Creating truth tables: next state
- From the preceding tables, we can create truth tables for each next-state bit
- The tables need only list the states in which the bit is active
25 Creating truth tables: next state
Truth table for the NS0 output, which is active when the next state is 1, 3, 5, 7, or 9. This situation occurs when the current state is one of 0, 1, 2, or 6.
26 Deriving equations: low-order next-state bit NS0
- NS0 is active in NextState1, NextState3, NextState5, NextState7, and NextState9
- The entries for these states in Figure C.8 supply the conditions under which these next-state values are active:
  NextState1 = State0
  NextState3 = State2 AND (Op[5-0] = lw)
  NextState5 = State2 AND (Op[5-0] = sw)
  NextState7 = State6
  NextState9 = State1 AND (Op[5-0] = jmp)
- NS0 is the OR of these five terms
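A sketch that checks the NS0 terms against an explicit next-state table. The table and the opcode values assume the usual ten-state controller and standard MIPS opcodes; they are assumptions, since the slides refer to Figure C.8 rather than listing the full table.

```python
# Assumed MIPS opcode values (Op[5-0]).
LW, SW, RTYPE, BEQ, J = 0b100011, 0b101011, 0b000000, 0b000100, 0b000010

# Assumed next-state table: (current state, opcode or None for "any") -> next state.
NEXT = {
    (0, None): 1,
    (1, LW): 2, (1, SW): 2, (1, RTYPE): 6, (1, BEQ): 8, (1, J): 9,
    (2, LW): 3, (2, SW): 5,
    (3, None): 4, (6, None): 7,
    (4, None): 0, (5, None): 0, (7, None): 0, (8, None): 0, (9, None): 0,
}

def ns0(state, op):
    """NS0 as the OR of the five NextState terms."""
    return int(
        state == 0                      # NextState1
        or (state == 2 and op == LW)    # NextState3
        or (state == 2 and op == SW)    # NextState5
        or state == 6                   # NextState7
        or (state == 1 and op == J)     # NextState9
    )

def table_lookup(state, op):
    """Look up the next state, falling back to the opcode-independent entry."""
    return NEXT.get((state, op), NEXT.get((state, None)))

# Compare the equation with the low-order bit of every defined table entry.
mismatches = [
    (s, op)
    for s in range(10)
    for op in (LW, SW, RTYPE, BEQ, J)
    if table_lookup(s, op) is not None and ns0(s, op) != (table_lookup(s, op) & 1)
]
```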
27 Creating truth tables: next state
Truth table for the NS2 output, which is active when the next state is 4, 5, 6, or 7. This situation occurs when the current state is one of 1, 2, 3, or 6.
28 Creating truth tables: control signals
- Same process as for the next-state bits
- Do not need to consider the opcode, however
- First derive a truth table for each control signal
- The truth table need only list the states for which the control signal is asserted
- Each entry in a signal's truth table represents 64 entries of the full table (all combinations of the 6 bits of the opcode; these are all don't cares)
29 Combinational Logic: states when the control signal is active
[Table: control signals (PCWrite, IorD, MemRead, ALUSrcB1, ALUSrcB0, etc.) and the states in which each is active]
30 Deriving equations: control signals
- $\text{PCWrite} = \overline{S3}\,\overline{S2}\,\overline{S1}\,\overline{S0} + S3\,\overline{S2}\,\overline{S1}\,S0$
- Etc.
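A small check of a PCWrite equation in Python. It assumes PCWrite is asserted in states 0 and 9 (instruction fetch and jump completion in the usual state numbering), which is an assumption about the table the slides refer to.

```python
def pc_write(s3, s2, s1, s0):
    """PCWrite = (S3' S2' S1' S0') + (S3 S2' S1' S0): states 0000 and 1001."""
    n = lambda bit: 1 - bit  # logical NOT on a 0/1 value
    state0 = n(s3) & n(s2) & n(s1) & n(s0)
    state9 = s3 & n(s2) & n(s1) & s0
    return state0 | state9

def bits(state):
    """Split a 4-bit state number into (S3, S2, S1, S0)."""
    return tuple((state >> k) & 1 for k in (3, 2, 1, 0))

# Every 4-bit state value for which the equation evaluates to 1.
active_states = [s for s in range(16) if pc_write(*bits(s))]
```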
31 ROM Implementation
- ROM = "Read Only Memory"
  - values of memory locations are fixed ahead of time
- A ROM can be used to implement a truth table
  - if the address is m bits, we can address $2^m$ entries in the ROM
  - our outputs are the n bits of data that the address points to
  - $2^m$ is the "height", and n is the "width"
- Example with m = 3 and n = 4:

  Address | Data
  0 0 0   | 0 0 1 1
  0 0 1   | 1 1 0 0
  0 1 0   | 1 1 0 0
  0 1 1   | 1 0 0 0
  1 0 0   | 0 0 0 0
  1 0 1   | 0 0 0 1
  1 1 0   | 0 1 1 0
  1 1 1   | 0 1 1 1
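A ROM that realizes a truth table can be sketched as a list indexed by the address. The contents below are the eight 4-bit entries of the example on this slide (3 address bits, 4 output bits); the names `ROM` and `rom_read` are illustrative.

```python
# ROM contents: index = 3-bit address, value = 4-bit output word.
ROM = [0b0011, 0b1100, 0b1100, 0b1000, 0b0000, 0b0001, 0b0110, 0b0111]

M_BITS, N_BITS = 3, 4

def rom_read(address):
    """Return the n output bits stored at the given m-bit address."""
    word = ROM[address]
    return tuple((word >> k) & 1 for k in range(N_BITS - 1, -1, -1))
```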
32 ROM Implementation
- How many inputs are there? 6 bits for opcode + 4 bits for state = 10 address lines (i.e., $2^{10} = 1024$ different addresses)
- How many outputs are there? 16 datapath-control outputs + 4 state bits = 20 outputs
- ROM is $2^{10} \times 20$ = 20K bits (and a rather unusual size)
- Rather wasteful, since for lots of the entries the outputs are the same, i.e., the opcode is often ignored
33 ROM implementation
- Break up the table into two parts
  - 4 state bits tell you the 16 outputs: $2^4 \times 16$ bits of ROM
  - 10 bits tell you the 4 next-state bits: $2^{10} \times 4$ bits of ROM
  - Total: 4.3K bits of ROM
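The ROM sizes can be reproduced with a short calculation. The bit counts come from the slides; the helper name `rom_bits` is illustrative.

```python
def rom_bits(address_bits, output_bits):
    """Total ROM size in bits: 2**address_bits entries, output_bits wide."""
    return (2 ** address_bits) * output_bits

single_rom = rom_bits(6 + 4, 16 + 4)   # opcode + state in, control + next state out
control_rom = rom_bits(4, 16)          # state bits -> datapath control outputs
next_state_rom = rom_bits(10, 4)       # opcode + state -> next-state bits
split_total = control_rom + next_state_rom
```

Here `single_rom` is 20480 bits (20K) and `split_total` is 4352 bits, which is 4.25K and rounds to the 4.3K quoted on the slide.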
34 ROM vs PLA
- PLA is much smaller
  - can share product terms
  - only need entries that produce an active output
  - can take into account don't cares
- Size is (#inputs x #product terms) + (#outputs x #product terms)
  - for this example: (10 x 17) + (20 x 17) = 510 PLA cells
- PLA cells are usually about the size of a ROM cell (slightly bigger)
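The PLA size expression, evaluated for 10 inputs, 20 outputs, and 17 product terms, is sketched below. The function name is illustrative, and the result is a cell count, so it is only roughly comparable to ROM bits.

```python
def pla_cells(inputs, outputs, product_terms):
    """PLA size estimate: AND-plane cells plus OR-plane cells."""
    return inputs * product_terms + outputs * product_terms

cells = pla_cells(inputs=10, outputs=20, product_terms=17)  # 170 + 340
```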
35 PLA Implementation
- Top part is the AND plane
  - each dot represents an AND
- Bottom part is the OR plane
  - each dot represents an OR
- Example
  - First vertical line, AND plane
  - First horizontal line, OR plane (represents PCWrite):
    $\text{PCWrite} = \overline{S3}\,\overline{S2}\,\overline{S1}\,\overline{S0} + S3\,\overline{S2}\,\overline{S1}\,S0$
36 Another Implementation Style: using a sequencer
- If there are many states and many of them are sequential, it is more efficient to use a counter to supply the sequential next state
- This eliminates the need to encode the next-state function explicitly in the control unit
- Use an adder instead
- See next slide
- The incremented state is always the state that follows in numerical order
37 Another Implementation Style: using a sequencer
- Complex instructions: the "next state" is often the current state + 1
38 Another Implementation Style: using a sequencer
- Sometimes must branch
- Example: after state 1 there are 4 possible next states
- Each control word must include control lines that determine how the next state is chosen
- Implementing the control output signal portion looks exactly like the previous truth table
39 Another Implementation Style: using a sequencer
- Implementing the next-state function
  - the control unit logic need only specify how to choose the state when it is not the sequentially following state
- Method 1: the control unit explicitly encodes the next-state function
  - the CU need only set the next-state lines when the designated next state is not the state that the counter indicates
  - if the next-state function is mostly empty, the resulting CU will have much empty or redundant space
40 Another Implementation Style: using a sequencer
- Method 2: use separate external logic to specify the next state when the counter does not specify it
  - most often used
  - the nonsequential next state comes from an external table
  - the CU specifies when this occurs and how to find the next state
41 Another Implementation Style: using a sequencer
- Method 2 (continued). Two kinds of branching
  1. Dispatch: jump to one of a number of states based on the opcode portion of the IR
  2. Branch to state 0: initiates the execution of the next MIPS instruction
42 Another Implementation Style: using a sequencer
- Method 2 (continued). Two kinds of branching
  1. Dispatch: jump to one of a number of states based on the opcode portion of the IR
  - implemented with a set of special ROMs included as part of the address selection logic
  - an additional set of control outputs, AddrCtl, indicates when a dispatch should be done
  - from the FSM, we see that there are 2 states in which we do a branch based on a portion of the opcode (see FSM on next slide)
  - thus we need 2 small dispatch tables
  - alternatively, we could use a single dispatch table and use the control bits that select the table as address bits that choose which portion of the dispatch table to select the address from
43 Graphical Specification of FSM
- How many state bits will we need?
44 Another Implementation Style: using a sequencer
- Method 2 (cont.)
- Dispatch (cont.)
- 4 ways to choose the next state
  - 3 types of branches
  - incrementing the current-state number
- Encode the choice in 2 bits

  AddrCtl value | Action
  0             | Set state to 0
  1             | Dispatch with ROM 1
  2             | Dispatch with ROM 2
  3             | Use the incremented state
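The address-select logic can be sketched as follows. The AddrCtl encoding is the one listed on this slide. The contents of the two dispatch ROMs, the AddrCtl value for each state, and the opcode values assume the usual ten-state controller (dispatch 1 after state 1, dispatch 2 after state 2) and are not given on the slides.

```python
# Assumed MIPS opcode values (Op[5-0]).
LW, SW, RTYPE, BEQ, J = 0b100011, 0b101011, 0b000000, 0b000100, 0b000010

# Assumed dispatch ROM contents: opcode -> next state.
DISPATCH_ROM_1 = {RTYPE: 6, J: 9, BEQ: 8, LW: 2, SW: 2}
DISPATCH_ROM_2 = {LW: 3, SW: 5}

# Assumed AddrCtl value for each of the states 0..9.
ADDR_CTL = [3, 1, 2, 3, 0, 0, 3, 0, 0, 0]

def select_next_state(state, opcode):
    """Choose the next state from the 2-bit AddrCtl field of the current state."""
    ctl = ADDR_CTL[state]
    if ctl == 0:
        return 0                        # branch back to state 0 (fetch)
    if ctl == 1:
        return DISPATCH_ROM_1[opcode]
    if ctl == 2:
        return DISPATCH_ROM_2[opcode]
    return state + 1                    # use the incremented state

def run(opcode):
    """States visited by one instruction, starting from state 0."""
    path, state = [0], select_next_state(0, opcode)
    while state != 0:
        path.append(state)
        state = select_next_state(state, opcode)
    return path
```

With this assignment only states 0, 3, and 6 rely on the incrementer; every other transition is a dispatch or a return to state 0.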
45 Details
46 Details
47 Using a sequencer
- Entire control ROM: see Figure C.19
- 10 control words, each 18 bits wide, for a total of 180 bits
- 2 dispatch tables, each 4 bits wide with 64 entries, for a total of 512 additional bits
- Total: 692 bits
- For comparison, the implementation with 2 ROMs and the next-state function encoded in the ROMs needs 4.3K bits
- Could encode the dispatch tables more efficiently with two small PLAs
- Could also replace the control ROM with a PLA
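The bit counts above can be reproduced directly. The figures are the ones on the slide; reading the 18-bit control word as 16 datapath-control bits plus 2 AddrCtl bits is an inference from the earlier slides.

```python
CONTROL_WORDS, WORD_BITS = 10, 18        # assumed split: 16 datapath bits + 2 AddrCtl bits
DISPATCH_TABLES, ENTRIES, ENTRY_BITS = 2, 2 ** 6, 4

control_rom_bits = CONTROL_WORDS * WORD_BITS
dispatch_bits = DISPATCH_TABLES * ENTRIES * ENTRY_BITS
sequencer_total = control_rom_bits + dispatch_bits

# Two-ROM design with the next-state function encoded in ROM, for comparison.
two_rom_total = 2 ** 4 * 16 + 2 ** 10 * 4
```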
48 Using a sequencer: ROM
Column 2: datapath control bits (same as derived earlier)
Column 3: address-control bits
49 Optimizing the Control Implementation
- Use logic minimization (techniques like K-maps)
- Use state minimization (more precisely, state assignment): assign state numbers such that the resulting logic equations contain more redundancy and can be simplified
- Example of state minimization
  - In the FSM, the signal RegWrite is active only in states 4 and 7
  - If we encoded those states as 8 and 9, we could rewrite the equation for RegWrite as a test on bit S3 (which is set only in states 8 and 9)
  - We can then combine the two truth table entries in part o of Figure C.9 into a single entry, eliminating one term in the CU
- Can also do state minimization in an implementation with an explicit program counter, but the choices are more restricted because the states must stay sequential
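The RegWrite example can be checked with a short sketch. The renumbering below (swap 4 with 8 and 7 with 9) is one hypothetical way to give RegWrite's states the numbers 8 and 9; the slides do not say which states take the vacated numbers.

```python
# RegWrite is asserted in states 4 and 7 (from the slide).
ORIGINAL_REGWRITE_STATES = {4, 7}

# Hypothetical renumbering: swap 4 <-> 8 and 7 <-> 9, keep the rest.
RENUMBER = {4: 8, 8: 4, 7: 9, 9: 7}

def new_number(state):
    """State number after the hypothetical reassignment."""
    return RENUMBER.get(state, state)

new_regwrite_states = {new_number(s) for s in ORIGINAL_REGWRITE_STATES}

def s3(state):
    """High-order state bit."""
    return (state >> 3) & 1

# After renumbering, RegWrite should equal S3 for every one of the ten states.
regwrite_equals_s3 = all(
    (new_number(s) in new_regwrite_states) == bool(s3(new_number(s)))
    for s in range(10)
)
```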