Title: The Processor: Datapath and Control
1. The Processor: Datapath and Control
Chapter 5, Part 2
2. The Processor: Datapath and Control
- Instruction Execution Cycle
- We will see how to design the processor
3. The Processor: Datapath and Control
- We're ready to look at an implementation of the MIPS processor, simplified to contain only:
  - memory-reference instructions: lw, sw
  - arithmetic-logical instructions: add, sub, and, or, slt
  - control-flow instructions: beq, j
- Generic implementation:
  - use the program counter (PC) to supply the instruction address
  - get the instruction from memory
  - read registers
  - use the instruction to decide exactly what to do
4. The Processor: More Implementation Details
- Abstract / simplified view of a processor
- Two types of functional units:
  - elements that operate on data values (combinational), e.g. the ALU
  - elements that contain state (sequential), e.g. registers and memory
5. Overview of Chapter 5
- State (sequential) elements and the storage mechanism for registers
- Register file (reading and writing)
- Building a single-cycle MIPS datapath to accommodate:
  - instruction fetch
  - R-type instructions
  - lw/sw instructions
  - the beq instruction
- Control unit for a single-cycle MIPS datapath
- Multicycle MIPS datapath:
  - steps involved in executing an instruction
  - overview of design
6. Combinational vs. Sequential Circuits
- Combinational circuits
  - The output depends only on the current inputs.
  - Applying the same inputs always produces the same output.
  - E.g. circuits that just do arithmetic and have no memory: an ALU with a = 3, b = 2 and operation 10 (addition) will always output 5.
- Sequential (state) circuits
  - The output depends on both the inputs and the state (memory).
  - The same inputs can yield different outputs depending on the stored state, and the state itself can change with the inputs.
  - E.g. a shift register containing 0010 0010: applying the same input (a shift command) and then reading the register gives a different output each time.
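A minimal behavioural sketch of the contrast above, assuming a hypothetical `alu_add` function stands in for combinational logic and a hypothetical `ShiftRegister` class stands in for a sequential circuit (a model of the behaviour, not of the gates):

```python
def alu_add(a: int, b: int) -> int:
    """Combinational: the output depends only on the current inputs."""
    return a + b


class ShiftRegister:
    """Sequential: the output depends on the inputs and on the stored state."""

    def __init__(self, bits: str) -> None:
        self.bits = bits  # stored state, e.g. "00100010"

    def shift_left(self) -> str:
        # The input (a shift command) is the same on every call, but the
        # stored state changes, so the value read back differs each time.
        self.bits = self.bits[1:] + "0"
        return self.bits


print(alu_add(3, 2), alu_add(3, 2))  # 5 5
reg = ShiftRegister("00100010")
print(reg.shift_left())  # 01000100
print(reg.shift_left())  # 10001000
```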
7. State Elements (Sequential Elements)
- Unclocked vs. clocked
- Clocks are used in synchronous logic
  - When should an element that contains state be updated? (Possibilities: rising edge, falling edge, during assertion, during deassertion.)
8. (Storing a Bit) An Unclocked State Element
- The set-reset (SR) latch
  - The output depends on present inputs and also on past inputs.
  - If R = 0, S = 0: the value stored on output Q is recycled by inverting it to obtain Q̄ and then inverting Q̄ to obtain Q, and so on. The latch acts as a storage device.
  - If R = 0, S = 1: the latch eventually stores 1 into Q and 0 into Q̄.
  - If R = 1, S = 0: the latch eventually stores 0 into Q and 1 into Q̄.
  - If R = 1, S = 1: the latch stores 0 into both Q and Q̄ (unacceptable, since the two outputs are no longer complements).
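The feedback loop can be imitated by re-evaluating two cross-coupled NOR gates until the outputs stop changing. This sketch assumes the common NOR-gate form of the latch (Q = NOR(R, Q̄), Q̄ = NOR(S, Q)) and evaluates the two gates one after the other, which simplifies real, simultaneous gate timing; `nor` and `sr_latch` are hypothetical helpers:

```python
def nor(a: int, b: int) -> int:
    return 1 - (a | b)


def sr_latch(s: int, r: int, q: int, q_bar: int, max_steps: int = 10) -> tuple[int, int]:
    """Settle the latch starting from its stored outputs (q, q_bar).

    The gates are re-evaluated in turn (a simplification of real timing)
    until (Q, Q_bar) stops changing, mimicking the feedback loop.
    """
    for _ in range(max_steps):
        new_q = nor(r, q_bar)
        new_q_bar = nor(s, new_q)
        if (new_q, new_q_bar) == (q, q_bar):
            break  # stable: the latch is now storing a value
        q, q_bar = new_q, new_q_bar
    return q, q_bar


print(sr_latch(s=1, r=0, q=0, q_bar=1))  # (1, 0): set
print(sr_latch(s=0, r=0, q=1, q_bar=0))  # (1, 0): hold
print(sr_latch(s=0, r=1, q=1, q_bar=0))  # (0, 1): reset
print(sr_latch(s=1, r=1, q=1, q_bar=0))  # (0, 0): the forbidden case
```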
9. Clocked State Elements: Latches and Flip-flops
- In computer applications, flip-flops and latches are used to store data, state, and signals.
- A clocking methodology defines when data/signals can be read and written.
  - We wouldn't want to read a signal at the same time it is being written.
- The output is equal to the value stored inside the element (we don't need to ask permission to look at the value).
- When the state (value) changes depends on the type of component:
  - Latches: the state changes whenever the inputs change while the clock is asserted.
  - Flip-flops: the state changes only on a clock edge (edge-triggered methodology).
10. D Latch
- Two inputs:
  - the data value to be stored (D)
  - the clock signal (C), indicating when to read and store D
- Two outputs:
  - the value of the internal state (Q) and its complement (Q̄)
- When C = 1, the D latch stores D as Q (it accepts new data for the whole duration of C = 1).
- When C = 0, the D latch keeps its internal state in Q and Q̄.
[Figure: D latch with inputs D and C and output Q]
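A minimal sketch of the level-sensitive behaviour described above; the `DLatch` class and its `update` method are hypothetical names, and each call stands for the inputs being held at the given values:

```python
class DLatch:
    """Level-sensitive D latch: transparent while C == 1, holding while C == 0."""

    def __init__(self) -> None:
        self.q = 0  # stored state

    @property
    def q_bar(self) -> int:
        return 1 - self.q  # complement output

    def update(self, d: int, c: int) -> int:
        if c == 1:
            self.q = d  # clock asserted: Q follows D
        return self.q   # clock deasserted: Q keeps its old value


latch = DLatch()
print(latch.update(d=1, c=1))  # 1 (follows D)
print(latch.update(d=0, c=1))  # 0 (still follows D while C == 1)
print(latch.update(d=1, c=0))  # 0 (C == 0, so the stored value is held)
```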
11. D Flip-flop: State Changes Only on the Falling Clock Edge
- Two inputs:
  - the data value to be stored (D)
  - the clock signal (C), indicating when to read and store D
- Two outputs:
  - the value of the internal state (Q) and its complement (Q̄)
- The internal state changes only on the clock edge (falling edge): as soon as C changes from 1 to 0, the D flip-flop stores D as Q.
- At other times the D flip-flop keeps its internal state in Q and Q̄.
- How would you implement a D flip-flop that changes state only on the rising edge?
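One classic way to get edge-triggered behaviour is the master-slave construction from two D latches. This is a sketch under that assumption (not necessarily the circuit the slide's figure shows), with hypothetical class names; each `update` call stands for one settled clock level:

```python
class DLatch:
    def __init__(self) -> None:
        self.q = 0

    def update(self, d: int, enable: int) -> int:
        if enable:
            self.q = d  # transparent while enabled
        return self.q


class FallingEdgeDFlipFlop:
    """Master-slave D flip-flop built from two D latches.

    The master is transparent while C == 1 and the slave while C == 0, so
    Q (the slave's output) can pick up a new value only when C goes 1 -> 0.
    """

    def __init__(self) -> None:
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, d: int, c: int) -> int:
        m = self.master.update(d, c)        # master samples D while C == 1
        return self.slave.update(m, 1 - c)  # slave copies master while C == 0


ff = FallingEdgeDFlipFlop()
print(ff.update(d=1, c=1))  # 0: C high, Q unchanged
print(ff.update(d=1, c=0))  # 1: falling edge, Q takes the sampled D
print(ff.update(d=0, c=0))  # 1: D changes while C low are ignored
```

Swapping the enables (master transparent while C == 0, slave while C == 1) makes Q change only on the rising edge, which is one common answer to the question above.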
12. Our Implementation
- An edge-triggered methodology: state elements accept data only at a clock edge.
  - The methodology uses either the rising or the falling edge, but not both.
- Typical execution:
  - read the contents of some state elements,
  - send the values through some combinational logic,
  - write the results to one or more state elements.
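The read / compute / write pattern above can be sketched as follows, assuming a hypothetical `Register` class whose clock edge is modelled as a method call and a hypothetical `clock_cycle` helper:

```python
from typing import Callable


class Register:
    """Edge-triggered register: `d` may change at any time, `q` only at the edge."""

    def __init__(self, value: int = 0) -> None:
        self.q = value  # output seen by the combinational logic during the cycle
        self.d = value  # input that will be captured at the next clock edge

    def clock_edge(self) -> None:
        self.q = self.d


def clock_cycle(regs: list[Register], logic: Callable[[list[int]], list[int]]) -> None:
    current = [r.q for r in regs]                # 1. read the state elements
    for r, value in zip(regs, logic(current)):
        r.d = value                              # 2. combinational logic drives the inputs
    for r in regs:
        r.clock_edge()                           # 3. all state elements update at the same edge


pc = Register(0)
for _ in range(3):
    clock_cycle([pc], lambda s: [s[0] + 4])      # PC <- PC + 4
print(pc.q)  # 12

a, b = Register(1), Register(2)
clock_cycle([a, b], lambda s: [s[1], s[0]])     # swap in a single cycle
print(a.q, b.q)  # 2 1
```

Because every new value is computed from what was read before the edge, a state element can be read and written in the same cycle without conflict; the swap would break if the registers were updated one at a time.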
13. Register File
- A MIPS processor contains 32 registers. The registers are grouped in a place called a register file. A register file consists of a set of registers that can be read or written by supplying the number of the register to be accessed.
- Registers are built using D flip-flops.
[Figure: implementation of the read ports of an n-register file]
14. Reading a Register File
- Consider the operation z = x + y with x = 6, y = 7, carried out by the MIPS instruction add $s2, $s0, $s1, i.e. $s2 = $s0 + $s1.
- Supply the register numbers of $s0 and $s1 (16 and 17) via Read register 1 and Read register 2, and read the register contents 6 and 7 via Read data 1 and Read data 2.
15. Decoder
- An n-to-2^n decoder is a logic block that has n input bits and 2^n outputs, where exactly one output is asserted (enabled) for each input combination.
- Consider the 3-to-8 decoder below with inputs a2 a1 a0 and outputs s7 s6 s5 s4 s3 s2 s1 s0.
- What is the Boolean formula for each output? A single product term! E.g. s3 = ā2 · a1 · a0.
- E.g. with input a2 a1 a0 = 011, output s3 is selected.
[Figure: 3-to-8 decoder with input 011 asserting output 3]
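A sketch of the 3-to-8 decoder in two forms, with hypothetical function names: a behavioural one-hot version, and a gate-level version with one product (AND) term per output, matching the "single product" remark above:

```python
def decoder(value: int, n: int) -> list[int]:
    """n-to-2**n decoder: exactly one of the 2**n outputs is 1 (index i = output s_i)."""
    if not 0 <= value < 2 ** n:
        raise ValueError("input does not fit in n bits")
    return [1 if i == value else 0 for i in range(2 ** n)]


def decoder_gates(a2: int, a1: int, a0: int) -> list[int]:
    """The same 3-to-8 decoder written as one product term per output."""
    n2, n1, n0 = 1 - a2, 1 - a1, 1 - a0  # inverted inputs
    return [
        n2 & n1 & n0,  # s0
        n2 & n1 & a0,  # s1
        n2 & a1 & n0,  # s2
        n2 & a1 & a0,  # s3
        a2 & n1 & n0,  # s4
        a2 & n1 & a0,  # s5
        a2 & a1 & n0,  # s6
        a2 & a1 & a0,  # s7
    ]


print(decoder(0b011, 3))       # [0, 0, 0, 1, 0, 0, 0, 0]
print(decoder_gates(0, 1, 1))  # [0, 0, 0, 1, 0, 0, 0, 0]
```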
16. Writing into a Register File
- We will use a decoder to choose which register should receive the data.
- Note that we still use the real clock to determine when to write.
- Example: show how to write 4 into register number 1.
17. Writing into a Register File
- Inputs: register number, register data, write signal.
- Process:
  - The register number is decoded to select (enable) the proper register to receive the data.
  - The proper register is enabled by the write signal.
  - The register data is supplied.
  - The proper register receives the register data.
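The write process above, applied to the "write 4 into register 1" example, can be sketched as below; the function names are hypothetical and one call stands for one clock edge with the given inputs:

```python
def decode(reg_num: int, num_regs: int = 32) -> list[int]:
    """5-to-32 decoder: one-hot select lines for the register number."""
    return [1 if i == reg_num else 0 for i in range(num_regs)]


def write_register_file(regs: list[int], reg_num: int, data: int, reg_write: int) -> None:
    """Model one clock edge of the register-file write port."""
    for i, select in enumerate(decode(reg_num, len(regs))):
        if select and reg_write:  # register i enabled = decoder output AND write signal
            regs[i] = data        # the enabled register captures the data at the edge


regs = [0] * 32
write_register_file(regs, reg_num=1, data=4, reg_write=1)  # write 4 into register 1
write_register_file(regs, reg_num=2, data=9, reg_write=0)  # write signal off: no change
print(regs[:3])  # [0, 4, 0]
```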
18. Reading and Writing a Register File
- Consider the operation z = x + y with x = 6, y = 7, carried out by the MIPS instruction add $s2, $s0, $s1, i.e. $s2 = $s0 + $s1.
- Supply the register numbers of $s0 and $s1 (16 and 17) via Read register 1 and Read register 2, and read the register contents 6 and 7 via Read data 1 and Read data 2.
- Write the result 13 (via Write data) into register $s2 by supplying its register number 18 (via Write register) with the write signal set to 1.
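The read-then-write sequence for add $s2, $s0, $s1 can be sketched with a hypothetical `RegisterFile` class that has two read ports and one write port. The ALU is reduced to a plain `+`, and register 0 is kept at zero as MIPS `$zero` is:

```python
class RegisterFile:
    NUM_REGS = 32

    def __init__(self) -> None:
        self.regs = [0] * self.NUM_REGS

    def read(self, read_reg1: int, read_reg2: int) -> tuple[int, int]:
        """Two read ports: return (Read data 1, Read data 2)."""
        return self.regs[read_reg1], self.regs[read_reg2]

    def write(self, write_reg: int, write_data: int, reg_write: int) -> None:
        """One write port, active only when the write signal is 1."""
        if reg_write and write_reg != 0:  # $zero (register 0) always reads as 0
            self.regs[write_reg] = write_data


S0, S1, S2 = 16, 17, 18
rf = RegisterFile()
rf.regs[S0], rf.regs[S1] = 6, 7
read_data1, read_data2 = rf.read(S0, S1)          # 6, 7
rf.write(S2, read_data1 + read_data2, reg_write=1)
print(rf.regs[S2])  # 13
```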
19. Building a Datapath
- The program to be executed is first loaded into the instruction memory.
- Each instruction to be executed is fetched into the datapath.
- The address in instruction memory of the instruction currently being executed is held in the program counter, PC.
- The address in the PC is incremented by 4 using an adder, in preparation for the next instruction.
20. Instructions Are Fetched from Memory
- In this chapter we consider the datapath and control and how they relate to memory.
- Instruction execution is timed by a CPU clock. The rate at which the clock cycles run is called the processor speed (clock rate).
- Processor clock rates today range from hundreds of millions to billions of cycles per second, e.g. 800 MHz = 800 million cycles per second, a clock period of 1/(800 × 10^6 Hz) = 1.25 ns.
- Each instruction takes a few CPU cycles, say 2-5 cycles.
- Instructions being executed are stored in a part of memory called instruction memory.
- The address in instruction memory of the instruction currently being executed is stored in the program counter, PC.
- To get the address of the next instruction, the datapath adds 4 to the PC. A dedicated addition unit, called an adder, is used for this purpose.
21. Instructions Are Fetched from Memory
- Memory implementation: memory is composed of memory words. Each memory word holds 32 bits of storage. Each storage bit is implemented electronically using flip-flops or latches.
22. Datapath: Fetching an Instruction and Adding 4 to the PC
- Supply the address in the PC to instruction memory.
- Read the instruction.
- Increment the PC by 4 to get the byte address of the next instruction.
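A minimal sketch of the fetch step, assuming instruction memory is a hypothetical Python dict from byte address to instruction text and ignoring branches:

```python
def fetch(instruction_memory: dict[int, str], pc: int) -> tuple[str, int]:
    """Read the instruction at address PC and compute PC + 4 (4 bytes per instruction)."""
    instruction = instruction_memory[pc]  # instruction memory read
    next_pc = pc + 4                      # the dedicated PC adder
    return instruction, next_pc


program = {0: "add $s3, $s1, $s2", 4: "lw $s1, 20($s2)", 8: "beq $t1, $t2, 4"}
pc = 0
while pc in program:  # sequential fetch only; branches are ignored here
    instruction, pc = fetch(program, pc)
    print(instruction)
```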
23. Datapath for Executing R-type Instructions
- Supply the numbers of the registers to be read via Read register 1 and 2.
- Read the register contents via Read data 1 and 2.
- Direct the ALU to perform the operation by supplying its ALU operation input.
- Store the result back into the register file via Write data, at the register given by Write register, by enabling RegWrite.
24. Executing R-type Instructions
- Consider add $s3, $s1, $s2.
- The register numbers of $s1 and $s2 are supplied to the register file via Read register 1 and Read register 2.
- The data is read from registers $s1 and $s2 via Read data 1 and Read data 2.
- The data is added in the ALU.
- The ALU result is written via Write data into $s3, whose register number is supplied via Write register.
- All of the above is regulated by sending control signals at the proper time.
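The R-type steps above can be sketched by packing and unpacking the standard MIPS R-type fields (op, rs, rt, rd, shamt, funct). The helper names are hypothetical, and only the five operations listed earlier (add, sub, and, or, slt) are modelled:

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def encode_r(rs: int, rt: int, rd: int, funct: int, shamt: int = 0) -> int:
    """Pack R-type fields: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6); op is 0."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def execute_r(regs: list[int], instr: int) -> None:
    rs = (instr >> 21) & 0x1F   # -> Read register 1
    rt = (instr >> 16) & 0x1F   # -> Read register 2
    rd = (instr >> 11) & 0x1F   # -> Write register
    funct = instr & 0x3F        # selects the ALU operation
    a, b = regs[rs], regs[rt]   # Read data 1, Read data 2
    ops = {
        0x20: lambda: a + b,                             # add
        0x22: lambda: a - b,                             # sub
        0x24: lambda: a & b,                             # and
        0x25: lambda: a | b,                             # or
        0x2A: lambda: int(to_signed(a) < to_signed(b)),  # slt
    }
    # Unmodelled funct codes raise KeyError; the result goes to Write data with RegWrite = 1.
    regs[rd] = ops[funct]() & MASK32


S1, S2, S3 = 17, 18, 19
regs = [0] * 32
regs[S1], regs[S2] = 5, 8
instr = encode_r(rs=S1, rt=S2, rd=S3, funct=0x20)  # add $s3, $s1, $s2
execute_r(regs, instr)
print(hex(instr), regs[S3])  # 0x2329820 13
```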
25. Datapath for lw and sw
- lw $s1, 20($s2) is the same as $s1 = Memory[$s2 + 20]; decoded as op | rs = $s2 | rt = $s1 | 20
- sw $s1, 20($s2) is the same as Memory[$s2 + 20] = $s1; decoded as op | rs = $s2 | rt = $s1 | 20
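26. Executing lw and sw Instructions
- Consider lw $s1, 20($s2).
- The register number of $s2 is supplied to the register file via Read register 1.
- The data is read from register $s2 via Read data 1 (note that this data is itself an address).
- The 16-bit offset 20 is sign-extended to 32 bits.
- The ALU computes $s2 + 20, and the result is used as the address to fetch data from data memory.
- The fetched data is written into $s1 via Write data.
- For sw $s1, 20($s2), the address is computed the same way, but the contents of $s1 (read via Read data 2) are written into data memory instead.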
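A sketch of the lw/sw address computation and memory access, assuming data memory is a hypothetical Python dict keyed by byte address (alignment is not checked) and using hypothetical helper names:

```python
def sign_extend16(value: int) -> int:
    """Sign-extend a 16-bit immediate (two's complement) to a Python int."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def lw(regs: list[int], memory: dict[int, int], rt: int, rs: int, offset: int) -> None:
    address = regs[rs] + sign_extend16(offset)  # ALU: base register + offset
    regs[rt] = memory[address]                  # MemRead; MemtoReg picks the memory data


def sw(regs: list[int], memory: dict[int, int], rt: int, rs: int, offset: int) -> None:
    address = regs[rs] + sign_extend16(offset)  # same address computation as lw
    memory[address] = regs[rt]                  # MemWrite; data comes from Read data 2


S1, S2 = 17, 18
regs = [0] * 32
memory = {1020: 42}
regs[S2] = 1000
lw(regs, memory, rt=S1, rs=S2, offset=20)      # lw $s1, 20($s2)
print(regs[S1])  # 42
sw(regs, memory, rt=S1, rs=S2, offset=0xFFFC)  # sw $s1, -4($s2)
print(memory[996])  # 42
```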
27. Datapath for beq
28. Executing a beq Instruction
- beq $t1, $t2, offset
- If $t1 == $t2, branch to the instruction that is offset instructions past the next one; otherwise go to the next instruction. Assuming offset = 4, the program instructions are laid out in memory as follows:
  - beq $t1, $t2, offset
  - --------------------
  - --------------------
  - --------------------
  - --------------------
  - -------------------- (branch target: PC + 4 + 4 × 4)
29. Steps in Executing a beq Instruction
- Step 1: PC = PC + 4.
- Step 2: Supply the register numbers of $t1 and $t2 to the register file via Read register 1 and Read register 2, and get the contents of $t1 and $t2 via Read data 1 and Read data 2.
- Step 3: Use the ALU (subtracting) to determine whether the values in Read data 1 and Read data 2 are equal (Zero = 1) or not equal (Zero = 0). Zero is sent to the branch logic to determine whether to branch.
- Step 4: Sign-extend the 16-bit offset to 32 bits. Shift the offset left by 2 bits (the same as multiplying by 4 bytes). Add the shifted offset to PC + 4 to get the branch target.
- Step 5: If we are branching, the PC is replaced with the branch target; if we are not branching, the control logic keeps the value PC + 4 from Step 1.
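The five beq steps can be condensed into one next-PC function. This is a sketch with hypothetical names, taking the register values that arrive on Read data 1 and 2 directly:

```python
def sign_extend16(value: int) -> int:
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def beq_next_pc(pc: int, read_data1: int, read_data2: int, offset: int) -> int:
    pc_plus_4 = pc + 4                                 # Step 1
    zero = int(read_data1 - read_data2 == 0)           # Step 3: ALU subtracts, sets Zero
    target = pc_plus_4 + (sign_extend16(offset) << 2)  # Step 4: offset * 4 added to PC + 4
    return target if zero else pc_plus_4               # Step 5: branch logic picks next PC


print(beq_next_pc(100, 7, 7, 4))       # 120: taken, skips the 4 instructions after beq
print(beq_next_pc(100, 7, 8, 4))       # 104: not taken
print(beq_next_pc(100, 7, 7, 0xFFFF))  # 100: offset -1 branches back to the beq itself
```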
30. Building the Datapath
- Use multiplexors to stitch the pieces together
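31. Building the Complete Datapath
- Share datapath elements among instruction classes.
- Allow multiple connections to an element:
  - Component A to B and C: a split connection (the same signal fans out to both).
  - Component A from B and C: use a mux and add a control signal.
[Figure: A driving B and C through a split connection; A fed from B or C through a mux]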
32. Stages of Combining Components
- Combine the R-type and memory-reference units: add 2 muxes (ALUSrc, MemtoReg).
- Add the instruction fetch part: connect the instruction memory output.
- Add the branch datapath: add the PCSrc mux and split common sources.
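A sketch of the three muxes named above, using a hypothetical `mux2` helper and made-up signal values for a single lw, purely to show which input each select line picks:

```python
def mux2(select: int, in0: int, in1: int) -> int:
    """2-to-1 multiplexor: pass in0 when select == 0, in1 when select == 1."""
    return in1 if select else in0


# Hypothetical control and data values for one lw instruction.
alu_src, mem_to_reg, branch, zero = 1, 1, 0, 0
read_data2, sign_ext_imm = 8, 20
alu_result, mem_read_data = 1020, 42
pc_plus_4, branch_target = 104, 120

alu_operand_b = mux2(alu_src, read_data2, sign_ext_imm)     # ALUSrc: register or immediate
write_data = mux2(mem_to_reg, alu_result, mem_read_data)   # MemtoReg: ALU or memory
next_pc = mux2(branch & zero, pc_plus_4, branch_target)    # PCSrc = Branch AND Zero
print(alu_operand_b, write_data, next_pc)  # 20 42 104
```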
33. Complete Datapath with All Control Lines Identified (Single-cycle Datapath)
- Calculate the cycle time, assuming negligible delays except for:
  - memory (2 ns)
  - ALU and adders (2 ns)
  - register file access (1 ns)
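A worked version of the cycle-time exercise. It assumes each instruction class uses the units listed below in sequence, that the PC + 4 adder and the branch-target adder run in parallel with other units (so they never lengthen the critical path), and that jumps are ignored:

```python
MEM, ALU, REG = 2, 2, 1  # delays in ns; adders are assumed to run in parallel

# Units on each instruction class's critical path, in order.
paths = {
    "R-type": [MEM, REG, ALU, REG],       # fetch, read regs, ALU, write reg
    "lw":     [MEM, REG, ALU, MEM, REG],  # fetch, read base, address, read memory, write reg
    "sw":     [MEM, REG, ALU, MEM],       # fetch, read regs, address, write memory
    "beq":    [MEM, REG, ALU],            # fetch, read regs, compare (Zero)
}
delays = {name: sum(units) for name, units in paths.items()}
cycle_time = max(delays.values())  # every instruction gets the same, longest cycle
print(delays)      # {'R-type': 6, 'lw': 8, 'sw': 7, 'beq': 5}
print(cycle_time)  # 8 (ns), i.e. a 125 MHz clock
```

Under these assumptions the single clock cycle must fit lw (8 ns), so a beq that needs only 5 ns still takes 8 ns, which is one answer to why the single-cycle datapath is inefficient.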
37. The Instruction Classes (R-type, Load, Store, Branch)
- Can you figure out where each section of each instruction goes on the datapath?
38. The Effect of Each of the Seven Control Signals
39. Control
- Selecting the operations to perform (ALU, read/write, etc.)
- Controlling the flow of data (multiplexor inputs)
- Information comes from the 32 bits of the instruction
- Example: add $t0, $s1, $s2, instruction format:
  - op = 000000 | rs = 10001 ($s1) | rt = 10010 ($s2) | rd = 01000 ($t0) | shamt = 00000 | funct = 100000
- The ALU's operation is based on the instruction type and the function code
40. Control
- E.g., what should the ALU do with this instruction?
- Example: lw $1, 100($2) is encoded as 35 | 2 | 1 | 100 (fields: op | rs | rt | 16-bit offset)
- ALU control input:
  - 000: AND
  - 001: OR
  - 010: add
  - 110: subtract
  - 111: set-on-less-than
- Why is the code for subtract 110 and not 011?
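A behavioural sketch of an ALU driven by the 3-bit control codes above, with hypothetical function names, operating on 32-bit words and also producing the Zero output used by beq. One reading of the encoding, assumed here from the textbook-style ALU these slides follow: the low two bits select the AND, OR, adder, or less-than result, and the top bit negates B (invert plus carry-in), so subtract reuses the adder code 10 with that bit set.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control: int, a: int, b: int) -> tuple[int, int]:
    """Return (result, zero) for a 3-bit ALU control input, on 32-bit words."""
    if control == 0b000:
        result = a & b                             # AND
    elif control == 0b001:
        result = a | b                             # OR
    elif control == 0b010:
        result = a + b                             # add
    elif control == 0b110:
        result = a - b                             # subtract
    elif control == 0b111:
        result = int(to_signed(a) < to_signed(b))  # set-on-less-than (signed)
    else:
        raise ValueError(f"unused ALU control {control:03b}")
    result &= MASK32
    return result, int(result == 0)                # Zero output, used by beq


print(alu(0b010, 3, 2))           # (5, 0)
print(alu(0b110, 7, 7))           # (0, 1)
print(alu(0b111, 0xFFFFFFFF, 1))  # (1, 0): -1 < 1
```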
41. Control
- Must describe hardware to compute the 3-bit ALU control input
  - given the instruction type (ALUOp, the input into ALU Control):
    - 00: lw, sw (the ALU adds, to compute the memory address)
    - 01: beq (the ALU subtracts, to test for equality)
    - 10: arithmetic/R-type (the ALU operation is determined by the instruction's function code)
  - also input into ALU Control: the function code, for arithmetic instructions
- Describe it using a truth table (which can be turned into gates)
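The ALU-control truth table can be sketched as a lookup, using the standard MIPS funct codes for add, sub, and, or, and slt; the function and table names are hypothetical:

```python
FUNCT_TO_CONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_control(alu_op: int, funct: int) -> int:
    """Map (ALUOp from the main control, funct field IR[5:0]) to the 3-bit ALU control."""
    if alu_op == 0b00:  # lw / sw: add base + offset
        return 0b010
    if alu_op == 0b01:  # beq: subtract to test equality
        return 0b110
    if alu_op == 0b10:  # R-type: decided by the funct field
        return FUNCT_TO_CONTROL[funct]
    raise ValueError(f"unused ALUOp {alu_op:02b}")


print(f"{alu_control(0b10, 0b101010):03b}")  # 111 (slt)
```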
42. Control
43. Control
- Simple combinational logic (truth tables)
  - Main Control unit
  - ALU Control unit
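The Main Control unit's truth table for the single-cycle instructions can be sketched as a lookup on the opcode, using the standard MIPS opcodes (R-format 0, lw 35, sw 43, beq 4) and the textbook-style signal names; `MAIN_CONTROL`, `main_control` and `pc_src` are hypothetical names, and `None` marks a don't-care entry:

```python
# One row per opcode; None marks a don't-care (X) entry.
MAIN_CONTROL = {
    0: {"RegDst": 1, "ALUSrc": 0, "MemtoReg": 0, "RegWrite": 1,        # R-format
        "MemRead": 0, "MemWrite": 0, "Branch": 0, "ALUOp": 0b10},
    35: {"RegDst": 0, "ALUSrc": 1, "MemtoReg": 1, "RegWrite": 1,       # lw
         "MemRead": 1, "MemWrite": 0, "Branch": 0, "ALUOp": 0b00},
    43: {"RegDst": None, "ALUSrc": 1, "MemtoReg": None, "RegWrite": 0,  # sw
         "MemRead": 0, "MemWrite": 1, "Branch": 0, "ALUOp": 0b00},
    4: {"RegDst": None, "ALUSrc": 0, "MemtoReg": None, "RegWrite": 0,   # beq
        "MemRead": 0, "MemWrite": 0, "Branch": 1, "ALUOp": 0b01},
}


def main_control(opcode: int) -> dict:
    """Look up the control signals for the 6-bit opcode field IR[31:26]."""
    return MAIN_CONTROL[opcode]


def pc_src(branch: int, zero: int) -> int:
    """PCSrc is not in the table: it is Branch AND the ALU's Zero output."""
    return branch & zero


print(main_control(35)["MemRead"], main_control(4)["ALUOp"])  # 1 1
```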
44. Improving on the Datapath: Multi-cycle Datapath
- The single-cycle datapath is inefficient. Why?
- The five execution steps are:
  - Instruction fetch
  - Instruction decode and register fetch
  - Execution, memory address computation, or branch completion
  - Memory access or R-type instruction completion
  - Write-back step
- INSTRUCTIONS TAKE FROM 3 TO 5 CYCLES!
45. Step 1: Instruction Fetch
- Use the PC to get the instruction and put it in the Instruction Register (IR).
- Increment the PC by 4 and put the result back in the PC.
- Can be described succinctly using RTL ("Register-Transfer Language"):
  IR <= Memory[PC];
  PC <= PC + 4;
46. Step 2: Instruction Decode and Register Fetch
- Read registers rs and rt in case we need them.
- Compute the branch address in case the instruction is a branch.
- RTL:
  A <= Reg[IR[25:21]];
  B <= Reg[IR[20:16]];
  ALUOut <= PC + (sign-extend(IR[15:0]) << 2);
- We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic).
47. Step 3 (Instruction Dependent)
- The ALU performs one of three functions, based on the instruction type:
  - Memory reference: ALUOut <= A + sign-extend(IR[15:0]);
  - R-type: ALUOut <= A op B;
  - Branch: if (A == B) PC <= ALUOut;
48. Step 4 (R-type or Memory Access)
- Loads and stores access memory:
  MDR <= Memory[ALUOut];  or  Memory[ALUOut] <= B;
- R-type instructions finish:
  Reg[IR[15:11]] <= ALUOut;
Step 5: Write-back Step
- Load finishes:
  Reg[IR[20:16]] <= MDR;
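The RTL of Steps 1 to 5 can be stepped through in Python for lw, sw, beq and add. This is a sketch: `run_multicycle`, `bits` and `encode_i` are hypothetical helpers, a single dict serves as memory for both instructions and data, and the beq comparison is written as `a == b` rather than as an ALU subtraction:

```python
LW, SW, BEQ, RTYPE = 35, 43, 4, 0  # opcodes
MASK32 = 0xFFFFFFFF


def bits(word: int, hi: int, lo: int) -> int:
    """Extract the bit field word[hi:lo]."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend16(value: int) -> int:
    return value - 0x10000 if value & 0x8000 else value


def encode_i(op: int, rs: int, rt: int, imm: int) -> int:
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def run_multicycle(state: dict) -> int:
    """Execute one instruction step by step and return the number of cycles used."""
    mem, reg = state["Mem"], state["Reg"]
    ir = mem[state["PC"]]                              # Step 1: IR <= Memory[PC]
    state["PC"] += 4                                   #         PC <= PC + 4
    a, b = reg[bits(ir, 25, 21)], reg[bits(ir, 20, 16)]  # Step 2: A, B <= registers
    imm = sign_extend16(bits(ir, 15, 0))
    alu_out = state["PC"] + (imm << 2)                 #         branch target
    op = bits(ir, 31, 26)
    if op == BEQ:                                      # Step 3, branch completes
        if a == b:
            state["PC"] = alu_out
        return 3
    if op in (LW, SW):
        alu_out = (a + imm) & MASK32                   # Step 3: address computation
    elif op == RTYPE and bits(ir, 5, 0) == 0x20:
        alu_out = (a + b) & MASK32                     # Step 3: A op B (only add modelled)
    else:
        raise NotImplementedError(f"opcode {op} not modelled in this sketch")
    if op == RTYPE:
        reg[bits(ir, 15, 11)] = alu_out                # Step 4: R-type completes
        return 4
    if op == SW:
        mem[alu_out] = b                               # Step 4: store completes
        return 4
    mdr = mem[alu_out]                                 # Step 4: MDR <= Memory[ALUOut]
    reg[bits(ir, 20, 16)] = mdr                        # Step 5: write-back
    return 5


T2, T3 = 10, 11
state = {"PC": 0, "Reg": [0] * 32, "Mem": {0: encode_i(LW, rs=T3, rt=T2, imm=0)}}
state["Reg"][T3], state["Mem"][100] = 100, 77          # lw $t2, 0($t3) with $t3 = 100
print(run_multicycle(state), state["Reg"][T2], state["PC"])  # 5 77 4
```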
49. Summary
50. Simple Questions
- How many cycles will it take to execute this code?
    lw  $t2, 0($t3)
    lw  $t3, 4($t3)
    beq $t2, $t3, Label   # assume not equal
    add $t5, $t2, $t3
    sw  $t5, 8($t3)
  Label: ...
- Can you represent these instructions as micro-operations?
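A worked count for the first question, assuming the cycle counts implied by the five-step scheme above (lw 5, sw 4, R-type 4, beq 3) and that the beq is not taken, so add and sw still execute:

```python
# Cycles per instruction class in the five-step multi-cycle design.
CYCLES = {"lw": 5, "sw": 4, "R-type": 4, "beq": 3}

# Instructions actually executed; the beq falls through, so add and sw run.
executed = ["lw", "lw", "beq", "R-type", "sw"]
total = sum(CYCLES[kind] for kind in executed)
print(total)  # 21
```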
51. High-level Multi-cycle Processor
52. MIPS Multi-cycle Processor without Controls
53. MIPS Multi-cycle Processor with Controls
54. Chapter 5 Summary
- The datapath and control can be designed based on the instruction set architecture.
- The datapath is composed of combinational units (e.g. adder, ALU, mux) and sequential units such as registers and memory.
- We have considered mainly the single-cycle datapath design and introduced the multi-cycle datapath.
- The control unit issues the right control signals at the right time to enable complete execution of an instruction.
- Control design requires a thorough understanding of the datapath design.
- We have only seen control design for the single-cycle datapath.
- Control design for the multi-cycle datapath requires finite state machine theory.
- The datapath has a mechanism for fetching the next instruction, and thus for executing a whole program.