Title: Chapter Five Datapath and Control
1 Chapter Five: Datapath and Control, Part I
2 The Processor: Datapath and Control
- We're ready to look at an implementation of the MIPS
- Simplified to contain only
  - memory-reference instructions: lw, sw
  - arithmetic-logical instructions: add, sub, and, or, slt
  - control flow instructions: beq, j
- Generic Implementation
  - use the program counter (PC) to supply the instruction address
  - get the instruction from memory
  - read registers
  - use the instruction to decide exactly what to do
- All instructions use the ALU after reading the registers. Why? memory-reference? arithmetic? control flow?
3 More Implementation Details
- Abstract / Simplified View
- Two types of functional units
  - elements that operate on data values (combinational)
  - elements that contain state (sequential)
4 State Elements
- Unclocked vs. Clocked
- Clocks used in synchronous logic
  - when should an element that contains state be updated?
[Figure: clock signal, with the cycle time labeled]
5 An unclocked state element
- The set-reset latch
  - output depends on present inputs and also on past inputs
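A minimal sketch of why the output depends on past inputs. It assumes the latch is built from two cross-coupled NOR gates (the slide does not say which gates are used); the names nor and sr_latch are illustrative only.

```python
def nor(a, b):
    """Two-input NOR gate on 0/1 values."""
    return int(not (a or b))


def sr_latch(s, r, q, q_bar):
    """Settle a cross-coupled NOR latch, starting from its previous outputs."""
    for _ in range(4):  # a few passes are enough for the feedback loop to settle
        q_next = nor(r, q_bar)
        q_bar_next = nor(s, q)
        if (q_next, q_bar_next) == (q, q_bar):
            break
        q, q_bar = q_next, q_bar_next
    return q, q_bar


state = (0, 1)                        # start with Q = 0
after_set = sr_latch(1, 0, *state)    # S asserted: Q becomes 1
after_hold = sr_latch(0, 0, *after_set)    # S = R = 0: Q keeps the stored 1
after_reset = sr_latch(0, 1, *after_hold)  # R asserted: Q becomes 0
```

With S = R = 0 the outputs are whatever was stored earlier, which is exactly the "past inputs" dependence.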
6 Latches and Flip-flops
- Output is equal to the stored value inside the element (don't need to ask for permission to look at the value)
- Change of state (value) is based on the clock
- Latches: state changes whenever the inputs change and the clock is asserted
- Flip-flop: state changes only on a clock edge (edge-triggered methodology)
Note: "asserted" means "logically true", which could mean electrically low
A clocking methodology defines when signals can be read and written: we wouldn't want to read a signal at the same time it was being written
7 D-latch
- Two inputs
  - the data value to be stored (D)
  - the clock signal (C) indicating when to read and store D
- Two outputs
  - the value of the internal state (Q) and its complement
8 D flip-flop
- Output changes only on the clock edge
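The difference between the two elements can be sketched behaviorally. This is an illustration under assumptions: the flip-flop is taken to be rising-edge triggered (the slide only says "clock edge"), and the class and method names are made up for the example.

```python
class DLatch:
    """Level-sensitive: Q follows D for as long as the clock C is asserted."""

    def __init__(self):
        self.q = 0

    def step(self, c, d):
        if c:
            self.q = d
        return self.q


class DFlipFlop:
    """Edge-triggered: Q takes the value of D only on a clock edge."""

    def __init__(self):
        self.q = 0
        self.prev_c = 0

    def step(self, c, d):
        if self.prev_c == 0 and c == 1:  # rising edge (an assumption)
            self.q = d
        self.prev_c = c
        return self.q


# The same (C, D) waveform applied to both elements.
waveform = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]
latch, flop = DLatch(), DFlipFlop()
latch_trace = [latch.step(c, d) for c, d in waveform]
flop_trace = [flop.step(c, d) for c, d in waveform]
```

In the third sample D changes while C is still high: the latch output follows it, while the flip-flop keeps the value it captured at the edge.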
9 Our Implementation
- An edge-triggered methodology
- Typical execution
  - read contents of some state elements,
  - send values through some combinational logic
  - write results to one or more state elements
10 Register File
11 Register File
- Note: we still use the real clock to determine when to write
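A rough behavioral sketch of a register file whose writes happen only at the clock edge. The port layout (two read ports, one write port), the signal name reg_write, and the method names are assumptions for illustration; the slide figures are not reproduced here. Register 0 always reading as zero is standard MIPS behavior.

```python
class RegisterFile:
    """32 registers of 32 bits; reads are combinational, writes are clocked."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_reg1, read_reg2):
        # Reading needs no clock: the outputs simply reflect the stored values.
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock_edge(self, reg_write, write_reg, write_data):
        # State changes only here, and only if the write signal is asserted.
        if reg_write and write_reg != 0:  # register 0 stays zero in MIPS
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(reg_write=True, write_reg=9, write_data=42)
rf.clock_edge(reg_write=False, write_reg=10, write_data=7)  # ignored
values = rf.read(9, 10)
```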
12 Simple Implementation
- Include the functional units we need for each instruction
Why do we need this stuff?
13 Building the Datapath: Fetching Instruction
- To execute an instruction, we start by fetching the instruction from the memory
- To prepare for executing the next instruction, we must increment the program counter (PC) by 4
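The two fetch steps can be sketched as follows. The instruction memory is modeled as a Python list of 32-bit words indexed by PC / 4, which is a simplification chosen for the example; the function name fetch is hypothetical.

```python
def fetch(instruction_memory, pc):
    """Return the instruction at address pc and the incremented PC."""
    instruction = instruction_memory[pc // 4]  # word-aligned byte address
    next_pc = (pc + 4) & 0xFFFFFFFF            # each instruction is 4 bytes
    return instruction, next_pc


# Two encoded instructions: add $t0, $s1, $s2 and lw $t0, 32($s2)
program = [0x02324020, 0x8E480020]
first, pc = fetch(program, 0)
second, pc = fetch(program, pc)
```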
15 Building the Datapath: R-type Instructions
- Include instructions add, sub, slt, and, or
- The register numbers come from fields of the instruction
- Example: add $t0, $s1, $s2
  000000 10001 10010 01000 00000 100000
  op     rs    rt    rd    shamt funct
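As a quick check of the field layout, the example bit pattern can be split back into its six fields. This is only a sketch; decode_r_type is a hypothetical helper name.

```python
def decode_r_type(word):
    """Split a 32-bit R-type instruction into its six fields."""
    return {
        "op": (word >> 26) & 0x3F,     # bits 31-26
        "rs": (word >> 21) & 0x1F,     # bits 25-21
        "rt": (word >> 16) & 0x1F,     # bits 20-16
        "rd": (word >> 11) & 0x1F,     # bits 15-11
        "shamt": (word >> 6) & 0x1F,   # bits 10-6
        "funct": word & 0x3F,          # bits 5-0
    }


# The bit pattern from the slide, written field by field.
word = int("000000" "10001" "10010" "01000" "00000" "100000", 2)
fields = decode_r_type(word)
```

The decoded register numbers are 17 ($s1), 18 ($s2), and 8 ($t0), matching the assembly operands.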
17 Building the Datapath: load and store Instructions
- The instructions compute a memory address by adding the base register ($s2) to the 16-bit signed offset field (32) contained in the instruction
- Example: lw $t0, 32($s2)
  35  18  8   32
  op  rs  rt  16-bit number
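The address computation can be sketched like this. The base register value 0x10000000 is an arbitrary value picked for the example, and the function names are illustrative.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base_value, imm16):
    """Base register contents plus the sign-extended 16-bit offset."""
    return (base_value + sign_extend16(imm16)) & 0xFFFFFFFF


# lw $t0, 32($s2) with $s2 assumed to hold 0x10000000
address = effective_address(0x10000000, 32)
```

Because the offset is signed, a field of 0xFFFC means -4 and addresses the word just below the base.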
19 Building the Datapath: branch Instructions
- When the condition is true, the branch target address becomes the new PC
- Otherwise, the incremented PC should replace the current PC
- Hence, we need to compute the branch target address and compare the register contents
- Instructions
  - bne $t4, $t5, Label: next instruction is at Label if $t4 ≠ $t5
  - beq $t4, $t5, Label: next instruction is at Label if $t4 = $t5
21 Building the Datapath
- Single cycle
- Use multiplexors to stitch them together
22 Control
- Selecting the operations to perform (ALU, read/write, etc.)
- Controlling the flow of data (multiplexor inputs)
- Information comes from the 32 bits of the instruction
- Example: add $8, $17, $18
  Instruction format:
  000000 10001 10010 01000 00000 100000
  op     rs    rt    rd    shamt funct
- ALU's operation is based on instruction type and function code
23 Control
- e.g., what should the ALU do with this instruction?
- Example: lw $1, 100($2)
  35  2   1   100
  op  rs  rt  16-bit offset
- ALU control input
  0000  AND
  0001  OR
  0010  add
  0110  subtract
  0111  set-on-less-than
  1100  NOR
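A behavioral sketch of an ALU driven by the six control codes listed above. The 32-bit masking and the Zero output returned alongside the result are modeling choices for this example, not something the slide specifies.

```python
MASK = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a signed two's-complement integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control, a, b):
    """Return (result, zero) for the 4-bit ALU control input."""
    a &= MASK
    b &= MASK
    if control == 0b0000:
        result = a & b                                   # AND
    elif control == 0b0001:
        result = a | b                                   # OR
    elif control == 0b0010:
        result = (a + b) & MASK                          # add
    elif control == 0b0110:
        result = (a - b) & MASK                          # subtract
    elif control == 0b0111:
        result = int(to_signed(a) < to_signed(b))        # set-on-less-than
    elif control == 0b1100:
        result = ~(a | b) & MASK                         # NOR
    else:
        raise ValueError(f"unsupported ALU control {control:04b}")
    return result, int(result == 0)


lw_address = alu(0b0010, 1000, 100)  # lw uses the ALU to add base and offset
```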
24 Control
- Must describe hardware to compute 4-bit ALU control input
  - given instruction type
    00 = lw, sw
    01 = beq
    10 = arithmetic
  - function code for arithmetic
- Describe it using a truth table (can turn into gates)
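The truth table can be sketched as a lookup. The function codes below are the standard MIPS funct values for add, sub, and, or, and slt, which are not listed on the slide; the function and table names are hypothetical.

```python
# Standard MIPS function codes mapped to the 4-bit ALU control input.
FUNCT_TO_ALU = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # sub
    0b100100: 0b0000,  # and
    0b100101: 0b0001,  # or
    0b101010: 0b0111,  # slt
}


def alu_control(instr_type, funct=0):
    """Map the 2-bit instruction type (plus funct for arithmetic) to ALU control."""
    if instr_type == 0b00:   # lw, sw: add base register and offset
        return 0b0010
    if instr_type == 0b01:   # beq: subtract to compare the registers
        return 0b0110
    if instr_type == 0b10:   # arithmetic: the function code decides
        return FUNCT_TO_ALU[funct]
    raise ValueError("unknown instruction type")


code_for_slt = alu_control(0b10, 0b101010)
```

For loads, stores, and branches the function code is a don't-care, which is why it defaults to 0 in this sketch.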
25 Control
- For a load, destination register is bits 20-16
- For R-type instruction, destination register is bits 15-11
- We need a mux to select the field of instruction to indicate the register number to be written
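A small sketch of that mux. The select signal name reg_dst is an assumption made for the example (1 picks bits 15-11, 0 picks bits 20-16).

```python
def mux2(select, in0, in1):
    """Two-input multiplexor: pass in1 when select is asserted, else in0."""
    return in1 if select else in0


def write_register(word, reg_dst):
    """Choose the destination register number from the instruction word."""
    rt = (word >> 16) & 0x1F  # bits 20-16, used by loads
    rd = (word >> 11) & 0x1F  # bits 15-11, used by R-type instructions
    return mux2(reg_dst, rt, rd)


dest_for_add = write_register(0x02324020, reg_dst=1)  # add $8, $17, $18
dest_for_lw = write_register(0x8C410064, reg_dst=0)   # lw $1, 100($2)
```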
26 27 Control
- Simple combinational logic (truth tables)
[Figure labels: ALU control, Control]
28 Operation of the Datapath
- R-type instruction: add $t1, $t2, $t3
  - An instruction is fetched from the instruction memory and the PC is incremented by 4
  - Two registers, $t2 and $t3, are read from the register file. The main control unit computes the setting of the control lines
  - The ALU operates on the data read from the register file, using the function code (bits 5-0) to generate the ALU function
  - The result from the ALU is written into the register file using bits 15-11 of the instruction to select the destination register $t1
29 30 Operation of the Datapath
- Load instruction: lw $t1, offset($t2)
  - An instruction is fetched from the instruction memory and the PC is incremented by 4
  - A register ($t2) value is read from the register file
  - The ALU computes the sum of the value read from the register file and the sign-extended lower 16 bits of the instruction (offset)
  - The sum from the ALU is used as the address for the data memory
  - The data from the memory unit is written into the register file
31 32 Operation of the Datapath
- Branch-on-equal instruction: beq $t1, $t2, offset
  - An instruction is fetched from the instruction memory and the PC is incremented by 4
  - Two registers, $t1 and $t2, are read from the register file
  - The ALU performs the subtraction on the data values read from the register file. The value of PC + 4 is added to the sign-extended lower 16 bits of the instruction (offset) shifted left by two; the result is the branch target address.
  - The Zero result from the ALU is used to decide which adder result to store into the PC
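The branch steps can be sketched as one function that produces the next PC. The function names and the example addresses are illustrative only.

```python
MASK = 0xFFFFFFFF


def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def next_pc_for_beq(pc, rs_value, rt_value, imm16):
    """Pick between PC + 4 and the branch target, based on the ALU Zero output."""
    pc_plus_4 = (pc + 4) & MASK
    # Second adder: PC + 4 plus the sign-extended offset shifted left by two.
    branch_target = (pc_plus_4 + (sign_extend16(imm16) << 2)) & MASK
    # The ALU subtracts the two register values; Zero means they are equal.
    zero = ((rs_value - rt_value) & MASK) == 0
    return branch_target if zero else pc_plus_4


taken = next_pc_for_beq(0x100, 5, 5, 3)
not_taken = next_pc_for_beq(0x100, 5, 6, 3)
```

The offset counts instructions rather than bytes, which is why it is shifted left by two before being added.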
33 34 Adding New Instruction
- Jump instruction: j Label
  fields:        000010   address
  bit position:  31-26    25-0
- An instruction is fetched from the instruction memory and the PC is incremented by 4
- The upper 4 bits of PC + 4, the 26-bit immediate field (address), and 00 are concatenated.
- The concatenation result is stored into the PC
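The concatenation can be sketched with shifts and masks. The example PC and address field are arbitrary values chosen for illustration, and jump_target is a hypothetical helper name.

```python
def jump_target(pc, word):
    """Concatenate the upper 4 bits of PC + 4, the 26-bit address field, and 00."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    address = word & 0x03FFFFFF                 # bits 25-0 of the instruction
    return (pc_plus_4 & 0xF0000000) | (address << 2)


j_word = (0b000010 << 26) | 0x100               # j with address field 0x100
new_pc = jump_target(0x40000000, j_word)
```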
35 36 Our Simple Control Structure
- All of the logic is combinational
- We wait for everything to settle down, and the right thing to be done
  - ALU might not produce the right answer right away
  - We use write signals along with the clock to determine when to write
- Cycle time is determined by the length of the longest path
We are ignoring some details like setup and hold times
37 Single Cycle Implementation
- Calculate cycle time assuming negligible delays except
  - memory (200 ps), ALU and adders (100 ps), register file access (50 ps)
38 Example
- The clock cycle is determined by the longest possible path
- Usually the instruction lw uses five functional units in series: the instruction memory, the register file, the ALU, the data memory, and the register file
- A significant portion of the clock cycle will be wasted for add, sub, etc.
Assuming the following instruction mix: 25% loads, 10% stores, 45% R-format instructions, 15% branches, and 5% jumps, compare the performance for two implementations.
Implementation 1: every instruction operates in 1 clock cycle of a fixed length.
Implementation 2: every instruction executes in 1 clock cycle using a variable-length clock, which for each instruction is only as long as it needs to be.
39 Example
CPU execution time = Instruction count x CPI x Clock cycle time, where CPI is 1 for both implementations.
CPU clock cycle for implementation 2 = 600 x 25% + 550 x 10% + 400 x 45% + 350 x 15% + 200 x 5% = 447.5 ps
(CPU performance 2) / (CPU performance 1) = (CPU execution time 1) / (CPU execution time 2) = (CPU clock cycle 1) / (CPU clock cycle 2) = 600 / 447.5 = 1.34
Delays in ps; "-" marks a functional unit the instruction class does not use.
Instruction class   Instruction memory   Register read   ALU operation   Data memory   Register write   Total
R-format            200                  50              100             0             50               400 ps
Load word           200                  50              100             200           50               600 ps
Store word          200                  50              100             200           -                550 ps
Branch              200                  50              100             0             -                350 ps
Jump                200                  -               -               -             -                200 ps
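The totals and the comparison of the two implementations can be reproduced with a short calculation. The dictionary keys and variable names are made up for this sketch; the delays and the instruction mix are the ones given in the example.

```python
# Functional unit delays in picoseconds, as given in the example.
DELAYS = {"imem": 200, "reg_read": 50, "alu": 100, "dmem": 200, "reg_write": 50}

# Units each instruction class uses in series.
UNITS_USED = {
    "R-format": ["imem", "reg_read", "alu", "reg_write"],
    "Load word": ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "Store word": ["imem", "reg_read", "alu", "dmem"],
    "Branch": ["imem", "reg_read", "alu"],
    "Jump": ["imem"],
}

# Instruction mix in percent.
MIX = {"Load word": 25, "Store word": 10, "R-format": 45, "Branch": 15, "Jump": 5}

latency = {name: sum(DELAYS[u] for u in units) for name, units in UNITS_USED.items()}

fixed_cycle = max(latency.values())                               # implementation 1
variable_cycle = sum(MIX[n] * latency[n] for n in MIX) / 100      # implementation 2
speedup = fixed_cycle / variable_cycle
```

Since CPI is 1 and the instruction count is the same for both implementations, the ratio of clock cycle times is the ratio of performance.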
40 Where we are headed
- Single Cycle Problems
  - Performance
    - Clock cycle is equal to the worst-case delay for all instructions, which violates the principle of making the common case fast
    - What if we had a more complicated instruction like floating point?
  - Hardware cost
    - Some functional units must be duplicated
    - Wasteful of area
- One Solution
  - Use a smaller cycle time
  - Have different instructions take different numbers of cycles
  - Multicycle datapath