Chapter Five Datapath and Control - PowerPoint PPT Presentation

Provided by: toda9

1
Chapter Five: Datapath and Control, Part I
2
The Processor: Datapath and Control
  • We're ready to look at an implementation of the
    MIPS
  • Simplified to contain only:
    • memory-reference instructions: lw, sw
    • arithmetic-logical instructions: add, sub, and,
      or, slt
    • control flow instructions: beq, j
  • Generic Implementation:
    • use the program counter (PC) to supply the
      instruction address
    • get the instruction from memory
    • read registers
    • use the instruction to decide exactly what to do
  • All instructions use the ALU after reading the
    registers. Why? Memory-reference? Arithmetic?
    Control flow?

3
More Implementation Details
  • Abstract / Simplified View
  • Two types of functional units
  • elements that operate on data values
    (combinational)
  • elements that contain state (sequential)

4
State Elements
  • Unclocked vs. Clocked
  • Clocks used in synchronous logic
  • when should an element that contains state be
    updated?

[Figure: clock signal, with the cycle time labeled]
5
An unclocked state element
  • The set-reset latch
  • output depends on present inputs and also on past
    inputs
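
The "past inputs" point can be illustrated with a minimal sketch of a set-reset latch, assuming the common construction from two cross-coupled NOR gates (the slide does not say which gates are used). The function names are hypothetical. With S = R = 0 the output depends on which input was asserted last.

```python
def nor(a: int, b: int) -> int:
    """Two-input NOR gate on 0/1 values."""
    return 0 if (a or b) else 1


def sr_latch(s: int, r: int, q: int) -> int:
    """Return the settled Q of a cross-coupled NOR latch, given the old Q.

    Assumption: S = R = 1 is never applied (it is not a valid input pair).
    """
    q_bar = nor(s, q)
    for _ in range(4):  # iterate the feedback loop until it settles
        q = nor(r, q_bar)
        q_bar = nor(s, q)
    return q


q = 0
q = sr_latch(1, 0, q)   # set
q_after_set = q
q = sr_latch(0, 0, q)   # hold: remembers the earlier set
q_hold_after_set = q
q = sr_latch(0, 1, q)   # reset
q = sr_latch(0, 0, q)   # hold: same inputs as before, different output
q_hold_after_reset = q
```

Both hold steps apply the same present inputs (S = 0, R = 0), yet the output differs because of the earlier set or reset.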

6
Latches and Flip-flops
  • Output is equal to the stored value inside the
    element (don't need to ask for permission to
    look at the value)
  • Change of state (value) is based on the clock
  • Latches: state changes whenever the inputs change
    and the clock is asserted
  • Flip-flops: state changes only on a clock
    edge (edge-triggered methodology)

Note: "asserted" means "logically true", which could
mean electrically low.
A clocking methodology defines when signals can
be read and written; we wouldn't want to read a
signal at the same time it was being written.
7
D-latch
  • Two inputs
  • the data value to be stored (D)
  • the clock signal (C) indicating when to read and
    store D
  • Two outputs
  • the value of the internal state (Q) and its
    complement

8
D flip-flop
  • Output changes only on the clock edge
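
The difference between the two elements can be sketched as follows. This is an illustrative model only: the class names are hypothetical, and the flip-flop is assumed to trigger on the rising edge (the slide says only "the clock edge").

```python
class DLatch:
    """Level-sensitive: Q follows D for as long as the clock C is asserted."""

    def __init__(self) -> None:
        self.q = 0

    def update(self, d: int, c: int) -> int:
        if c == 1:
            self.q = d
        return self.q


class DFlipFlop:
    """Edge-triggered: Q takes the value of D only on a 0 -> 1 clock edge
    (rising edge is an assumption made for this sketch)."""

    def __init__(self) -> None:
        self.q = 0
        self.prev_c = 0

    def update(self, d: int, c: int) -> int:
        if self.prev_c == 0 and c == 1:
            self.q = d
        self.prev_c = c
        return self.q


# (D, C) samples in which D changes while the clock is still high.
samples = [(1, 0), (1, 1), (0, 1), (0, 0)]
latch, flip_flop = DLatch(), DFlipFlop()
latch_trace = [latch.update(d, c) for d, c in samples]
flip_flop_trace = [flip_flop.update(d, c) for d, c in samples]
```

The latch output follows the change in D during the high phase, while the flip-flop keeps the value it captured at the edge.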

9
Our Implementation
  • An edge-triggered methodology
  • Typical execution:
    • read contents of some state elements
    • send values through some combinational logic
    • write results to one or more state elements

10
Register File
  • Built using D flip-flops

11
Register File
  • Note: we still use the real clock to determine
    when to write
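
A minimal behavioral sketch of such a register file, assuming the usual MIPS arrangement of 32 registers, two read ports, and one write port. The class, method, and parameter names are hypothetical. Reads are combinational; a write happens only when the clock edge arrives and the write signal is asserted.

```python
class RegisterFile:
    """32 x 32-bit registers, two read ports, one clocked write port."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, read_reg1: int, read_reg2: int) -> tuple[int, int]:
        """Combinational read of two registers."""
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock_edge(self, reg_write: bool, write_reg: int, write_data: int) -> None:
        """Called once per clock edge; writes only if reg_write is asserted."""
        # Register 0 ($zero) is hard-wired to 0 in MIPS, so writes to it are dropped.
        if reg_write and write_reg != 0:
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(True, 8, 42)    # write 42 into register 8
rf.clock_edge(False, 9, 7)    # write signal deasserted: nothing changes
values = rf.read(8, 9)
```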

12
Simple Implementation
  • Include the functional units we need for each
    instruction

Why do we need this stuff?
13
Building the Datapath: Fetching Instructions
  • To execute an instruction, we start by fetching
    the instruction from the memory
  • To prepare for executing the next instruction, we
    must increment the program counter (PC) by 4
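
The two steps above can be sketched as a single function. This is an illustration, not the slide's hardware: instruction memory is modeled as a dictionary keyed by byte address, and the addresses used are made-up values.

```python
def fetch(instruction_memory: dict[int, int], pc: int) -> tuple[int, int]:
    """Return (instruction word at PC, PC + 4)."""
    instruction = instruction_memory[pc]
    next_pc = (pc + 4) & 0xFFFFFFFF  # each instruction is 4 bytes long
    return instruction, next_pc


# Hypothetical program: add $t0, $s1, $s2 followed by lw $t0, 32($s2).
imem = {0x00400000: 0x02324020, 0x00400004: 0x8E480020}
instruction, next_pc = fetch(imem, 0x00400000)
```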

15
Building the Datapath: R-type Instructions
  • Include instructions add, sub, slt, and, or
  • The register numbers come from fields of the
    instruction
  • Example: add $t0, $s1, $s2
      000000  10001  10010  01000  00000  100000
      op      rs     rt     rd     shamt  funct
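
As a quick check of the field layout, the following sketch splits the 32-bit word of the example into its six fields. The function name is hypothetical; the field widths (6, 5, 5, 5, 5, 6 bits) are those of the bit pattern shown above.

```python
def decode_r_type(word: int) -> dict[str, int]:
    """Split a 32-bit R-type instruction into its fields."""
    return {
        "op": (word >> 26) & 0x3F,     # bits 31-26
        "rs": (word >> 21) & 0x1F,     # bits 25-21
        "rt": (word >> 16) & 0x1F,     # bits 20-16
        "rd": (word >> 11) & 0x1F,     # bits 15-11
        "shamt": (word >> 6) & 0x1F,   # bits 10-6
        "funct": word & 0x3F,          # bits 5-0
    }


# add $t0, $s1, $s2, written as the bit pattern from the example
word = int("000000" "10001" "10010" "01000" "00000" "100000", 2)
fields = decode_r_type(word)
```

Here rs = 17 and rt = 18 are registers $s1 and $s2, and rd = 8 is $t0.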

17
Building the Datapath: Load and Store Instructions
  • The instructions compute a memory address by
    adding the base register ($s2) to the 16-bit
    signed offset field (32) contained in the
    instruction
  • Example: lw $t0, 32($s2)
      35   18   8    32
      op   rs   rt   16-bit number
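
The address calculation can be sketched as below. The function names and the base-register value are made up for illustration; the sign extension of the 16-bit offset is what allows negative offsets.

```python
def sign_extend_16(imm16: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def effective_address(base_value: int, imm16: int) -> int:
    """Base register contents plus sign-extended offset, wrapped to 32 bits."""
    return (base_value + sign_extend_16(imm16)) & 0xFFFFFFFF


s2 = 0x10008000                           # hypothetical contents of the base register
address = effective_address(s2, 32)       # offset 32, as in the example
address_neg = effective_address(s2, 0xFFFC)   # offset field encoding -4
```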

19
Building the Datapath: Branch Instructions
  • When the condition is true, the branch target
    address becomes the new PC
  • Otherwise, the incremented PC should replace the
    current PC
  • Hence, we need to compute the branch target address
    and compare the register contents
  • Instructions:
    • bne $t4, $t5, Label: next instruction is at Label
      if $t4 ≠ $t5
    • beq $t4, $t5, Label: next instruction is at Label
      if $t4 = $t5
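
A minimal sketch of the next-PC choice for a branch. It assumes the standard MIPS rule that the target is PC + 4 plus the sign-extended 16-bit offset shifted left by two, and it models the comparison as a subtraction checked for zero. The function and parameter names are hypothetical.

```python
def sign_extend_16(imm16: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc_for_branch(pc: int, rs_value: int, rt_value: int,
                       offset16: int, branch_if_equal: bool = True) -> int:
    """Return the new PC for beq (branch_if_equal=True) or bne (False)."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    target = (pc_plus_4 + (sign_extend_16(offset16) << 2)) & 0xFFFFFFFF
    zero = ((rs_value - rt_value) & 0xFFFFFFFF) == 0   # compare by subtracting
    taken = zero if branch_if_equal else not zero
    return target if taken else pc_plus_4


beq_taken = next_pc_for_branch(0x1000, 5, 5, 3)
beq_not_taken = next_pc_for_branch(0x1000, 5, 6, 3)
bne_backward = next_pc_for_branch(0x1000, 5, 6, 0xFFFF, branch_if_equal=False)
```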

21
Building the Datapath
  • Single cycle
  • Use multiplexors to stitch them together

22
Control
  • Selecting the operations to perform (ALU,
    read/write, etc.)
  • Controlling the flow of data (multiplexor inputs)
  • Information comes from the 32 bits of the
    instruction
  • Example: add $8, $17, $18
    Instruction format:
      000000  10001  10010  01000  00000  100000
      op      rs     rt     rd     shamt  funct
  • The ALU's operation is based on the instruction type
    and function code

23
Control
  • e.g., what should the ALU do with this
    instruction?
  • Example: lw $1, 100($2)
      35   2    1    100
      op   rs   rt   16-bit offset
  • ALU control input:
    • 0000 AND
    • 0001 OR
    • 0010 add
    • 0110 subtract
    • 0111 set-on-less-than
    • 1100 NOR

24
Control
  • Must describe hardware to compute the 4-bit ALU
    control input, given:
    • the instruction type: 00 = lw, sw; 01 = beq;
      10 = arithmetic
    • the function code, for arithmetic instructions
  • Describe it using a truth table (can turn into
    gates)
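
The truth table can be sketched as a lookup. The 2-bit type codes and 4-bit ALU control values are the ones listed on these slides. The function codes come from the standard MIPS encoding and, apart from add (100000), are not shown in the slides. The names `alu_control` and `FUNCT_TO_ALU` are hypothetical.

```python
# Function code (bits 5-0) -> 4-bit ALU control input, for arithmetic instructions.
FUNCT_TO_ALU = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # sub  -> subtract
    0b100100: 0b0000,  # and  -> AND
    0b100101: 0b0001,  # or   -> OR
    0b101010: 0b0111,  # slt  -> set-on-less-than
}


def alu_control(instruction_type: int, funct: int) -> int:
    """Compute the 4-bit ALU control input from the 2-bit type and the funct field."""
    if instruction_type == 0b00:   # lw, sw: add base and offset
        return 0b0010
    if instruction_type == 0b01:   # beq: subtract to compare
        return 0b0110
    return FUNCT_TO_ALU[funct]     # 10: arithmetic, decided by the function code


lw_control = alu_control(0b00, 0)
beq_control = alu_control(0b01, 0)
slt_control = alu_control(0b10, 0b101010)
```

For loads, stores, and branches the function code is ignored, which matches the "don't care" entries such a truth table would contain.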

25
Control
  • For a load, destination register is bits 20-16
  • For R-type instruction, destination register is
    bits 15-11
  • We need a mux to select the field of instruction
    to indicate the register number to be written
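
A small sketch of that multiplexor. The select-signal name `reg_dst` is an assumption (the slide does not name it), as are the function names.

```python
def mux2(select: int, in0: int, in1: int) -> int:
    """Two-input multiplexor: returns in0 when select is 0, in1 when select is 1."""
    return in1 if select else in0


def write_register(instruction: int, reg_dst: int) -> int:
    """Pick the destination register number from the instruction word."""
    rt = (instruction >> 16) & 0x1F   # bits 20-16: destination for a load
    rd = (instruction >> 11) & 0x1F   # bits 15-11: destination for R-type
    return mux2(reg_dst, rt, rd)


lw_dest = write_register(0x8E480020, 0)    # lw $t0, 32($s2)
add_dest = write_register(0x02324020, 1)   # add $t0, $s1, $s2
```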

26

27
Control
  • Simple combinational logic (truth tables)

[Figure labels: ALU control, Control]
28
Operation of the Datapath
  • R-type instruction: add $t1, $t2, $t3
    • An instruction is fetched from the instruction
      memory and the PC is incremented by 4
    • Two registers, $t2 and $t3, are read from the
      register file. The main control unit computes the
      setting of the control lines
    • The ALU operates on the data read from the
      register file, using the function code (bits 5-0)
      to generate the ALU function
    • The result from the ALU is written into the
      register file using bits 15-11 of the instruction
      to select the destination register $t1

29

30
Operation of the Datapath
  • Load instruction: lw $t1, offset($t2)
    • An instruction is fetched from the instruction
      memory and the PC is incremented by 4
    • The value of register $t2 is read from the
      register file
    • The ALU computes the sum of the value read from
      the register file and the sign-extended lower 16
      bits of the instruction (offset)
    • The sum from the ALU is used as the address for
      the data memory
    • The data from the memory unit is written into the
      register file

31

32
Operation of the Datapath
  • Branch-on-equal instruction: beq $t1, $t2, offset
    • An instruction is fetched from the instruction
      memory and the PC is incremented by 4
    • Two registers, $t1 and $t2, are read from the
      register file
    • The ALU performs a subtraction on the data
      values read from the register file. The value of
      PC + 4 is added to the sign-extended lower 16
      bits of the instruction (offset) shifted left by
      two; the result is the branch target address.
    • The Zero result from the ALU is used to decide
      which adder result to store into the PC

33

34
Adding New Instruction
  • Jump instruction: j Label
      fields:        000010   address
      bit position:  31-26    25-0
  • An instruction is fetched from the instruction
    memory and the PC is incremented by 4
  • The upper 4 bits of PC + 4, the 26-bit immediate
    field (address), and 00 are concatenated
  • The concatenation result is stored into the PC
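
The concatenation can be sketched with masks and shifts. The function name and the example addresses are made up for illustration.

```python
def jump_target(pc: int, instruction: int) -> int:
    """Upper 4 bits of PC + 4, then the 26-bit address field, then 00."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    address26 = instruction & 0x03FFFFFF     # bits 25-0 of the instruction
    return (pc_plus_4 & 0xF0000000) | (address26 << 2)


# Hypothetical j instruction: opcode 000010 with address field 0x0100000.
j_word = (0b000010 << 26) | 0x0100000
target_low = jump_target(0x00400000, j_word)
target_high = jump_target(0x90000000, j_word)
```

The second call shows that the upper 4 bits of the result come from the PC, so a jump stays inside the current 256 MB region.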

35

36
Our Simple Control Structure
  • All of the logic is combinational
  • We wait for everything to settle down, and for the
    right thing to be done
  • The ALU might not produce the right answer right away
  • We use write signals along with the clock to
    determine when to write
  • Cycle time is determined by the length of the
    longest path

Note: we are ignoring some details like setup and hold
times.
37
Single Cycle Implementation
  • Calculate cycle time assuming negligible delays
    except:
    • memory (200 ps)
    • ALU and adders (100 ps)
    • register file access (50 ps)

38
Example
  • The clock cycle is determined by the longest
    possible path
  • Usually that is the instruction lw, which uses five
    functional units in series: the instruction memory,
    the register file, the ALU, the data memory, and the
    register file
  • A significant portion of the clock cycle will be
    wasted for add, sub, etc.

Assuming the following instruction mix: 25%
loads, 10% stores, 45% R-format instructions, 15%
branches, and 5% jumps, compare the performance
of two implementations. Implementation 1: every
instruction operates in 1 clock cycle of a fixed
length. Implementation 2: every instruction
executes in 1 clock cycle using a variable-length
clock, which for each instruction is only as long
as it needs to be.
39
Example
CPU execution time = Instruction count × CPI ×
Clock cycle time, where CPI is 1 for both
implementations.

CPU clock cycle for implementation 2
  = 600 × 25% + 550 × 10% + 400 × 45% + 350 × 15% + 200 × 5%
  = 447.5 ps

(CPU performance 2) / (CPU performance 1)
  = (CPU execution time 1) / (CPU execution time 2)
  = (CPU clock cycle 1) / (CPU clock cycle 2)
  = 600 / 447.5 ≈ 1.34

Instruction class   Instruction memory   Register read   ALU operation   Data memory   Register write   Total
R-format            200                  50              100             0             50               400 ps
Load word           200                  50              100             200           50               600 ps
Store word          200                  50              100             200           -                550 ps
Branch              200                  50              100             0             -                350 ps
Jump                200                  -               -               -             -                200 ps
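
The arithmetic of this example can be reproduced with a short script. The delays and the instruction mix are the ones given in the example; the variable names are hypothetical, and the mix is kept as integer percentages so the weighted sum is exact.

```python
# Total delay of each instruction class in ps (the "Total" column of the table).
LATENCY_PS = {"load": 600, "store": 550, "r_format": 400, "branch": 350, "jump": 200}

# Instruction mix in percent.
MIX_PERCENT = {"load": 25, "store": 10, "r_format": 45, "branch": 15, "jump": 5}

# Implementation 1: one fixed clock, as long as the slowest instruction.
fixed_cycle_ps = max(LATENCY_PS.values())

# Implementation 2: average cycle length weighted by the instruction mix.
variable_cycle_ps = sum(
    MIX_PERCENT[name] * LATENCY_PS[name] for name in LATENCY_PS
) / 100

# CPI is 1 in both cases, so the speedup is the ratio of the cycle times.
speedup = fixed_cycle_ps / variable_cycle_ps

print(f"fixed: {fixed_cycle_ps} ps, variable: {variable_cycle_ps} ps, "
      f"speedup: {speedup:.2f}")
```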
40
Where we are headed
  • Single Cycle Problems
    • Performance
      • Clock cycle is equal to the worst-case delay for
        all instructions, violating the "make the common
        case fast" principle
      • What if we had a more complicated instruction
        like floating point?
    • Hardware cost
      • Some functional units must be duplicated
      • Wasteful of area
  • One Solution
    • Use a smaller cycle time
    • Have different instructions take different
      numbers of cycles
    • Multicycle datapath