Computer Organization Multi-cycle Approach

Multi-cycle Approach. Dr. Iyad Jafar. Adapted from Dr. Gheith Abandah's slides: http://www.abandah.com/gheith/Courses/CPE335_S08/index.html

1
Computer Organization Multi-cycle Approach
  • Dr. Iyad Jafar
  • Adapted from Dr. Gheith Abandah slides
  • http://www.abandah.com/gheith/Courses/CPE335_S08/index.html

2
Multicycle Datapath Approach
  • Let an instruction take more than one clock cycle
    to complete
  • Break up instructions into steps, where:
  • each step takes one cycle, while trying to balance
    the amount of work done in each step
  • each cycle is restricted to using only one major
    functional unit, unless units are used in parallel
  • Not every instruction takes the same number of
    clock cycles
  • In addition to allowing a faster clock rate, the
    multicycle approach lets a functional unit be used
    more than once per instruction, as long as it is
    used on different clock cycles. As a result:
  • Only one memory is needed, but with only one
    memory access per cycle
  • Only one ALU/adder is needed, but with only one
    ALU operation per cycle

3
Multicycle Datapath Approach, cont
  • At the end of a cycle:
  • Store values needed in a later cycle by the
    current instruction in internal registers (A, B,
    IR, MDR, and ALUOut). These registers are
    invisible to the programmer.
  • All of these registers except IR hold data only
    between a pair of adjacent clock cycles, so they
    do not need a write control signal.
  • IR: Instruction Register; MDR: Memory Data
    Register
  • A, B: register-file read data registers; ALUOut:
    ALU output register
  • Data used by subsequent instructions are stored
    in programmer-visible state (i.e., register
    file, PC, or memory)

4
Multicycle Datapath Approach, cont
  • As in the single-cycle design, shared functional
    units need multiplexers at their inputs.
  • There is only one ALU (adder). It is used to
    update the PC, perform ALU operations, do the
    comparison for beq, and compute memory addresses
    and branch addresses.

5
Multicycle Datapath Approach: Control Signals

6
The Multicycle Datapath with Control Signals
[Figure: the multicycle datapath with control signals. Control unit (input Instr[31-26]) with outputs PCWriteCond, PCWrite, PCSource, ALUOp, IorD, MemRead, MemWrite, MemtoReg, IRWrite, ALUSrcA, ALUSrcB, RegWrite, RegDst. Datapath elements: PC, Memory, IR, Register File, A, B, ALU (with zero output), ALUOut, ALU control (input Instr[5-0]), Sign Extend (input Instr[15-0]), Shift left 2, and the jump address path formed from PC[31-28] and Instr[25-0].]

7
Multicycle Machine 1-bit Control Signals
Signal | Effect when deasserted | Effect when asserted
RegDst | The destination register number comes from the rt field | The destination register number comes from the rd field
RegWrite | None | Write is enabled to the selected destination register
ALUSrcA | The first ALU operand is the PC | The first ALU operand is register A
MemRead | None | The content of the memory address is placed on Memory data out
MemWrite | None | The memory location specified by the address is replaced by the value on the Write data input
MemtoReg | The value fed to the register file is from ALUOut | The value fed to the register file is from memory
IorD | The PC is used as the address to the memory unit | ALUOut is used to supply the address to the memory unit
IRWrite | None | The output of memory is written into IR
PCWrite | None | The PC is written; the source is controlled by PCSource
PCWriteCond | None | The PC is written if the Zero output from the ALU is also active

8
Multicycle Machine 2-bit Control Signals
Signal | Value | Effect
ALUOp | 00 | The ALU performs an add operation
ALUOp | 01 | The ALU performs a subtract operation
ALUOp | 10 | The funct field of the instruction determines the ALU operation
ALUSrcB | 00 | The second input to the ALU comes from register B
ALUSrcB | 01 | The second input to the ALU is 4 (to increment the PC)
ALUSrcB | 10 | The second input to the ALU is the sign-extended offset (lower 16 bits of the IR)
ALUSrcB | 11 | The second input to the ALU is the sign-extended lower 16 bits of the IR, shifted left by two bits
PCSource | 00 | The output of the ALU (PC + 4) is sent to the PC for writing
PCSource | 01 | The contents of ALUOut are sent to the PC for writing (branch address)
PCSource | 10 | The jump address is sent to the PC for writing

9
Breaking Instruction Execution into Clock Cycles
[Figure: instruction execution timeline, Cycle 1 through Cycle 5]
  • 1. IFetch: Instruction Fetch and Update PC (same
    for all instructions)
  • Operations:
  • 1.1 Instruction fetch: IR <= Memory[PC]
  • 1.2 Update PC: PC <= PC + 4
  • Control signal values:
  • IorD = 0, MemRead = 1, IRWrite = 1
  • ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00,
    PCWrite = 1
  • PCSource = 00
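
A minimal Python sketch of step 1, added as an illustration rather than taken from the slides. The `Datapath` class, the dictionary used as memory, and the function name `ifetch` are assumptions made for this example; the instruction word encodes `lw $t0, 4($t1)`.

```python
from dataclasses import dataclass, field


@dataclass
class Datapath:
    """Hypothetical container for the state touched in step 1."""
    pc: int = 0
    ir: int = 0
    memory: dict = field(default_factory=dict)  # byte address -> 32-bit word


def ifetch(d: Datapath) -> None:
    """Cycle 1: IR <= Memory[PC]; PC <= PC + 4."""
    # IorD = 0 selects the PC as the memory address; MemRead = 1, IRWrite = 1.
    d.ir = d.memory.get(d.pc, 0)
    # ALUSrcA = 0 (PC), ALUSrcB = 01 (constant 4), ALUOp = 00 (add);
    # PCSource = 00 and PCWrite = 1 write the ALU result back into the PC.
    d.pc = (d.pc + 4) & 0xFFFFFFFF


d = Datapath(pc=0x00400000, memory={0x00400000: 0x8D280004})
ifetch(d)
print(hex(d.ir), hex(d.pc))
```

Both actions happen in the same cycle because the memory read and the PC increment use different functional units (memory and ALU).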

10
Breaking Instruction Execution into Clock Cycles
  • 2. Decode: Instruction decode and register fetch
    (same for all instructions)
  • We do not know the instruction type yet, so
    perform only harmless operations
  • Operations:
  • 2.1 Read the two source registers rs and rt and
    place them in registers A and B,
    respectively.
  • A <= Reg[IR[25:21]]
  • B <= Reg[IR[20:16]]
  • 2.2 Compute the branch address
  • ALUOut <= PC + (sign-extend(IR[15:0]) << 2)
  • Control signal values:
  • ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
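
The decode step can be sketched in Python as follows. This is an illustration under stated assumptions: the helper names `sign_extend16` and `decode`, and the list used as the register file, are not from the slides. The sample word encodes `beq $t0, $t1, -3`, and the PC value is the already incremented PC from step 1.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def decode(ir: int, pc: int, regs: list) -> tuple:
    """Cycle 2: read rs and rt into A and B, and precompute the branch target."""
    a = regs[(ir >> 21) & 0x1F]          # A <= Reg[IR[25:21]]
    b = regs[(ir >> 16) & 0x1F]          # B <= Reg[IR[20:16]]
    # ALUSrcA = 0 (PC), ALUSrcB = 11 (sign-extended offset << 2), ALUOp = 00 (add)
    alu_out = (pc + (sign_extend16(ir) << 2)) & 0xFFFFFFFF
    return a, b, alu_out


regs = [0] * 32
regs[8], regs[9] = 5, 5                  # $t0 and $t1
a, b, alu_out = decode(0x1109FFFD, 0x00400010, regs)
print(a, b, hex(alu_out))
```

The branch target is computed before the instruction type is known; if the instruction turns out not to be a branch, the value in ALUOut is simply overwritten later.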

11
Breaking Instruction Execution into Clock Cycles
  • 3. Execution, memory address computation, or
    branch completion
  • The operation in this cycle depends on the
    instruction type
  • Operations:
  • If memory reference, compute the address:
  • ALUOut <= A + sign-extend(IR[15:0])
  • ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
  • If arithmetic-logic instruction, perform the
    operation:
  • ALUOut <= A op B
  • ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10

12
Breaking Instruction Execution into Clock Cycles
  • 3. Execution, memory address computation, or
    branch completion (continued)
  • The operation depends on the instruction type
  • Operations:
  • If branch instruction:
  • if (A == B) PC <= ALUOut
  • ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01,
    PCWriteCond = 1, PCSource = 01
  • If jump instruction:
  • PC <= {PC[31:28], (IR[25:0], 2'b00)}
  • PCSource = 10, PCWrite = 1
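
Branch and jump completion can be sketched in Python as below. The function names `branch_complete` and `jump_target` are assumptions made for this illustration; the sample word encodes `j 0x00400020`.

```python
def branch_complete(pc: int, a: int, b: int, alu_out: int) -> int:
    """beq: the ALU subtracts (ALUOp = 01); the PC is written only if Zero is set."""
    zero = ((a - b) & 0xFFFFFFFF) == 0
    # PCWriteCond = 1 and PCSource = 01 select ALUOut (the branch target).
    return alu_out if zero else pc


def jump_target(pc: int, ir: int) -> int:
    """j: concatenate PC[31:28], IR[25:0], and two zero bits."""
    return (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)


print(hex(branch_complete(0x00400010, 5, 5, 0x00400004)))   # taken
print(hex(branch_complete(0x00400010, 5, 6, 0x00400004)))   # not taken
print(hex(jump_target(0x00400004, 0x08100008)))
```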

13
Breaking Instruction Execution into Clock Cycles
  • 4. Memory access or R-type completion
  • The operation in this cycle depends on the
    instruction type
  • Operations:
  • If load instruction, read the value from memory
    into MDR:
  • MDR <= Memory[ALUOut]
  • MemRead = 1, IorD = 1
  • If store instruction, store rt into memory:
  • Memory[ALUOut] <= B
  • MemWrite = 1, IorD = 1
  • If arithmetic-logical instruction, write the ALU
    result into rd:
  • Reg[IR[15:11]] <= ALUOut
  • MemtoReg = 0, RegDst = 1, RegWrite = 1

14
Breaking Instruction Execution into Clock Cycles
  • 5. Memory read completion
  • Needed for the load instruction only
  • Operations:
  • 5.1 Store the loaded value in MDR into rt
  • Reg[IR[20:16]] <= MDR
  • RegWrite = 1, MemtoReg = 1, RegDst = 0
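
Steps 3 through 5 for a load can be traced with a short Python sketch. It is an illustration only: `finish_load`, the dictionary memory, and the list register file are assumptions, and the instruction word encodes `lw $t0, 4($t1)`.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def finish_load(ir: int, a: int, regs: list, memory: dict) -> None:
    """Run cycles 3, 4, and 5 of a load, given A from the decode step."""
    # Cycle 3: ALUOut <= A + sign-extend(IR[15:0])
    alu_out = (a + sign_extend16(ir)) & 0xFFFFFFFF
    # Cycle 4: MDR <= Memory[ALUOut]   (MemRead = 1, IorD = 1)
    mdr = memory.get(alu_out, 0)
    # Cycle 5: Reg[IR[20:16]] <= MDR   (RegWrite = 1, MemtoReg = 1, RegDst = 0)
    regs[(ir >> 16) & 0x1F] = mdr


regs = [0] * 32
regs[9] = 0x10010000                     # $t1 holds the base address
memory = {0x10010004: 42}
finish_load(0x8D280004, regs[9], regs, memory)
print(regs[8])                           # $t0
```

Together with the fetch and decode cycles, this accounts for the five cycles a load needs.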

15
Breaking Instruction Execution into Clock Cycles
  • In this implementation, not all instructions take
    5 cycles

Instruction Class | Clock Cycles Required
Load | 5
Store | 4
Branch | 3
Arithmetic-logical | 4
Jump | 3

16
Multicycle Performance
  • Compute the average CPI of the multicycle
    implementation for a SPECINT2000 program with
    the following instruction mix: 25% loads, 10%
    stores, 11% branches, 2% jumps, 52% ALU. Assume
    the CPI for each instruction class is as given in
    the previous table
  • CPI = Σ (CPI_i × IC_i) / IC
  • = 0.25 × 5 + 0.10 × 4 + 0.11 × 3 + 0.02 × 3
    + 0.52 × 4
  • = 4.12
  • Compare this to CPI = 1 for the single-cycle
    design
  • Assume CC_M = (1/5) CC_S, where CC_M and CC_S are
    the multicycle and single-cycle clock cycle times
  • Then
  • Performance_M / Performance_S = (IC × 1 × CC_S) /
    (IC × 4.12 × (1/5) CC_S)
  • = 1.21
  • Multicycle is also cost-effective in terms of
    hardware.
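
The arithmetic above can be checked with a few lines of Python. The variable names are chosen for this illustration, and the 1/5 clock-cycle ratio is the assumption stated on the slide.

```python
# Instruction mix (fractions) and cycles per instruction class from the slides.
mix = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "alu": 0.52}
cycles_per_class = {"load": 5, "store": 4, "branch": 3, "jump": 3, "alu": 4}

# Average CPI is the mix-weighted sum of the per-class cycle counts.
cpi_multi = sum(mix[k] * cycles_per_class[k] for k in mix)

# Execution time = IC x CPI x clock cycle time; IC cancels in the ratio.
cpi_single = 1.0
cc_single = 1.0                 # normalized single-cycle clock period
cc_multi = cc_single / 5        # assumption from the slide
speedup = (cpi_single * cc_single) / (cpi_multi * cc_multi)

print(round(cpi_multi, 2), round(speedup, 2))
```

The speedup is well below 5 because the average instruction needs more than four of the shorter cycles.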

17
Multicycle Control Unit
  • Multicycle datapath control signals are not
    determined solely by the bits in the instruction
  • e.g., the opcode bits tell what operation the ALU
    should be doing, but not which instruction cycle
    is to be done next
  • Since the instruction is broken into multiple
    cycles, we need to know what we did in the
    previous cycle(s) in order to determine the
    current action
  • Must use a finite state machine (FSM) for control:
  • a set of states (current state stored in the
    State Register)
  • a next-state function (determined by the current
    state and the input)
  • an output function (determined by the current
    state and the input)
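
A next-state function of this kind can be sketched in Python. The state names and their numbering are assumptions made for this illustration (the slides do not name the states); the opcodes are the standard MIPS encodings for lw, sw, R-type, beq, and j. Only the next-state function is modeled, not the output function.

```python
from enum import Enum


class State(Enum):
    IFETCH = 0
    DECODE = 1
    MEM_ADDR = 2
    MEM_READ = 3
    LOAD_WRITEBACK = 4
    MEM_WRITE = 5
    EXECUTE = 6
    RTYPE_COMPLETE = 7
    BRANCH_COMPLETE = 8
    JUMP_COMPLETE = 9


LW, SW, RTYPE, BEQ, J = 0x23, 0x2B, 0x00, 0x04, 0x02   # IR[31:26] values


def next_state(state: State, opcode: int) -> State:
    """Next-state function: depends on the current state and the opcode."""
    if state is State.IFETCH:
        return State.DECODE
    if state is State.DECODE:
        if opcode in (LW, SW):
            return State.MEM_ADDR
        if opcode == RTYPE:
            return State.EXECUTE
        if opcode == BEQ:
            return State.BRANCH_COMPLETE
        if opcode == J:
            return State.JUMP_COMPLETE
        raise ValueError(f"unsupported opcode {opcode:#x}")
    if state is State.MEM_ADDR:
        return State.MEM_READ if opcode == LW else State.MEM_WRITE
    if state is State.MEM_READ:
        return State.LOAD_WRITEBACK
    if state is State.EXECUTE:
        return State.RTYPE_COMPLETE
    return State.IFETCH          # every final state returns to fetch


def cycles(opcode: int) -> int:
    """Count the states visited before control returns to IFETCH."""
    state, count = State.IFETCH, 0
    while True:
        state = next_state(state, opcode)
        count += 1
        if state is State.IFETCH:
            return count


print({name: cycles(op) for name, op in
       [("lw", LW), ("sw", SW), ("R-type", RTYPE), ("beq", BEQ), ("j", J)]})
```

Walking the machine from fetch back to fetch reproduces the cycle counts in the instruction-class table.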

18
The States of the Control Unit
  • Ten states are required in the FSM controller
  • The sequence of states is determined by the five
    steps of execution and by the instruction

19
The Control Unit
  • Logic gates
  • inputs: present state + opcode ⇒ 10 bits
  • outputs: control signals + next state ⇒ 20 bits
  • truth table size: 2^10 rows × 20 columns
  • ROM
  • Can be used to implement the truth table above
    (2^10 × 20 bits = 20 Kbit)
  • Each location stores the control signal values
    and the next state
  • Each location is addressed by the opcode and the
    current state value
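
The ROM size can be worked out in Python as a quick check. The split of the 10 input bits into a 6-bit opcode and a 4-bit state, and the order of the two fields in the address, are assumptions made for this sketch; the 16 control bits are the ten 1-bit signals plus the three 2-bit signals from the control signal tables.

```python
OPCODE_BITS = 6      # IR[31:26]
STATE_BITS = 4       # assumed: smallest width that can encode 10 states
CONTROL_BITS = 16    # ten 1-bit signals + three 2-bit signals

address_bits = OPCODE_BITS + STATE_BITS       # ROM inputs
word_bits = CONTROL_BITS + STATE_BITS         # control outputs + next state
rom_bits = (2 ** address_bits) * word_bits


def rom_address(opcode: int, state: int) -> int:
    """Form a ROM address from opcode and current state (field order assumed)."""
    return (opcode << STATE_BITS) | state


print(address_bits, word_bits, rom_bits, rom_bits // 1024)
print(rom_address(0x23, 1))
```

Most of the 1024 locations are never meaningfully used, since only a handful of opcode and state combinations occur.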

20
Micro-programmed Control Unit
  • A ROM implementation is error-prone and
    expensive, especially for a complex CPU. Its size
    increases as the number and complexity of
    instructions (states) increase.
  • Use microprogramming instead:
  • The next state value may not be sequential
  • Generate the next state outside the storage
    element
  • Each state is a microinstruction, and the signals
    are specified symbolically
  • Use labels for sequencing

21
Sequencer

22
Microprogram
  • The microassembler converts the microcode into
    actual signal values
  • The sequencing field is used along with the
    opcode to determine the next state
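
A toy microassembler can illustrate the two points above. Everything named here is hypothetical: the symbolic field values, the `Seq`/`Fetch`/`Dispatch` sequencing keywords, the `Mem1` label, and the dispatch table are assumptions for this sketch, not the course's actual microcode format. The numeric values match the ALUOp, ALUSrcA, and ALUSrcB settings given for steps 1 and 2.

```python
# Hypothetical symbolic-to-numeric encodings for three microinstruction fields.
ALU_CTRL = {"Add": 0b00, "Subt": 0b01, "Func": 0b10}          # -> ALUOp
SRC1 = {"PC": 0, "A": 1}                                      # -> ALUSrcA
SRC2 = {"B": 0b00, "4": 0b01, "Extend": 0b10, "Extshft": 0b11}  # -> ALUSrcB

microprogram = [
    {"label": "Fetch", "alu": "Add", "src1": "PC", "src2": "4", "seq": "Seq"},
    {"label": None, "alu": "Add", "src1": "PC", "src2": "Extshft", "seq": "Dispatch"},
]


def assemble(uinstr: dict) -> dict:
    """Convert symbolic fields into control signal values."""
    return {
        "ALUOp": ALU_CTRL[uinstr["alu"]],
        "ALUSrcA": SRC1[uinstr["src1"]],
        "ALUSrcB": SRC2[uinstr["src2"]],
    }


def next_index(index: int, uinstr: dict, opcode: int,
               dispatch: dict, labels: dict) -> int:
    """Pick the next microinstruction from the sequencing field and the opcode."""
    if uinstr["seq"] == "Seq":
        return index + 1                 # fall through to the next line
    if uinstr["seq"] == "Fetch":
        return labels["Fetch"]           # start the next instruction
    return labels[dispatch[opcode]]      # Dispatch: opcode selects a label


labels = {"Fetch": 0, "Mem1": 2}         # Mem1 is a hypothetical label
dispatch = {0x23: "Mem1", 0x2B: "Mem1"}  # lw and sw share the address step
print(assemble(microprogram[0]), assemble(microprogram[1]))
print(next_index(0, microprogram[0], 0x23, dispatch, labels),
      next_index(1, microprogram[1], 0x23, dispatch, labels))
```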

23
Multicycle Advantages and Disadvantages
  • Uses the clock cycle efficiently: the clock
    cycle is timed to accommodate the slowest
    instruction step
  • Multicycle implementations allow functional units
    to be used more than once per instruction, as
    long as they are used on different clock cycles
  • but
  • Requires additional internal state registers,
    more muxes, and more complicated (FSM) control

24
Single Cycle vs. Multiple Cycle Timing