Instruction Issue Logic for High-Performance Interruptible Pipelined Processors

Provided by: prad163
1
Instruction Issue Logic for High-Performance
Interruptible Pipelined Processors
  • Gurinder S. Sohi
  • Professor, UW-Madison Computer Architecture
    Group, University of Wisconsin-Madison
  • Sriram Vajapeyam
  • Real-Time Collaboration space at Oracle,
    Bangalore, India
2
What is this about?
  • The performance of pipelined processors is
    severely limited by data dependencies and branch
    instructions.
  • Another major problem that arises in pipelined
    computer design is that an interrupt can be
    imprecise.
  • Both of these cause performance degradation.
  • A hardware solution is offered in this paper.

3
Problems and previous solutions
  • Data Dependency
      • Code scheduling
      • Waiting or Reservation stations
  • Branch Instructions
      • Delayed branching
      • Branch Prediction
  • Imprecise Interrupts
      • Reorder buffer
      • Reorder buffer with bypass logic

4
Basic Architecture
  • Same instruction set as the scalar unit of the
    CRAY-1
  • Several functional units connected to a common
    result bus
  • Instruction Fetch Unit
  • Decode and Issue Unit
  • 144 registers

5
Tomasulo's Algorithm
  • First presented for the floating-point unit of
    the IBM 360/91.
  • An extension of this algorithm for the scalar
    unit of the CRAY-1 is presented later.
  • Algorithm
      • An instruction whose operands are not
        available is forwarded to a reservation
        station (RS).
      • It waits in the RS until its operands are
        available.
      • It is then dispatched to the appropriate
        functional unit.
      • Each register is assigned a bit that
        indicates whether the register is busy (that
        is, whether it is the destination of an
        instruction).
      • A busy register is assigned a tag that
        identifies the result to be stored in the
        register.

6
Tomasulo's Algorithm (Contd.)
Fields in Reservation Station
  • Disadvantage
      • High hardware cost of register tagging and
        its associative comparison logic.

7
Extension to Tomasulo's Algorithm
  • A Separate Tag Unit (TU)
      • Only a few sink registers (busy registers)
        are active at any time.
      • All tags from active registers are
        consolidated into the Tag Unit.
      • Each register retains its busy bit.
  • Algorithm
      • At instruction issue time, if a source
        register is busy, the TU is queried for the
        current tag of that register and the tag is
        forwarded to the reservation station.
      • If the destination register is not busy,
        obtaining a tag is straightforward.
      • If it is busy, a new tag is obtained.
      • The Latest field is used to keep the register
        busy even after the older instruction has
        executed.
      • If the TU is full, instruction issue is
        stopped.
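
A minimal sketch of the Tag Unit steps above, assuming that a tag is simply the index of a TU entry and that each entry stores a register name plus a Latest flag. The class and method names are hypothetical, and the write of the result into the register file is not modelled.

```python
class TagUnit:
    def __init__(self, size):
        # Each entry is None when free, otherwise [register_name, is_latest].
        self.entries = [None] * size

    def current_tag(self, reg):
        """Source lookup: tag of the latest pending write to reg, or None."""
        for tag, entry in enumerate(self.entries):
            if entry is not None and entry[0] == reg and entry[1]:
                return tag
        return None

    def allocate(self, reg):
        """Destination lookup: return a new tag, or None if the TU is full (issue stalls)."""
        if None not in self.entries:
            return None
        old = self.current_tag(reg)
        if old is not None:
            self.entries[old][1] = False  # the older instance is no longer the latest
        tag = self.entries.index(None)
        self.entries[tag] = [reg, True]
        return tag

    def release(self, tag):
        """Free a tag when its result returns; is_latest says whether busy may be cleared."""
        reg, is_latest = self.entries[tag]
        self.entries[tag] = None
        return reg, is_latest


# Hypothetical example with a two-entry TU.
tu = TagUnit(size=2)
t0 = tu.allocate("S1")           # S1 was not busy
t1 = tu.allocate("S1")           # S1 already busy: a new tag is obtained
stalled = tu.allocate("S2")      # TU is full
src_tag = tu.current_tag("S1")   # a later reader of S1 waits on the newest tag
first_done = tu.release(t0)      # older instance finishes: S1 must stay busy
second_done = tu.release(t1)     # latest instance finishes: S1 may clear its busy bit
```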

8
Extension to Tomasulo's Algorithm (Contd.)
Fields in Reservation Station
9
Other Extensions
  • Merging the reservation stations into an RS pool
    (Disadvantage: only one instruction can be issued
    at a time! NO)
  • Merging the RS pool with the Tag Unit to form the
    RS Tag Unit (RSTU)

Fields in RSTU
10
Implementation of Precise interrupts
  • Reorder Buffer: It allows instructions to finish
    execution out of order but updates registers,
    memory, etc. in the order in which the
    instructions appear in the program. It thus
    ensures that a precise state of the machine is
    recoverable at any time.
  • Bypass Logic: An instruction does not have to
    wait for the reorder buffer to update a source
    register; it can fetch the value from the reorder
    buffer (if it is available) and issue.
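
The two ideas above can be illustrated with a small model. It is a sketch under stated assumptions: entries are plain dictionaries, the register file is a dict, memory updates are ignored, and the names ReorderBuffer, allocate, complete, commit and bypass are hypothetical.

```python
from collections import deque


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()  # program order, oldest entry at the left

    def allocate(self, dest):
        """Reserve an entry at issue time, in program order."""
        entry = {"dest": dest, "value": None, "done": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value):
        """A functional unit finished; this may happen out of order."""
        entry["value"], entry["done"] = value, True

    def commit(self, regs):
        """Retire finished entries from the head only, so registers change in program order."""
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            regs[entry["dest"]] = entry["value"]

    def bypass(self, reg):
        """Value of the newest write to reg if it has finished but not committed, else None."""
        for entry in reversed(self.entries):
            if entry["dest"] == reg:
                return entry["value"] if entry["done"] else None
        return None


# Hypothetical example: the second instruction finishes before the first.
regs = {"S1": 0, "S2": 0}
rob = ReorderBuffer()
e1 = rob.allocate("S1")
e2 = rob.allocate("S2")
rob.complete(e2, 20)
rob.commit(regs)                 # nothing retires: e1 at the head is not done
regs_before = dict(regs)
bypassed = rob.bypass("S2")      # a later reader can still obtain 20
rob.complete(e1, 10)
rob.commit(regs)                 # both retire, in program order
```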

11
MERGING DEPENDENCY RESOLUTION AND
PRECISE INTERRUPTS
  • The RSTU can be made to behave like a reorder
    buffer if it is managed as a queue, which forces
    it to update the state of the machine in the
    order in which the instructions are encountered.
  • The modified unit is called the Register Update
    Unit (RUU). It:
      • (i) determines which instruction should be
        issued to the functional units for execution,
        reserves the result bus, and dispatches the
        instruction to the functional unit,
      • (ii) determines which instruction can commit,
        i.e., update the state of the machine,
      • (iii) monitors the result bus to resolve
        dependencies, and
      • (iv) provides tags to and accepts new
        instructions from the decode and issue unit.
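
The four duties listed above can be sketched as four methods on a queue. This is a simplified model, not the paper's hardware: tags are assumed to be plain sequence numbers, result-bus reservation is not modelled, and all names (RUU, accept, dispatch, result_bus, commit) are hypothetical.

```python
from collections import deque


class RUU:
    def __init__(self, size):
        self.size = size
        self.queue = deque()  # program order, head at the left
        self.next_tag = 0

    def accept(self, dest, src_tags):
        """(iv) Take a new instruction from decode/issue; return its tag, or None if full."""
        if len(self.queue) >= self.size:
            return None
        tag = self.next_tag
        self.next_tag += 1
        self.queue.append({"tag": tag, "dest": dest, "waiting": set(src_tags),
                           "dispatched": False, "executed": False, "result": None})
        return tag

    def dispatch(self):
        """(i) Send the oldest entry with all operands ready to a functional unit."""
        for entry in self.queue:
            if not entry["dispatched"] and not entry["waiting"]:
                entry["dispatched"] = True
                return entry["tag"]
        return None

    def result_bus(self, tag, value):
        """(iii) Record a returning result and wake entries that were waiting on its tag."""
        for entry in self.queue:
            if entry["tag"] == tag:
                entry["result"], entry["executed"] = value, True
            entry["waiting"].discard(tag)

    def commit(self, regs):
        """(ii) Update machine state from the head of the queue only."""
        committed = []
        while self.queue and self.queue[0]["executed"]:
            entry = self.queue.popleft()
            regs[entry["dest"]] = entry["result"]
            committed.append(entry["tag"])
        return committed


# Hypothetical example: t2 depends on t1, and t1 finishes before t0.
regs = {}
ruu = RUU(size=3)
t0 = ruu.accept("S1", [])
t1 = ruu.accept("S2", [])
t2 = ruu.accept("S3", [t1])
full = ruu.accept("S4", [])
sent = [ruu.dispatch(), ruu.dispatch(), ruu.dispatch()]
ruu.result_bus(t1, 6)
early = ruu.commit(regs)   # t0 at the head has not executed yet
woken = ruu.dispatch()     # t2 now has its operand
ruu.result_bus(t0, 5)
middle = ruu.commit(regs)
ruu.result_bus(t2, 7)
late = ruu.commit(regs)
```

Because state is updated only from the head of the queue, t1's early result does not reach the register file until t0 has also finished.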

12
Fields in RUU
13
Merging (Contd.)
  • Destination Field
      • In the RSTU, the issue logic needed to search
        the TU to obtain the correct tag for a source
        operand and to update the latest-copy field
        for the destination.
      • Here, counters are used instead of tracking
        multiple copies of a destination.
      • Two n-bit counters: Number of Instances (NI)
        and Latest Instance (LI).
      • When an instruction that writes into a
        destination register is issued to the RUU,
        both NI and LI are incremented; LI is
        incremented modulo n.
      • When such an instruction leaves the RUU, the
        associated NI is decremented.
      • The register tag consists of the register
        number appended with the LI counter.
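
The counter scheme above can be sketched for a single register. Several details here are assumptions of the sketch rather than statements from the slides: the wrap-around of the n-bit LI counter is modelled as modulo 2**n (one reading of "modulo n"), issue is refused once NI reaches its maximum value, and the tag is represented as a (register number, LI) tuple. The class and method names are hypothetical.

```python
class InstanceCounters:
    def __init__(self, reg_number, n_bits):
        self.reg_number = reg_number
        self.limit = 2 ** n_bits  # assumption: an n-bit counter wraps at 2**n
        self.ni = 0               # Number of Instances currently in the RUU
        self.li = 0               # Latest Instance

    def can_issue(self):
        # Assumption: stop issuing writers once NI holds its largest n-bit value.
        return self.ni < self.limit - 1

    def issue_write(self):
        """An instruction writing this register enters the RUU; return its tag."""
        if not self.can_issue():
            raise RuntimeError("too many pending instances of this register")
        self.ni += 1
        self.li = (self.li + 1) % self.limit
        return (self.reg_number, self.li)

    def leave(self):
        """Such an instruction leaves the RUU."""
        self.ni -= 1

    def busy(self):
        return self.ni > 0

    def source_tag(self):
        """Tag a later reader should wait on: register number plus the current LI."""
        return (self.reg_number, self.li)


# Hypothetical example: register 3 with 2-bit counters.
c = InstanceCounters(reg_number=3, n_bits=2)
tag_a = c.issue_write()
tag_b = c.issue_write()
src = c.source_tag()
c.leave()
c.leave()
idle = not c.busy()
tag_c = c.issue_write()
tag_d = c.issue_write()   # LI wraps around
```

No search is needed at issue time: the source tag is formed directly from the register number and LI.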

14
Merging (Contd.)
  • Bypass Logic in the RUU
      • One case where bypass logic might be helpful
        is when Ij has completed execution but has
        not yet committed at the time Ii is issued to
        the RUU (Ii is issued after Ij).
      • To provide bypass logic for this case, the
        monitoring capabilities of the reservation
        stations are extended to monitor both the
        result bus and the RUU-to-register bus.

15
SIMULATION
  • Simulation Results
      • The benchmark programs used were the Lawrence
        Livermore loops.
      • A large RUU is needed to achieve a
        performance improvement.
      • An RUU of size 10 has the same hardware
        requirements as an architecture that has a
        reservation station with each functional
        unit.

17
BRANCH PREDICTION AND CONDITIONAL INSTRUCTIONS
  • To allow conditional execution of instructions, a
    hardware mechanism is needed that would allow the
    machine to recover from an incorrect branch
    prediction.
  • The RUU provides a method for nullifying
    instructions, as it does for interrupts.
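
One way to picture nullification is to drop every queue entry younger than the mispredicted branch. The sketch below assumes a program-ordered queue of dictionaries with a "tag" key; the function name nullify_after and the entry layout are hypothetical. Since uncommitted entries have not updated the machine state, no register restore is modelled.

```python
from collections import deque


def nullify_after(queue, branch_tag):
    """Remove all entries younger than the branch; return the tags that were dropped."""
    if not any(entry["tag"] == branch_tag for entry in queue):
        raise ValueError("branch is not in the queue")
    squashed = []
    while queue[-1]["tag"] != branch_tag:
        squashed.append(queue.pop()["tag"])  # youngest entries sit at the right
    return squashed


# Hypothetical example: tag 1 is a branch that turns out to be mispredicted.
queue = deque({"tag": t} for t in range(4))
dropped = nullify_after(queue, branch_tag=1)
remaining = [entry["tag"] for entry in queue]
```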

18
Conclusions
  • The paper combines the issues of hardware
    dependency resolution and the implementation of
    precise interrupts.
  • A scheme that resolves dependencies and allows
    out-of-order execution is devised at low
    hardware cost.
  • It is integrated with precise interrupts.
  • This integration makes each issue simpler than
    before.
  • The results of the performance evaluation are
    quite encouraging.
  • This mechanism can be easily extended to support
    conditional execution of instructions from a
    predicted path.