Title: Instruction Issue Logic for High-Performance Interruptible Pipelined Processors
1 Instruction Issue Logic for High-Performance Interruptible Pipelined Processors
- Gurinder S. Sohi
  - Professor, UW-Madison Computer Architecture Group, University of Wisconsin-Madison
- Sriram Vajapeyam
  - Real-Time Collaboration space at Oracle, Bangalore, India
2 What is this about?
- The performance of pipelined processors is severely limited by data dependencies and branch instructions.
- Another major problem that arises in pipelined computer design is that an interrupt can be imprecise.
- Both of these cause performance degradation.
- A hardware solution is offered in this paper.
3 Problems and previous solutions
- Data Dependency
  - Code scheduling
  - Waiting or Reservation stations
- Branch Instructions
  - Delayed branching
  - Branch Prediction
- Imprecise Interrupts
  - Reorder buffer
  - Reorder buffer with bypass logic
4 Basic Architecture
- Same instruction set as the scalar unit of the CRAY-1
- Several functional units connected to a common result bus
- Instruction Fetch Unit
- Decode and Issue Unit
- 144 registers
5 Tomasulo's Algorithm
- First presented for the floating-point unit of the IBM 360/91.
- An extension of this algorithm for the scalar unit of the CRAY-1 is presented later.
- Algorithm
  - An instruction whose operands are not available is forwarded to a reservation station (RS).
  - It waits in the RS until its operands are available.
  - It is then dispatched to the appropriate functional unit.
  - Each register is assigned a bit that determines whether the register is busy (it is the destination of a pending instruction).
  - A busy register is assigned a tag that identifies the result to be stored in the register.
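The busy-bit and tag bookkeeping described on this slide can be sketched in Python. This is a minimal illustration under stated assumptions, not the IBM 360/91 design: the names `Register`, `Station`, `issue`, and `broadcast` are hypothetical, tags are plain integers supplied by the caller, and every instruction is placed in a station for simplicity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Register:
    value: int = 0
    busy: bool = False          # set while a pending instruction will write this register
    tag: Optional[int] = None   # identifies the pending result (hypothetical integer tag)


@dataclass
class Station:
    op: str
    dest_tag: int
    src_vals: list   # operand values; None while the operand is still pending
    src_tags: list   # tag being waited on; None once the value has arrived

    def ready(self):
        # The instruction may be dispatched once no operand is waiting on a tag.
        return all(t is None for t in self.src_tags)

    def snoop(self, tag, value):
        # Capture a matching result broadcast on the common result bus.
        for i, t in enumerate(self.src_tags):
            if t == tag:
                self.src_vals[i], self.src_tags[i] = value, None


def issue(regs, stations, op, dest, srcs, new_tag):
    # Read the sources first, so an instruction that reads and writes the
    # same register sees the older value or the older tag.
    vals, tags = [], []
    for s in srcs:
        r = regs[s]
        if r.busy:
            vals.append(None)
            tags.append(r.tag)
        else:
            vals.append(r.value)
            tags.append(None)
    regs[dest].busy, regs[dest].tag = True, new_tag
    st = Station(op, new_tag, vals, tags)
    stations.append(st)
    return st


def broadcast(regs, stations, tag, value):
    # A functional unit finished: waiting stations and the register whose
    # current tag matches both pick up the value.
    for st in stations:
        st.snoop(tag, value)
    for r in regs.values():
        if r.busy and r.tag == tag:
            r.value, r.busy, r.tag = value, False, None
```

In this sketch an instruction that reads a busy register receives the register's tag instead of a value, and becomes ready only after `broadcast` delivers a result carrying that tag.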
6 Tomasulo's Algorithm (Contd.)
Fields in Reservation Station
- Disadvantage
  - High hardware cost of register tagging and its associative comparison hardware.
7 Extension to Tomasulo's Algorithm
- A separate Tag Unit (TU)
  - Only a few sink registers (busy registers) are active at any time.
  - All tags from active registers are consolidated into the Tag Unit.
  - Each register retains its busy bit.
- Algorithm
  - At instruction issue time, if a source register is busy, the TU is queried for the current tag of that register, and the tag is forwarded to the reservation stations.
  - If the destination register is not busy, obtaining a tag is straightforward.
  - If it is busy, a new tag is obtained.
  - The Latest field is used to keep the register busy even after the older instruction has executed.
  - If the TU is full, instruction issue is stopped.
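The Tag Unit behaviour listed above can be sketched as a small table in Python. This is an illustration under assumptions rather than the paper's hardware: the class name `TagUnit`, its method names, and the use of the slot index as the tag are all hypothetical choices.

```python
class TagUnit:
    """Hypothetical Tag Unit: a small table of tags for busy registers."""

    def __init__(self, size):
        # Each slot is None (free) or {"reg": name, "latest": bool}.
        # Assumption: the slot index doubles as the tag.
        self.entries = [None] * size

    def current_tag(self, reg):
        # Source lookup: the tag of the latest pending write to reg, if any.
        for tag, e in enumerate(self.entries):
            if e is not None and e["reg"] == reg and e["latest"]:
                return tag
        return None

    def allocate(self, reg):
        # Destination handling: returns a new tag, or None when the TU is
        # full, in which case instruction issue has to stop.
        if None not in self.entries:
            return None
        old = self.current_tag(reg)
        if old is not None:
            self.entries[old]["latest"] = False   # older copy is no longer the latest
        tag = self.entries.index(None)
        self.entries[tag] = {"reg": reg, "latest": True}
        return tag

    def release(self, tag):
        # A result returned. Only the latest copy may clear the register's
        # busy bit; an older copy leaves the register busy.
        e = self.entries[tag]
        self.entries[tag] = None
        return e["reg"], e["latest"]
```

The `latest` flag is what keeps a register busy after an older writer completes: `release` reports whether the returning result is the one the register is actually waiting for.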
8 Extension to Tomasulo's Algorithm (Contd.)
Fields in Reservation Station
9 Other Extensions
- Merging the reservation stations into an RS pool (Disadvantage: only one instruction can be issued at a time! NO)
- Merging the RS pool with the Tag Unit to make the RS Tag Unit (RSTU)
Fields in RSTU
10 Implementation of Precise Interrupts
- Reorder Buffer: It allows instructions to finish execution out of order but updates registers, memory, etc. in the order in which the instructions appear in the program. This ensures that a precise state of the machine is recoverable at any time.
- Bypass Logic: An instruction does not have to wait for the reorder buffer to update a source register; it can fetch the value from the reorder buffer (if it is available) and issue.
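A reorder buffer with bypass, as described on this slide, can be sketched in Python as a queue kept in program order. This is a simplified model under assumptions: the class `ReorderBuffer`, its methods, and the dictionary-based register file are hypothetical, and memory updates are left out.

```python
from collections import deque


class ReorderBuffer:
    """Hypothetical reorder buffer: results finish in any order, commit in program order."""

    def __init__(self):
        self.entries = deque()   # program order, oldest entry on the left
        self.next_id = 0

    def allocate(self, dest):
        # Reserve an entry at issue time, in program order.
        entry = {"id": self.next_id, "dest": dest, "value": None, "done": False}
        self.next_id += 1
        self.entries.append(entry)
        return entry["id"]

    def complete(self, entry_id, value):
        # A functional unit finished; this may happen out of order.
        for e in self.entries:
            if e["id"] == entry_id:
                e["value"], e["done"] = value, True
                return
        raise KeyError(entry_id)

    def commit(self, regfile):
        # Update registers only from the head of the queue, so the register
        # file always reflects a precise, in-order state.
        committed = []
        while self.entries and self.entries[0]["done"]:
            e = self.entries.popleft()
            regfile[e["dest"]] = e["value"]
            committed.append(e["id"])
        return committed

    def read(self, reg, regfile):
        # Bypass: take the newest uncommitted result for reg if one exists.
        # Returns (available, value).
        for e in reversed(self.entries):
            if e["dest"] == reg:
                return e["done"], e["value"]
        return True, regfile[reg]
```

In this sketch `read` lets a later instruction use a completed but uncommitted result, while `commit` holds that result back from the register file until every older instruction has also completed.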
11 MERGING DEPENDENCY RESOLUTION AND PRECISE INTERRUPTS
- The RSTU can be made to behave like a reorder buffer if it is managed as a queue, which forces it to update the state of the machine in the order in which the instructions are encountered.
- The modified unit is called the Register Update Unit (RUU). It
  - (i) determines which instruction should be issued to the functional units for execution, reserves the result bus, and dispatches the instruction to the functional unit,
  - (ii) determines which instruction can commit, i.e., update the state of the machine,
  - (iii) monitors the result bus to resolve dependencies, and
  - (iv) provides tags to and accepts new instructions from the decode and issue unit.
12 Fields in RUU
13 Merging (Contd.)
- Destination Field
  - In the RSTU, the issue logic needed to search the TU to obtain the correct tag for a source operand and to update the latest-copy field for the destination.
  - Here, counters are used instead of tracking multiple copies of a destination.
  - Two n-bit counters per register: Number of Instances (NI) and Latest Instance (LI).
  - When an instruction that writes into a destination register is issued to the RUU, both NI and LI are incremented; LI is incremented modulo n.
  - When such an instruction leaves the RUU, the associated NI is decremented.
  - The register tag consists of the register number appended with the LI counter.
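The NI/LI counter scheme above can be sketched in Python for a single register. Several details here are assumptions rather than statements from the slides: the class name `InstanceCounters`, the rule that issue blocks when NI reaches its maximum, the rule that a register is free when NI is zero, and the default wrap-around of LI at 2^bits (the slide states "modulo n", so the modulus is left as a parameter).

```python
class InstanceCounters:
    """Hypothetical NI/LI counter pair for one register."""

    def __init__(self, bits=3, modulus=None):
        self.limit = (1 << bits) - 1   # assumption: largest NI an n-bit counter can hold
        # Assumption: LI wraps like an n-bit counter unless a modulus is given.
        self.modulus = modulus if modulus is not None else (1 << bits)
        self.ni = 0   # Number of Instances currently in the RUU
        self.li = 0   # Latest Instance

    def busy(self):
        # Assumption: the register is free when no instance is in flight.
        return self.ni > 0

    def issue(self, reg):
        # An instruction writing reg enters the RUU: bump both counters and
        # return the tag (register number appended with LI).
        if self.ni >= self.limit:
            return None   # assumption: issue blocks when NI is at its maximum
        self.ni += 1
        self.li = (self.li + 1) % self.modulus
        return (reg, self.li)

    def source_tag(self, reg):
        # Tag a reader of reg should wait on, or None if reg is not busy.
        return (reg, self.li) if self.busy() else None

    def retire(self):
        # An instruction writing reg leaves the RUU.
        if self.ni == 0:
            raise ValueError("no instance in flight")
        self.ni -= 1
```

With these counters a source operand's tag is formed from the register number and LI directly, so no associative search of a tag table is needed at issue time.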
14 Merging (Contd.)
- Bypass Logic in the RUU
  - A case in which bypass logic might be helpful is when Ij has completed execution but has not committed when Ii is issued to the RUU (Ii is issued after Ij).
  - To provide bypass logic for this case, the monitoring capabilities of the reservation stations are extended to monitor both the result bus and the RUU-to-register bus.
15 SIMULATION
- Simulation Results
  - The benchmark programs used were the Lawrence Livermore loops.
  - A large RUU is needed to achieve a performance improvement.
  - An RUU of size 10 has the same hardware requirements as an architecture that has a reservation station with each functional unit.
17 BRANCH PREDICTION AND CONDITIONAL INSTRUCTIONS
- To allow conditional execution of instructions, a hardware mechanism is needed that allows the machine to recover from an incorrect branch prediction.
- The RUU provides a method for nullifying instructions, as it does for interrupts.
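Nullifying instructions after a wrong prediction can be sketched in Python by treating the RUU as a queue and discarding everything younger than the branch. This is an assumption-laden illustration: the function name `squash_after` and the entry layout (a dict with an `id` key) are hypothetical, and counter or tag clean-up is omitted.

```python
from collections import deque


def squash_after(ruu, branch_id):
    """Remove every RUU entry younger than the branch; return the removed ids.

    ruu is a deque in program order (oldest on the left). Entries behind the
    branch have not committed, so dropping them leaves the register file
    untouched.
    """
    if branch_id not in [e["id"] for e in ruu]:
        raise KeyError(branch_id)
    removed = []
    while ruu[-1]["id"] != branch_id:
        removed.append(ruu.pop()["id"])
    return removed
```

Because state is updated only when an entry reaches the head of the queue, the same discard step would serve for recovering from an interrupt in this model.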
18 Conclusions
- The paper combines the issues of hardware dependency resolution and implementation of precise interrupts.
- A scheme that resolves dependencies and allows out-of-order execution is devised at low hardware cost.
- It is integrated with precise interrupts.
- This integration makes each issue simpler than before.
- Results of the performance evaluation are quite encouraging.
- This mechanism can easily be extended to support conditional execution of instructions from a predicted path.