Title: Tomasulo's Algorithm
1Tomasulo's Algorithm
2Tomasulo Organization
3Reservation Station Components
- Op: Operation to perform in the unit (e.g., + or -)
- Qj, Qk: Reservation stations producing the source registers
- Vj, Vk: Values of the source operands
- Rj, Rk: Flags indicating when Vj, Vk are ready
- Busy: Indicates that the reservation station and its FU are busy
- Register result status: Indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.
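The fields above can be pictured as a small record per reservation station plus a table for the register result status. The following is a minimal sketch, not the hardware organization itself: the class name, the Python types, and the station names `Add1` and `Load2` are assumptions made for illustration, and the Rj/Rk flags are derived from Qj/Qk rather than stored.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ReservationStation:
    """One reservation station entry (field names follow the slide)."""
    name: str
    busy: bool = False
    op: Optional[str] = None     # Op: operation to perform
    vj: Optional[float] = None   # Vj: value of first source operand
    vk: Optional[float] = None   # Vk: value of second source operand
    qj: Optional[str] = None     # Qj: station that will produce Vj, if pending
    qk: Optional[str] = None     # Qk: station that will produce Vk, if pending

    @property
    def rj(self) -> bool:
        # Rj: Vj is ready when no producer is pending for it
        return self.busy and self.qj is None

    @property
    def rk(self) -> bool:
        # Rk: Vk is ready when no producer is pending for it
        return self.busy and self.qk is None

    @property
    def ready(self) -> bool:
        return self.rj and self.rk


# Register result status: register -> station that will write it (None = blank)
register_status: Dict[str, Optional[str]] = {f"F{i}": None for i in range(0, 12, 2)}

# Hypothetical entry: a SUBD whose first operand is still being loaded by "Load2"
add1 = ReservationStation("Add1", busy=True, op="SUBD", qj="Load2", vk=2.5)
register_status["F8"] = "Add1"   # Add1 will eventually write F8
```

Here `add1.ready` is false until the value tagged `Load2` arrives, which is the condition the Rj/Rk flags express.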
4Three Stages of Tomasulo Algorithm
- 1. Issue: get instruction from FP Op Queue
- If a reservation station is free, the control logic issues the instruction and sends operands (renames registers).
- 2. Execution: operate on operands (EX)
- When both operands are ready, execute; if not ready, watch the CDB for the result.
- 3. Write result: finish execution (WB)
- Write on the Common Data Bus to all awaiting units; mark the reservation station available.
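The three stages can be sketched as three small functions. This is a simplified model under stated assumptions: there is no timing, one station per instruction, and the names `Station`, `issue`, `can_execute`, `write_result`, and the register values are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Station:
    name: str
    busy: bool = False
    op: str = ""
    dest: str = ""
    vj: Optional[float] = None
    vk: Optional[float] = None
    qj: Optional[str] = None
    qk: Optional[str] = None


OPS = {"ADDD": lambda a, b: a + b, "SUBD": lambda a, b: a - b,
       "MULTD": lambda a, b: a * b}


def issue(rs: Station, op: str, dest: str, src1: str, src2: str,
          regs: Dict[str, float], status: Dict[str, Optional[str]]) -> bool:
    """Stage 1: claim a free station and rename the source registers."""
    if rs.busy:
        return False                      # structural hazard: stall
    rs.busy, rs.op, rs.dest = True, op, dest
    # Each source is either a value (V) or the tag of its producer (Q).
    rs.qj = status.get(src1)
    rs.vj = None if rs.qj else regs[src1]
    rs.qk = status.get(src2)
    rs.vk = None if rs.qk else regs[src2]
    status[dest] = rs.name                # this station now produces dest
    return True


def can_execute(rs: Station) -> bool:
    """Stage 2 condition: both operands have arrived."""
    return rs.busy and rs.qj is None and rs.qk is None


def write_result(rs: Station, stations: List[Station],
                 regs: Dict[str, float], status: Dict[str, Optional[str]]) -> float:
    """Stage 3: broadcast (tag, value) on the CDB and free the station."""
    value = OPS[rs.op](rs.vj, rs.vk)
    for other in stations:                # waiting stations snoop the CDB
        if other.qj == rs.name:
            other.vj, other.qj = value, None
        if other.qk == rs.name:
            other.vk, other.qk = value, None
    for reg, tag in status.items():       # update registers awaiting this tag
        if tag == rs.name:
            regs[reg] = value
            status[reg] = None
    rs.busy = False
    return value


regs = {"F0": 6.0, "F2": 2.0, "F4": 0.0, "F6": 0.0}
status: Dict[str, Optional[str]] = {}
mult1, add1 = Station("Mult1"), Station("Add1")
issue(mult1, "MULTD", "F4", "F0", "F2", regs, status)   # MULTD F4, F0, F2
issue(add1, "ADDD", "F6", "F4", "F2", regs, status)     # ADDD  F6, F4, F2
waiting = can_execute(add1)                             # False: F4 is pending
write_result(mult1, [mult1, add1], regs, status)        # broadcast F4
write_result(add1, [mult1, add1], regs, status)         # ADDD can now finish
```

The second instruction never reads F4 from the register file; it receives the value directly from the broadcast tagged `Mult1`.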
5Tomasulo Example Cycle 0
6Tomasulo Example Cycle 1
7Tomasulo Example Cycle 2
8Tomasulo Example Cycle 3
9Tomasulo Example Cycle 4
10Tomasulo Example Cycle 5
11Tomasulo Example Cycle 6
12Tomasulo Example Cycle 7
13Tomasulo Example Cycle 8
14Tomasulo Example Cycle 9
15Tomasulo Example Cycle 10
16Tomasulo Example Cycle 11
17Tomasulo Example Cycle 12
18Tomasulo Example Cycle 13
19Tomasulo Example Cycle 14
20Tomasulo Example Cycle 15
21Tomasulo Example Cycle 16
22Tomasulo Example Cycle 17
23Tomasulo Example Cycle 18
24Tomasulo Example Cycle 57
25Tomasulo Example Cycle 58
26Tomasulo Example Cycle 59
27Tomasulo Loop Example
- Loop: LD F0, 0(R1)
- MULTD F4, F0, F2
- SD F4, 0(R1)
- SUBI R1, R1, #8
- BNEZ R1, Loop
- Multiply takes 4 clocks
- Loads may have cache misses
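To see why this loop can overlap iterations, it helps to trace which reservation station tag each F register refers to as successive iterations issue. The sketch below is a rough illustration only: it assumes a free station is always available, ignores timing and the integer instructions (SUBI, BNEZ), and uses hypothetical station names `Load1` to `Load3` and `Mult1`, `Mult2`.

```python
from itertools import cycle

# FP part of the loop body: (opcode, destination register, source registers)
LOOP_BODY = [
    ("LD", "F0", []),
    ("MULTD", "F4", ["F0", "F2"]),
    ("SD", None, ["F4"]),
]


def rename_iterations(n_iters):
    """Return (iteration, opcode, tag, renamed sources) for each instruction."""
    load_tags = cycle(["Load1", "Load2", "Load3"])   # assumed load buffers
    mult_tags = cycle(["Mult1", "Mult2"])            # assumed multiply stations
    status = {}                                      # register -> producing tag
    trace = []
    for it in range(1, n_iters + 1):
        for op, dest, srcs in LOOP_BODY:
            # A source becomes the producer's tag if a write is pending.
            renamed = [status.get(s, s) for s in srcs]
            tag = None
            if op == "LD":
                tag = next(load_tags)
            elif op == "MULTD":
                tag = next(mult_tags)
            if dest is not None:
                status[dest] = tag                   # later readers see new tag
            trace.append((it, op, tag, renamed))
    return trace


trace = rename_iterations(2)
for row in trace:
    print(row)
```

In the trace, the second iteration's MULTD reads `Load2` rather than `Load1`, so the two iterations share no names even though both are written in terms of F0 and F4.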
28Loop Example Cycle 0
29Loop Example Cycle 1
30Loop Example Cycle 2
31Loop Example Cycle 3
32Loop Example Cycle 4
33Loop Example Cycle 5
34Loop Example Cycle 6
35Loop Example Cycle 7
36Loop Example Cycle 8
37Loop Example Cycle 9
38Loop Example Cycle 10
39Loop Example Cycle 11
40Loop Example Cycle 12
41Loop Example Cycle 13
42Loop Example Cycle 14
43Loop Example Cycle 15
44Loop Example Cycle 16
45Loop Example Cycle 17
46Loop Example Cycle 18
47Loop Example Cycle 19
48Loop Example Cycle 20
49Loop Example Cycle 21
50Tomasulo Summary
- Prevents registers from becoming a bottleneck
- Avoids WAR, WAW hazards of Scoreboard
- Allows loop unrolling in HW
- Not limited to basic blocks (provided branch prediction)
- Lasting Contributions:
- Dynamic scheduling
- Register renaming
- Load/store disambiguation
- Next: More branch prediction
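As a closing illustration of the WAR and WAW point in the summary, the sketch below completes three instructions out of order and checks that the architectural registers still end up correct. It is a toy model under assumptions: all sources are ready at issue, results are supplied by hand, and the helper names `issue` and `write` and the tags `Add1` to `Add3` are hypothetical.

```python
regs = {"F0": 1.0, "F2": 2.0, "F4": 0.0, "F6": 6.0, "F8": 8.0}
status = {}   # register -> tag of the latest pending writer


def issue(tag, dest, src_a, src_b):
    """Capture operand values at issue time and claim the destination."""
    entry = {"tag": tag, "dest": dest, "vj": regs[src_a], "vk": regs[src_b]}
    status[dest] = tag            # a newer writer replaces an older tag
    return entry


def write(entry, value):
    """Update the register only if this entry is still its latest writer."""
    if status.get(entry["dest"]) == entry["tag"]:
        regs[entry["dest"]] = value
        del status[entry["dest"]]


i1 = issue("Add1", "F4", "F0", "F2")   # ADDD F4, F0, F2
i2 = issue("Add2", "F0", "F6", "F8")   # SUBD F0, F6, F8  (WAR on F0 with i1)
i3 = issue("Add3", "F4", "F6", "F8")   # ADDD F4, F6, F8  (WAW on F4 with i1)

# Complete out of program order: i2, then i3, then i1.
write(i2, i2["vj"] - i2["vk"])         # F0 overwritten before i1 executes
write(i3, i3["vj"] + i3["vk"])         # newest writer of F4 finishes first
i1_result = i1["vj"] + i1["vk"]        # still uses the old F0 captured at issue
write(i1, i1_result)                   # stale writer: register left untouched
```

The first instruction still computes with the value of F0 it captured at issue, and its late result does not overwrite F4 because the register status no longer names `Add1`.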