Title: Instruction Fetch Stage
- Fetch the instruction from Instruction Memory and send it to the Decode stage.
What's New?
- Store the branch target. Assume sub a0,3,4 is stored.
- Loop:
      beq 5,5,Label
      add a0,3,4
      ....
      ....
  Label:
      sub a0,3,4
      add a1,8,9

  Instruction Fetch | Instruction Decode
  beq 5,5,Label     | X
  add a0,3,4        | beq 5,5,Label
  sub a0,3,4        | STALL
  add a1,8,9        | sub a0,3,4
What's New? (with BTB)
- Store the branch target. Assume sub a0,3,4 is stored.

  Instruction Fetch        | Instruction Decode
  beq 5,5,Label            | X
  add a0,3,4 -> sub a0,3,4 | beq 5,5,Label
  add a1,8,9               | sub a0,3,4

BTB Cache (sub a0,3,4)?
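The idea of storing the branch target can be illustrated with a minimal Python sketch. It models the BTB as a table indexed by the branch's address. The addresses `BEQ_PC` and `LABEL_PC`, the `BTBEntry` type, and the `btb_lookup` helper are hypothetical names chosen for illustration; the slides give no addresses or field layout beyond the taken instruction, taken PC, and taken PC+4.

```python
from typing import NamedTuple, Optional

class BTBEntry(NamedTuple):
    taken_instr: str      # instruction stored at the branch target
    taken_pc: int         # address of the branch target
    taken_pc_plus4: int   # address of the instruction after the target

# Hypothetical addresses; the slides do not specify any.
BEQ_PC = 0x00
LABEL_PC = 0x20

# One entry: the beq at BEQ_PC jumps to Label, where sub a0,3,4 lives.
btb = {BEQ_PC: BTBEntry("sub a0,3,4", LABEL_PC, LABEL_PC + 4)}

def btb_lookup(pc: int) -> Optional[BTBEntry]:
    """Return the stored entry if pc holds a known branch, else None."""
    return btb.get(pc)

hit = btb_lookup(BEQ_PC)    # known branch: target instruction is available
miss = btb_lookup(0x04)     # not a known branch: no entry
```

On a hit, the target instruction can be handed to the pipeline without waiting for the target address to be calculated, which is what removes the stall in the table above.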
Stalls!
- Stalls are very common due to branch or data hazards.
- Branch hazards
  - 1 clock cycle penalty for the current MIPS architecture to calculate the target address.
  - In many applications, looping or branching occurs very frequently.
  - Ex:
      for (i = 0; i < 100; i++)
  - Label1
  - Label2
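As a rough illustration of why a 1-cycle penalty matters in loops, the Python sketch below counts stall cycles. It assumes one taken branch per loop iteration and a loop body of four instructions; both figures are assumptions for illustration, not values from the slides, and the function names are hypothetical.

```python
def branch_stall_cycles(taken_branches: int, penalty: int = 1) -> int:
    """Total stall cycles if every taken branch costs `penalty` cycles."""
    return taken_branches * penalty

def stall_fraction(body_instructions: int, iterations: int, penalty: int = 1) -> float:
    """Share of all cycles lost to branch stalls, assuming one taken
    branch per iteration and one cycle per useful instruction."""
    useful = body_instructions * iterations
    stalls = branch_stall_cycles(iterations, penalty)
    return stalls / (useful + stalls)

# A 100-iteration loop with an assumed 4-instruction body.
stalls = branch_stall_cycles(100)
fraction = stall_fraction(4, 100)
```

Under these assumptions the loop loses 100 cycles to stalls, about a fifth of its total cycle count.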
IF Stage with BTB Architecture
Why So Many Registers?
- Store the branch target. Assume sub a0,3,4 is stored.

  Instruction Fetch        | Instruction Decode
  beq 5,5,Label            | X
  add a0,3,4 -> sub a0,3,4 | beq 5,5,Label
  add a1,8,9               | sub a0,3,4

BTB Cache (sub a0,3,4)?
BTB Read
- Assume entries for the branches are already in the BTB cache.
- Sequence should be
- Loop
  - B1
  - B3
  - A1
  - A2
  - B2
  - B4
BTB: A Register-Level Look
General Flow of Registers in Any Clock Cycle
- The temp2 register is loaded with the instruction and taken PC parts of IF/ID2 from the previous cycle.
- The IF/ID1 register is loaded with the instruction at the address held in the PC register in the previous cycle.
- The instruction to be decoded is selected from IF/ID1 or temp2. If the previous instruction is a taken branch, temp2 is selected; for a non-branch instruction or a branch that is not taken, IF/ID1 is selected.
- IF/ID2 is written with either Out1 or Out2. Both entries comprise the taken instruction, taken PC, and taken PC+4. Out2 is selected if the instruction decoded in the previous cycle is a taken branch; otherwise Out1 is selected.
- PC is updated with either PC+4 or taken PC+4. If the instruction decoded in the previous cycle is a taken branch, taken PC+4 is chosen; if it is not a branch, or is a branch that is not taken, PC+4 is chosen.
- Out1 is the BTB output when the fetch index is the PC register.
- Out2 is the BTB output when the fetch index is the taken PC of the IF/ID2 register.
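The three selections described above share one control condition: whether the instruction decoded in the previous cycle was a taken branch. A minimal Python sketch of just these multiplexers follows. The function names and the `prev_taken_branch` flag are hypothetical, and the sketch models only the selection logic, not the register timing of the actual design.

```python
def select_decode_input(prev_taken_branch: bool, if_id1: str, temp2: str) -> str:
    """Instruction sent to decode: temp2 after a taken branch, else IF/ID1."""
    return temp2 if prev_taken_branch else if_id1

def select_if_id2_input(prev_taken_branch: bool, out1, out2):
    """Value written to IF/ID2: Out2 after a taken branch, else Out1."""
    return out2 if prev_taken_branch else out1

def select_next_pc(prev_taken_branch: bool, pc_plus4: int, taken_pc_plus4: int) -> int:
    """Next PC: taken PC+4 after a taken branch, else PC+4."""
    return taken_pc_plus4 if prev_taken_branch else pc_plus4

# After a taken branch, the target-side values are chosen everywhere.
decoded = select_decode_input(True, if_id1="add a0,3,4", temp2="sub a0,3,4")
next_pc = select_next_pc(True, pc_plus4=0x08, taken_pc_plus4=0x24)
# Otherwise the sequential values are chosen.
fallthrough_pc = select_next_pc(False, pc_plus4=0x08, taken_pc_plus4=0x24)
```

The addresses used here are illustrative placeholders only.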
Flow - Example (Control)
Figure sequence, one diagram per clock tick. Register values are listed in the order they appear in each diagram.
- Tick 1: A1, X, B1, X, B3, X, X, X, X
- Tick 2: A2, B1, A1, B1, X, B3, X, B4, A1
- Tick 3: A4, A1, B4, B3, B1, A1, B3, A2, X
- Tick 4: B2, B4, A2, A1, X, X, A1, X, X
- Tick 5: A3, A2, B2, A2, B4, X, X, X, X
- Tick 6: B3, B2, A3, B2, X, B4, X, A4, B1
- Tick 7: X, A3, A4, B4, X, B1, B4, A1, B3
- Tick 8: A2, A4, A1, B1, X, B3, B1, B4, A1
- Tick 9: A4, A1, B4, B3, B1, A1, B3, A2, X
Timing
BTB Write
Control Logic
- IsBTBWrite is a delayed version of the Write signal.
Memory Load Logic
Simulation Using ModelSim
Results
Placing and Routing
Possible Improvements
- Using flip-flops with an enable signal (pc_stall) could reduce the control logic complexity, since the negative-edge registers (whose purpose is to retain the previous cycle's contents) would no longer be required.
- The BTB entry takenPC4 is redundant. A -4 adder could suffice for the solution.
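The first improvement can be sketched behaviorally in Python. The `EnableRegister` class is a hypothetical model, and it assumes the register holds its value while pc_stall is asserted; the slides do not state the signal's polarity.

```python
class EnableRegister:
    """Behavioral model of a flip-flop with an enable input.

    Assumption: the register keeps its old value while pc_stall is
    asserted, and loads the new value otherwise.
    """

    def __init__(self, reset_value: int = 0) -> None:
        self.q = reset_value

    def clock(self, d: int, pc_stall: bool) -> int:
        """Apply one clock edge and return the register output."""
        if not pc_stall:
            self.q = d
        return self.q

pc = EnableRegister(reset_value=0)
after_load = pc.clock(4, pc_stall=False)    # loads the new value
after_stall = pc.clock(8, pc_stall=True)    # holds the previous value
after_resume = pc.clock(8, pc_stall=False)  # loads once the stall clears
```

Because the register itself retains the previous cycle's contents during a stall, no separate negative-edge copy is needed for that purpose in this model.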