Title: RISC Pipelining
1. RISC Pipelining
CS 147, Spring 2011, Kui Cheung
2. RISC Pipelining
- Classic five-stage instruction pipeline:
- Fetch: fetch the instruction from memory
- Decode: determine what action is required
- Execute: execute the instruction
- Memory: data cache access
- Writeback: write the result to a register
3. ARM9
Using a basketball team as an analogy, we can assign a position to each pipeline stage. The five steps correspond, in order, to Fetch, Decode, Execute, Memory, and Writeback.
1) The Coach gives a play to the Point Guard.
2) The Point Guard passes the ball to the right person to execute the play.
3) The Small Forward (SF) or Power Forward (PF) continues setting up the play with some fancy moves and then passes the ball to the Center.
4) The Center continues the setup and passes the ball to the Shooting Guard (SG) for a clean shot.
5) The SG takes the shot.
Figure: Nintendo DS 5-stage pipeline (labels: Coach, Point Guard, Small Forward, Power Forward, Center, Shooting Guard)
4. ARM9
1) Fetch the instruction from memory into the instruction register (IR)
2) Determine what action to take
3) Execute the instruction
4) Access the cache if needed
5) Write the result to a register
Example: MOV Reg1, Mem1
1) Fetch the instruction (MOV Reg1, Mem1)
2) Decode it as a move from memory to a register
3) Obtain the address of the memory location to be moved
4) Fetch the data from memory
5) Write the data to Reg1
Figure: Nintendo DS 5-stage pipeline
5. RISC Pipelining
Instruction   1    2    3    4    5    6    7    8    9
1             FI   DI   EX   MEM  WB
2                  FI   DI   EX   MEM  WB
3                       FI   DI   EX   MEM  WB
4                            FI   DI   EX   MEM  WB
5                                 FI   DI   EX   MEM  WB
(columns 1 to 9 are clock cycles)
- FI - fetch instruction
- DI - decode instruction
- EX - execute instruction
- MEM - data cache access
- WB - write back
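The staggered table above can be generated programmatically. The following is a minimal Python sketch, assuming an ideal pipeline with no stalls in which one instruction enters per clock cycle. The function names (`ideal_schedule`, `total_cycles`, `render`) are illustrative and do not come from the slides.

```python
STAGES = ["FI", "DI", "EX", "MEM", "WB"]


def ideal_schedule(num_instructions, stages=STAGES):
    """Return {instruction: {cycle: stage}} for a pipeline with no stalls."""
    schedule = {}
    for i in range(num_instructions):
        # Instruction i (0-based) enters the first stage in cycle i + 1
        # and advances one stage per cycle.
        schedule[i + 1] = {i + 1 + s: name for s, name in enumerate(stages)}
    return schedule


def total_cycles(num_instructions, num_stages=len(STAGES)):
    # The first instruction needs num_stages cycles; each later
    # instruction finishes one cycle after its predecessor.
    return num_stages + num_instructions - 1


def render(schedule):
    """Format the schedule as a plain-text, staggered diagram."""
    last = max(max(row) for row in schedule.values())
    lines = []
    for instr, row in schedule.items():
        cells = [row.get(c, "").ljust(5) for c in range(1, last + 1)]
        lines.append(f"{instr:<4}" + "".join(cells).rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(ideal_schedule(5)))
    print("total cycles:", total_cycles(5))
```

For five instructions and five stages, `total_cycles` gives 5 + 5 - 1 = 9, which matches the nine clock-cycle columns in the table.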
6. Pipeline Delay
(a) No data load delay in the pipeline
MOV Reg1, Mem1    FI   DI   EX   MEM  WB
MOV Reg1, Reg2         FI   DI   EX   MEM  WB
MOV Mem2, Reg1              FI   DI   EX   MEM  WB
- 1) move data from Mem1 to Reg1
- 2) move data from Reg2 to Reg1
- 3) move data from Reg1 to Mem2
7. Pipeline Delay
(b) Data dependency delay
MOV Reg1,Mem1     FI   DI   EX   MEM  WB
MOV Reg2,(Reg1)        FI   DI   EX   MEM  WB
The first instruction writes the data from Mem1 into Reg1 in its WB stage, so the second instruction must wait for the data to be loaded into Reg1.
With a stall (bubble):
MOV Reg1,Mem1     FI   DI   EX   MEM  WB
(stall/bubble)
MOV Reg2,(Reg1)             FI   DI   EX   MEM  WB
1) move data from Mem1 to Reg1
2) move data from Reg1 to Reg2
8. Pipeline Delay
Add a NOP (no operation) to fill the gap:
MOV Reg1,Mem1     FI   DI   EX   MEM  WB
NOP                    FI   DI   EX   MEM  WB
MOV Reg2,(Reg1)             FI   DI   EX   MEM  WB
1) move data from Mem1 to Reg1
2) no operation performed
3) move data from Reg1 to Reg2
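The NOP insertion described above can be sketched in code. This is a simplified Python illustration, not how any particular assembler or compiler does it. It assumes the slides' destination-first `MOV` syntax, treats `MOV Reg, Mem` as a load, and inserts exactly one NOP when the next instruction reads the loaded register. The helper names are hypothetical.

```python
def parse(instr):
    """Split 'MOV Reg2,(Reg1)' into ('MOV', ['Reg2', '(Reg1)'])."""
    op, _, rest = instr.partition(" ")
    operands = [o.strip() for o in rest.split(",")] if rest else []
    return op, operands


def is_load(instr):
    # Assumed convention: destination first, so a load looks like
    # MOV <register>, <memory operand>.
    op, ops = parse(instr)
    return (op == "MOV" and len(ops) == 2
            and ops[0].startswith("Reg") and ops[1].startswith("Mem"))


def reads_register(instr, reg):
    # Every operand after the destination is a source; (Reg1) counts
    # as a read of Reg1.
    _, ops = parse(instr)
    return any(o.strip("()") == reg for o in ops[1:])


def insert_nops(program):
    """Insert one NOP after each load whose result is used immediately."""
    out = []
    for i, instr in enumerate(program):
        out.append(instr)
        nxt = program[i + 1] if i + 1 < len(program) else None
        if nxt and is_load(instr) and reads_register(nxt, parse(instr)[1][0]):
            out.append("NOP")
    return out


if __name__ == "__main__":
    print(insert_nops(["MOV Reg1,Mem1", "MOV Reg2,(Reg1)"]))
```

Applied to the sequence from case (a), `MOV Reg1, Mem1` / `MOV Reg1, Reg2` / `MOV Mem2, Reg1`, the sketch inserts nothing, because the instruction after the load does not read Reg1.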
9. (c) Control dependency delay
101 ADD Reg3, Reg2, Reg1   FI   DI   EX   MEM  WB
102 NOP                         FI   DI   EX   MEM  WB
103 BEQ Reg3, Reg4, 106              FI   DI   EX   MEM  WB
106 MOV Mem1, Reg4                                  FI   DI   EX
- 102 NOP: data dependency delay. Once Reg3 = Reg2 + Reg1 is available, line 103 can compare Reg3 to Reg4 and decide whether to jump to 106.
- The pipeline waits for 103 to decide between continuing to 104 and jumping to 106.
- Reg3 = Reg4, so the branch jumps to 106.
Program:
101 ADD Reg3, Reg2, Reg1   add Reg2 to Reg1 and put the result in Reg3
102 NOP                    no operation performed
103 BEQ Reg3, Reg4, 106    if Reg3 = Reg4, jump to 106, else continue to 104
104 MOV Mem1, Reg3         move Reg3 to Mem1
105 ADD Reg4, Reg1, Reg2   add Reg2 to Reg1 and put the result in Reg4
106 MOV Mem1, Reg4         move Reg4 to Mem1
10. (c) Control dependency delay
Guess that the branch will happen:
101 ADD Reg3, Reg2, Reg1   FI   DI   EX   MEM  WB
102 NOP                         FI   DI   EX   MEM  WB
103 BEQ Reg3, Reg4, 106              FI   DI   EX   MEM  WB
106 MOV Mem1, Reg4                        FI   DI   EX   MEM  WB
- 102 NOP: data dependency delay. Once Reg3 = Reg2 + Reg1 is available, line 103 can compare Reg3 to Reg4 and decide whether to jump to 106.
- Reg3 = Reg4, so the branch jumps to 106 and no time is wasted.
Program:
101 ADD Reg3, Reg2, Reg1   add Reg2 to Reg1 and put the result in Reg3
102 NOP                    no operation performed
103 BEQ Reg3, Reg4, 106    if Reg3 = Reg4, jump to 106, else continue to 104
104 MOV Mem1, Reg3         move Reg3 to Mem1
105 ADD Reg4, Reg1, Reg2   add Reg2 to Reg1 and put the result in Reg4
106 MOV Mem1, Reg4         move Reg4 to Mem1
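11. (c) Control dependency delay
A wrong guess can lead to wasted time:
101 ADD Reg3, Reg2, Reg1   FI   DI   EX   MEM  WB
102 NOP                         FI   DI   EX   MEM  WB
103 BEQ Reg3, Reg4, 106              FI   DI   EX   MEM  WB
106 MOV Mem1, Reg4                        FI   DI   EX
107 MOV Reg2, Mem2                             FI   DI
104 MOV Mem1, Reg3                                       FI   DI
105 ADD Reg4, Reg1, Reg2                                      FI
- 102 NOP: data dependency delay. Once Reg3 = Reg2 + Reg1 is available, line 103 can compare Reg3 to Reg4 and decide whether to jump to 106.
- Reg3 is not equal to Reg4, so the speculatively fetched instructions (106, 107) are cleared and 104 is fetched next.
Program:
101 ADD Reg3, Reg2, Reg1
102 NOP
103 BEQ Reg3, Reg4, 106
104 MOV Mem1, Reg3
105 ADD Reg4, Reg1, Reg2
106 MOV Mem1, Reg4
107 MOV Reg2, Mem2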
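The three control-dependency cases (wait for the branch, guess correctly, guess wrongly) can be compared with a small cost model. This is a minimal Python sketch under stated assumptions: a simple in-order pipeline, a branch outcome that becomes known at a fixed stage (`resolve_stage`, assumed here to be EX, the third stage), and a branch target that is already available at fetch time when predicting taken. The function name and parameters are illustrative, not from the slides.

```python
def branch_cost(actually_taken, strategy, resolve_stage=3):
    """Cycles lost after a conditional branch.

    strategy: "stall", "predict_taken", or "predict_not_taken".
    resolve_stage: 1-based stage at which the branch outcome is known
    (3 = EX in the FI/DI/EX/MEM/WB pipeline; an assumption of this sketch).
    """
    # Fetch slots between the branch's own fetch and its outcome.
    penalty = resolve_stage - 1
    if strategy == "stall":
        # Always wait for the outcome before fetching anything.
        return penalty
    predicted_taken = strategy == "predict_taken"
    # A correct guess costs nothing; a wrong guess throws away the
    # instructions fetched while the branch was being resolved.
    return 0 if predicted_taken == actually_taken else penalty


if __name__ == "__main__":
    for strategy in ("stall", "predict_taken", "predict_not_taken"):
        for taken in (True, False):
            print(strategy, "taken" if taken else "not taken",
                  branch_cost(taken, strategy))
```

Under this model a wrong guess costs the same as waiting, so prediction is never worse than stalling and is better whenever the guess is right.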
12. Pure RISC Pipeline
- Simple primitive instructions and addressing modes
- Instructions execute in one clock cycle
- Uniform-length instructions and fixed instruction format
- Instructions interface with memory via fixed mechanisms (load/store)
- Pipelining
- Instruction set is orthogonal (little overlapping of instruction functionality)
- Hardwired control
- Complexity pushed to the compiler
13. Pure RISC Pipeline
- Register-to-register cycle
- 1) F: instruction fetch
- 2) E: execute; performs an ALU operation with register input and output
- Load and store cycle
- 1) F: instruction fetch
- 2) E: execute; calculates the memory address
- 3) W: memory; register-to-memory or memory-to-register operation
14. Pure RISC Pipeline
a) Traditional pipeline
Instruction           1   2   3   4   5   6   7
100 MOVE Reg1, Mem1   F   E   W
101 ADD 1, Reg1           F   E
102 JUMP 105                  F   E
103 ADD Reg1, Reg2                F
105 MOVE Mem2, Reg1                   F   E   W
F = fetch, E = execute, W = write back
100 move Mem1 to Reg1
101 add 1 to Reg1
102 jump to 105
103 add Reg1 to Reg2
105 move Reg1 to Mem2
When the jump executes, 103 is cleared from the pipeline and 105 is fetched.
15. Pure RISC Pipeline
b) RISC pipeline with inserted NOP
Instruction           1   2   3   4   5   6   7
100 MOVE Reg1, Mem1   F   E   W
101 ADD 1, Reg1           F   E
102 JUMP 105                  F   E
103 NOP                           F   E
105 MOVE Mem2, Reg1                   F   E   W
F = fetch, E = execute, W = write back
100 move Mem1 to Reg1
101 add 1 to Reg1
102 jump to 105
103 no operation
105 move Reg1 to Mem2
A NOP is added so that no special circuitry is needed to clear the pipeline.
16. Pure RISC Pipeline
c) Reversed instructions
Instruction           1   2   3   4   5   6   7
100 MOVE Reg1, Mem1   F   E   W
101 JUMP 105              F   E
102 ADD 1, Reg1               F   E
105 MOVE Mem2, Reg1               F   E   W
F = fetch, E = execute, W = write back
100 move Mem1 to Reg1
101 jump to 105
102 add 1 to Reg1
105 move Reg1 to Mem2
Delayed branch: when a branch occurs, its effect is delayed until the next instruction has been fetched. For example, 102 is fetched before the JUMP to 105 executes, so 102 can execute at the same time that 105 is fetched.
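The reordering shown in the last case can be sketched as a small transformation pass. This is a simplified Python illustration, assuming only unconditional `JUMP` instructions (so the instruction before the jump cannot affect it) and a program given as a list of instruction strings without addresses. The function name `fill_delay_slots` is hypothetical, and a real compiler would also check that the moved instruction is not itself a branch target.

```python
def fill_delay_slots(program):
    """Swap each unconditional JUMP with the instruction just before it.

    After the swap, the moved instruction sits in the slot after the
    jump and executes while the jump target is being fetched.
    """
    out = list(program)
    i = 1
    while i < len(out):
        if out[i].startswith("JUMP") and not out[i - 1].startswith("JUMP"):
            out[i - 1], out[i] = out[i], out[i - 1]
            i += 2  # skip past the delay slot that was just filled
        else:
            i += 1
    return out


if __name__ == "__main__":
    original = ["MOVE Reg1, Mem1", "ADD 1, Reg1", "JUMP 105", "MOVE Mem2, Reg1"]
    for line in fill_delay_slots(original):
        print(line)
```

On the example program this produces the MOVE, JUMP, ADD, MOVE order of the reversed-instructions case, with no NOP and no cleared instruction.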
17. Superpipeline
A B C D E F G H I J K L
  A B C D E F G H I J K
    A B C D E F G H I J
      A B C D E F G H I
        A B C D E F G H
          A B C D E F G
            A B C D E F
              A B C D E
                A B C D
                  A B C
                    A B
                      A
                        A B C D E F G H
Branch executed and the pipeline is cleared; the last row is the first instruction fetched afterwards.
In theory, more and shorter stages allow more instructions to be processed at the same time. But a branch can lead to wasted cycles.
18. ARM11 Pipeline
ARM11 (iPhone 3G): 8-stage pipeline
Figure labels: Fetch Instruction, Decode, Execute, Memory, Writeback
19. RISC Pipelining
ARM Cortex-A8 (iPhone 3GS, Samsung Galaxy S): 13-stage pipeline
- Fetch Instruction (2 stages)
- Decode (5 stages)
- Execute, Memory, Writeback (6 stages)
- Dynamic branch prediction: 95% accuracy
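To see why prediction accuracy matters more as pipelines get deeper, the average cost of branches can be estimated with a short calculation. The Python sketch below is a back-of-the-envelope model, not a description of the Cortex-A8: only the 95% accuracy and the 13-stage depth come from the slide, while the flush cost of 12 cycles and the branch fraction of 20% are hypothetical values chosen for illustration.

```python
def expected_branch_penalty(accuracy, flush_cycles):
    """Average cycles lost per branch: only mispredictions pay the flush."""
    return (1.0 - accuracy) * flush_cycles


def cycles_per_instruction(branch_fraction, accuracy, flush_cycles, base_cpi=1.0):
    """Average CPI when a given fraction of instructions are branches."""
    return base_cpi + branch_fraction * expected_branch_penalty(accuracy, flush_cycles)


if __name__ == "__main__":
    # 0.95 is the accuracy quoted on the slide; 12 and 0.2 are assumptions.
    print(expected_branch_penalty(0.95, 12))
    print(cycles_per_instruction(0.2, 0.95, 12))
```

With these assumed numbers, each branch costs about 0.6 cycles on average and the overall CPI rises from 1.0 to about 1.12. With no prediction at all (accuracy 0 in this model), every branch would pay the full 12 cycles.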
20. i7 (Nehalem) Superpipeline
14 stages
Figure labels: Fetch, Decode, Execute, Memory/Writeback
21. References
- http://www.jp.arm.com/event/pdf/forum2008/t1-1.pdf
- http://www-cs-faculty.stanford.edu/eroberts/courses/soco/projects/2000-01/risc/pipelining/index.html
- http://www.bit-tech.net/hardware/cpus/2008/11/03/intel-core-i7-nehalem-architecture-dive/5
- http://qu.academia.edu/AwsYousif/Papers/120709/A_New_Trend_for_CISC_and_RISC_Architectures
- Course textbook: Computer Organization and Architecture, 7th edition, William Stallings