Title: Implementing an ISA, part II - Control
1Implementing an ISA, part II - Control
- David E. Culler
- CS61CL
- Nov 4, 2009
- Lecture 10
2Review TinyMIPS
- Reg-Reg instructions (op == 0)
- addu: R[rd] <- R[rs] + R[rt]; pc <- pc + 4
- subu: R[rd] <- R[rs] - R[rt]; pc <- pc + 4
- Reg-Immed (op != 0)
- lw: R[rt] <- Mem[R[rs] + signEx(Im16)]
- sw: Mem[R[rs] + signEx(Im16)] <- R[rt]
- Jumps
- j: PC <- PC[31..28] || addr || 00
- jr: PC <- R[rs]
- Branches
- BEQ: PC <- (R[rs] == R[rt]) ? PC + signEx(im16) : PC + 4
- BLTZ: PC <- (R[rs] < 0) ? PC + signEx(im16) : PC + 4
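The register transfers reviewed on this slide can be sketched as a small Python function. This is an illustrative model, not the course's reference simulator: the function name `step`, the string opcodes, the dictionary-based memory, and the choice to add the sign-extended immediate to PC without shifting (as the branch bullets are written here) are all assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF


def sign_ext16(imm16):
    """Sign-extend a 16-bit immediate to a Python int."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def step(op, regs, mem, pc, rs=0, rt=0, rd=0, imm16=0, addr=0):
    """Apply one instruction's register transfers and return the next pc.

    regs is a list of 32-bit register values, mem a dict from address to word.
    """
    if op == "addu":
        regs[rd] = (regs[rs] + regs[rt]) & MASK32
    elif op == "subu":
        regs[rd] = (regs[rs] - regs[rt]) & MASK32
    elif op == "lw":
        regs[rt] = mem.get((regs[rs] + sign_ext16(imm16)) & MASK32, 0)
    elif op == "sw":
        mem[(regs[rs] + sign_ext16(imm16)) & MASK32] = regs[rt]
    elif op == "j":
        # PC[31..28] || addr || 00
        return (pc & 0xF0000000) | ((addr & 0x03FFFFFF) << 2)
    elif op == "jr":
        return regs[rs]
    elif op == "beq":
        if regs[rs] == regs[rt]:
            return (pc + sign_ext16(imm16)) & MASK32
    elif op == "bltz":
        if regs[rs] & 0x80000000:  # sign bit set means R[rs] < 0
            return (pc + sign_ext16(imm16)) & MASK32
    else:
        raise ValueError("unknown op: " + op)
    return (pc + 4) & MASK32
```

Every instruction that does not redirect control falls through to the shared `pc + 4` update, which mirrors how the slide attaches `pc <- pc + 4` to the non-jump cases.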
3Review DataPath Control
[Figure: datapath with RAM, IR, and PC on buses A and D; control points: Asel, Bsel, Dsel, ld, npc_sel, ld_reg, sx_sel, rt_sel, ld_pc, comp, pc2A, m2D, ld_ir, b2D, s2A, s2D, i2D, wrt]
4Control State Machine (abstract)
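- I-Fetch: IR <- Mem[pc]
- AddU: R[rd] <- R[rs] + R[rt]; pc <- pc + 4
- SubU: R[rd] <- R[rs] - R[rt]; pc <- pc + 4
- LW: R[rt] <- Mem[R[rs] + sx16]; pc <- pc + 4
- SW: Mem[R[rs] + sx16] <- R[rt]; pc <- pc + 4
- J: pc <- pc[31..28] || addr || 00
- JR: pc <- R[rs]
- BR-taken: pc <- pc + (sx16 || 00)
- BR-not taken: pc <- pc + 4
- Transitions from I-Fetch: ~reset & (OP == addu) to AddU; ~reset & (OP == lw) to LW; ~reset & ((OP == beq & EQ) | (OP == Bneg & N)) to BR-taken; reset returns to I-Fetch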
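The abstract state machine above can be written as a next-state function. The following is a minimal sketch under stated assumptions: state names follow the slide, opcodes are lowercase strings chosen for the example, the branch-on-negative case is called `bltz` as in the TinyMIPS review, and every execute state is assumed to return to I-Fetch on the next clock.

```python
EXEC_STATE = {
    "addu": "AddU",
    "subu": "SubU",
    "lw": "LW",
    "sw": "SW",
    "j": "J",
    "jr": "JR",
}


def next_state(state, reset, op=None, eq=False, n=False):
    """Return the controller's next state.

    eq and n stand for the comparator outputs (R[rs] == R[rt], R[rs] < 0).
    """
    if reset:
        return "I-Fetch"
    if state != "I-Fetch":
        # Assumption: each execute state takes one cycle, then fetch resumes.
        return "I-Fetch"
    if op in EXEC_STATE:
        return EXEC_STATE[op]
    if op == "beq":
        return "BR-taken" if eq else "BR-not taken"
    if op == "bltz":
        return "BR-taken" if n else "BR-not taken"
    raise ValueError("unknown op: " + str(op))
```

For instance, `next_state("I-Fetch", False, "beq", eq=True)` selects the taken-branch state, while the same opcode with `eq=False` selects the not-taken state.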
5Ifetch: IR <- Mem[pc]
[Figure: datapath and control points, as in slide 3]
- RAM_addr <- A <- PC (pc2A, s2A)
- IR_in <- D <- RAM_data (i2D, m2D, b2D, s2D)
- IR <- IR_in (ld_ir, ld_pc, ld_reg, wrt)
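The three transfers above can be read as one control word for the fetch state. The sketch below is an assumption-laden illustration: the signal polarity was lost from the slide text, so it assumes that pc2A, m2D, and ld_ir are asserted and that every other listed control point is deasserted, which is the reading implied by PC driving bus A, RAM data driving bus D, and only IR being loaded. The helper `bus_conflicts` is hypothetical and only checks that each bus has at most one driver.

```python
CONTROL_POINTS = (
    "pc2A", "s2A",                 # drivers of bus A
    "i2D", "m2D", "b2D", "s2D",    # drivers of bus D
    "ld_ir", "ld_pc", "ld_reg", "wrt",
)

BUS_DRIVERS = {
    "A": ("pc2A", "s2A"),
    "D": ("i2D", "m2D", "b2D", "s2D"),
}


def ifetch_control():
    """Control word for IR <- Mem[pc] (assumed polarity: 1 = asserted)."""
    word = dict.fromkeys(CONTROL_POINTS, 0)
    for name in ("pc2A", "m2D", "ld_ir"):
        word[name] = 1
    return word


def bus_conflicts(word):
    """Return the buses that more than one source is trying to drive."""
    return [
        bus
        for bus, names in BUS_DRIVERS.items()
        if sum(word[name] for name in names) > 1
    ]
```

Keeping ld_pc, ld_reg, and wrt at 0 in this word reflects that fetch should change no architectural state other than IR.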
6Control State Machine
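- I-Fetch: IR <- Mem[pc]; control points: pc2A, s2A, i2D, m2D, b2D, s2D, ld_ir, ld_pc, ld_reg, wrt
- AddU: R[rd] <- R[rs] + R[rt]; pc <- pc + 4
- SubU: R[rd] <- R[rs] - R[rt]; pc <- pc + 4
- LW: R[rt] <- Mem[R[rs] + sx16]; pc <- pc + 4
- SW: Mem[R[rs] + sx16] <- R[rt]; pc <- pc + 4
- J: pc <- pc[31..28] || addr || 00
- JR: pc <- R[rs]
- BR-taken: pc <- pc + (sx16 || 00)
- BR-not taken: pc <- pc + 4
- Transitions from I-Fetch: ~reset & (OP == addu) to AddU; ~reset & (OP == lw) to LW; ~reset & ((OP == beq & EQ) | (OP == Bneg & N)) to BR-taken; reset returns to I-Fetch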
7Exec: R[rd] <- R[rs] + R[rt]; pc <- pc + 4
[Figure: datapath and control points, as in slide 3]
- npc_sel=0, ld_pc, pc2A, ld_ir, i2D, wrt, m2D, rt_sel, ld_reg, b2D, sx_sel, comp, s2A, s2D
8Control State Machine
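- I-Fetch: IR <- Mem[pc]; control points: pc2A, s2A, i2D, m2D, b2D, s2D, ld_ir, ld_pc, ld_reg, wrt
- AddU: R[rd] <- R[rs] + R[rt]; pc <- pc + 4; control points: npc_sel=0, ld_pc, pc2A, ld_ir, i2D, wrt, m2D, rt_sel, ld_reg, b2D, sx_sel, comp, s2A, s2D
- SubU: R[rd] <- R[rs] - R[rt]; pc <- pc + 4
- LW: R[rt] <- Mem[R[rs] + sx16]; pc <- pc + 4
- SW: Mem[R[rs] + sx16] <- R[rt]; pc <- pc + 4
- J: pc <- pc[31..28] || addr || 00
- JR: pc <- R[rs]
- BR-taken: pc <- pc + (sx16 || 00)
- BR-not taken: pc <- pc + 4
- Transitions from I-Fetch: ~reset & (OP == addu) to AddU; ~reset & (OP == lw) to LW; ~reset & ((OP == beq & EQ) | (OP == Bneg & N)) to BR-taken; reset returns to I-Fetch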
9Exec: R[rd] <- R[rs] - R[rt]; pc <- pc + 4
- npc_sel=0, ld_pc, pc2A, ld_ir, i2D, wrt, m2D, rt_sel, ld_reg, b2D, sx_sel, comp, s2A, s2D
10Exec LW: R[rt] <- Mem[R[rs] + SX(im16)]
- npc_sel=0, ld_pc, m2D, rt_sel, ld_reg, sx_sel, s2A
11Exec SW: Mem[R[rs] + SX(im16)] <- R[rt]
12Exec J: PC <- PC[31..28] || addr || 00
[Figure: datapath and control points, as in slide 3]
13Exec JR: PC <- R[rs]
[Figure: datapath and control points, as in slide 3]
- npc_sel=2, ld_pc, s2D, sx_sel=2
14Exec Br Taken: PC <- PC + SX16
[Figure: datapath and control points, as in slide 3]
15Controller Specification
16Administration
- HW7 due midnight
- Mid Term 2 Monday 11/9
- 5:30-7:30, Rm 145 Dwinelle
- alternate Friday 11/4, 3:00-5:00, Rm 310 Soda
- Review session Sunday 5-7, 306 Soda
- Project 3
- incremental lab check offs
- Flex lab mon (9-1) and tues (9-5)
- midterm final prep
- project 3 help
17Controller Implementation
[Figure: controller block; labels: exec, npc_sel, clk, reset, op, eq, n, s2D]
18Combinational Logic per Ctrl Point
[Figure: logic for control points pc2A, ldPC, and wrt; input labels: exec, op]
19Multiplexor Control
I 0 1 0 0 0 0 0 0 1 0 0 0 0 1
I 0 1 0 0 0 0 0 0 1 0 0 1 0 1
I 3 1 0 0 1 0 0 x 0 0 x x 0 0
op
20Faster Clock
[Figure: datapath for the faster clock; labels: RAM, D, Addr, A, S, IR, PC, B, with the control points of slide 3]
- Clock period > longest path from register output to register input + register delay
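The clock-period constraint can be made concrete with a small calculation. This is a sketch with made-up numbers: the path names and delays below are hypothetical and are not measurements of the lecture's datapath.

```python
def min_clock_period(path_delays_ns, reg_delay_ns):
    """Smallest period that covers the slowest register-to-register path."""
    return max(path_delays_ns.values()) + reg_delay_ns


def critical_path(path_delays_ns):
    """Name of the path that sets the clock period."""
    return max(path_delays_ns, key=path_delays_ns.get)


# Hypothetical combinational delays, in nanoseconds.
example_paths = {
    "PC -> RAM -> IR": 6.0,
    "A, B -> ALU -> S": 4.5,
}
```

With these example values and a 0.5 ns register delay, the memory path is critical. Splitting a long path with an extra register shortens the longest path, which is what allows a faster clock.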
21Multi-Cycle Controller State Machine
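- I-Fetch: IR <- Mem[pc]
- Op / Dcd: A <- R[rs], B <- R[rt]
- AddU: S <- A + B; pc <- pc + 4; then R[rd] <- S
- SubU: S <- A - B; pc <- pc + 4; then R[rd] <- S
- LW: S <- A + sx16; pc <- pc + 4; then read: MAR <- S; then R[rt] <- D
- SW: S <- A + sx16; pc <- pc + 4; then wrt: MAR <- S, MDR <- B
- J: pc <- pc[31..28] || addr || 00
- JR: pc <- R[rs]
- BR-taken: pc <- pc + (sx16 || 00)
- BR-not taken: pc <- pc + 4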
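One way to see what the multi-cycle controller buys is to count states per instruction. The table below is a sketch, not the slide's exact state graph: it assumes every instruction passes through I-Fetch and Op/Dcd, and that jumps and branches need one further state. The step strings are labels for illustration only.

```python
COMMON_STEPS = ["I-Fetch", "Op/Dcd"]

EXEC_STEPS = {
    "addu": ["S <- A + B", "R[rd] <- S"],
    "subu": ["S <- A - B", "R[rd] <- S"],
    "lw": ["S <- A + sx16", "read: MAR <- S", "R[rt] <- D"],
    "sw": ["S <- A + sx16", "wrt: MAR <- S, MDR <- B"],
    "j": ["pc <- pc[31..28] || addr || 00"],
    "jr": ["pc <- R[rs]"],
}


def cycles(op):
    """Number of controller states an instruction visits."""
    return len(COMMON_STEPS) + len(EXEC_STEPS[op])


def total_cycles(program):
    """Cycles for a list of opcodes executed one after another."""
    return sum(cycles(op) for op in program)
```

Under these assumptions a load takes one more cycle than an add, so the shorter clock period is paid for with a variable cycle count per instruction.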
22Time State control
[Figure: time-state control; labels: mem, IR, IR_ex, IR_mem, IR_wb, PC]
- Move control word through the stages
- Decode per stage
- Active stage moves around the ring
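The idea of an active stage moving around a ring can be sketched as a one-hot ring counter. The stage order used here (ifetch, decode, exec, mem, wb) is an assumption for the example, as are the function names.

```python
STAGES = ("ifetch", "decode", "exec", "mem", "wb")


def advance(ring):
    """Rotate the one-hot ring by one position (one clock tick)."""
    return ring[-1:] + ring[:-1]


def active_stage(ring):
    """Name of the single stage whose bit is set."""
    if sum(ring) != 1:
        raise ValueError("ring must be one-hot")
    return STAGES[ring.index(1)]


def run(ticks):
    """List the active stage for each of the given number of clock ticks."""
    ring = [1, 0, 0, 0, 0]
    seen = []
    for _ in range(ticks):
        seen.append(active_stage(ring))
        ring = advance(ring)
    return seen
```

Because exactly one bit is set, each stage can decode its own control word locally and act only when its bit is active.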
23Time State Control
[Figure: datapath as in slide 20, with stage labels: ifetch, decode, exec, wb, mem]
24More regular multi-cycle execution
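- I-Fetch: IR <- Mem[pc]
- Op / Dcd: A <- R[rs], B <- R[rt]
- AddU: S <- A + B; pc <- pc + 4; then S <- S; then R[rd] <- S
- SubU: S <- A - B; pc <- pc + 4; then S <- S; then R[rd] <- S
- LW: S <- A + sx16; pc <- pc + 4; then read: MAR <- S; then R[rt] <- D
- SW: S <- A + sx16; pc <- pc + 4; then wrt: MAR <- S, MDR <- B
- J: pc <- pc[31..28] || addr || 00
- JR: pc <- R[rs]
- BR-taken: pc <- pc + (sx16 || 00)
- BR-not taken: pc <- pc + 4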
25Sequence of Multi-step Operations
- Operation implemented as a sequence of steps on distinct resources: wash > dry > fold
- Multiple independent operations
[Figure: stage labels: Ex, Mem, WB]
26Technology Trends
- Clock rate: 30% per year
- Transistor density: 35%
- Chip area: 15%
- Transistors per chip: 55%
- Total performance capability: 100%
- by the time you graduate...
- 3x clock rate (> 10 GHz)
- 10x transistor count (100 billion transistors)
- 30x raw capability
- plus 16x DRAM density,
- 32x disk density (60% per year)
- Network bandwidth,
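The annual rates above compound, and some of them multiply together. The sketch below shows the arithmetic; reading "transistors per chip" as density times area follows from the definitions, while reading "total performance capability" as clock rate times transistors per chip is an assumption that happens to match the slide's figures. The number of years is left as a parameter because the slide does not state it.

```python
def growth(rate_per_year, years):
    """Total growth factor after compounding an annual rate."""
    return (1.0 + rate_per_year) ** years


def combined_rate(*rates):
    """Annual rate that results from multiplying several growing quantities."""
    factor = 1.0
    for rate in rates:
        factor *= 1.0 + rate
    return factor - 1.0


# Density (35%) times area (15%) gives about 55% for transistors per chip.
transistors_per_chip = combined_rate(0.35, 0.15)

# Clock (30%) times transistors per chip (55%) gives about 100%.
total_capability = combined_rate(0.30, 0.55)
```

For example, `growth(0.30, 4)` is a little under 3, which is in line with the "3x clock rate" bullet if the horizon is about four years.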
27Pipelining
- Overlap consecutive operations
28Definition Performance
- Performance is in units of things per sec
- bigger is better
- If we are primarily concerned with response time, "X is n times faster than Y" means ExTime(Y) / ExTime(X) = Performance(X) / Performance(Y) = n
29Pipeline Performance
- N operations performed in k steps each
- Sequential time: Nk
- Lower bound: N (1 every cycle)
- Pipeline time: k - 1 + N
- Bound on speedup of a k-stage pipeline: < k
- Speedup(k,N) = Time(1,N) / Time(k,N) = Nk / (N + k - 1) = k / (1 + (k-1)/N)
- Startup cost: k - 1
- Peak Rate
- Half Power point
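The pipeline timing formulas can be checked numerically. The functions below follow the slide's expressions for sequential and pipelined time; the definition of the half-power point as the N at which speedup reaches half of its bound k is an assumption, since the slide gives only the term.

```python
def sequential_time(n, k):
    """Cycles to run n operations of k steps each, one after another."""
    return n * k


def pipeline_time(n, k):
    """Cycles on a k-stage pipeline: k - 1 to fill, then one result per cycle."""
    return k - 1 + n


def speedup(n, k):
    """Speedup of the pipeline over sequential execution."""
    return sequential_time(n, k) / pipeline_time(n, k)


def half_power_point(k):
    """N at which speedup equals k / 2 (assumed definition)."""
    return k - 1
```

With k = 5, four operations reach a speedup of 2.5, and the speedup approaches but never reaches 5 as N grows, which is the bound stated on the slide.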
30Performance Trends
MIPS R3000
31Processor Performance(1.35X before, 1.55X now)
1.54X/yr
32Pipelined control
[Figure: pipelined control; labels: Dmem, imem, IR, IR_ex, IR_mem, IR_wb, PC]
33Pipelined Instruction Execution
- Fetch Instruction Every cycle
- Launch into a pipeline
- What if they are not independent?
- structural hazards
- two operations need to use same resource
- data dependence
- later instruction needs to use the value produced by an earlier one
- Detect
- Wait till hazard clears
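The detect-and-wait policy for data dependences can be sketched as follows. This is a simplified model under stated assumptions: there is no forwarding, a result becomes readable only after its producer has left the last stage, and three stages (ex, mem, wb) follow decode. The `Instr` tuple and function names are hypothetical.

```python
from collections import namedtuple

# dst is the register written (or None); srcs are the registers read.
Instr = namedtuple("Instr", "op dst srcs")


def needs_stall(decoding, in_flight):
    """True if the decoding instruction reads a register still being produced."""
    pending = {i.dst for i in in_flight if i is not None and i.dst is not None}
    return any(src in pending for src in decoding.srcs)


def issue(program, depth=3):
    """Return what is launched each cycle; None marks a bubble."""
    in_flight = [None] * depth  # stages after decode: ex, mem, wb
    schedule = []
    pc = 0
    while pc < len(program):
        instr = program[pc]
        if needs_stall(instr, in_flight):
            launched = None  # wait until the hazard clears
        else:
            launched = instr
            pc += 1
        schedule.append(launched)
        in_flight = [launched] + in_flight[:-1]
    return schedule
```

A load followed immediately by an add that uses the loaded register produces bubbles in this model, while two independent instructions issue back to back.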
34Pipelined Bubble
[Figure: pipelined control with a bubble; labels: Dmem, imem, IR, IR_ex, IR_mem, IR_wb, PC]