Title: EECS 470
1. EECS 470
- ILP and Exceptions
- Lecture 7
- Coverage: Chapter 3
2. Optimizing CPU Performance
- Golden Rule: $t_{CPU} = N_{inst} \times CPI \times t_{CLK}$
- Given this, what are our options?
- Reduce the number of instructions executed
- Reduce the cycles to execute an instruction
- Reduce the clock period
- Our first focus: Reducing CPI
- Approach: Instruction Level Parallelism (ILP)
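As a quick worked instance of the golden rule, the sketch below computes execution time from instruction count, CPI, and clock period. The function name and the workload numbers are illustrative assumptions, not figures from the lecture.

```python
def cpu_time(n_inst: int, cpi: float, t_clk_ns: float) -> float:
    """Execution time in nanoseconds: N_inst * CPI * t_CLK."""
    return n_inst * cpi * t_clk_ns

# Hypothetical workload: 1 million instructions on a 1 ns clock.
baseline = cpu_time(1_000_000, cpi=2.0, t_clk_ns=1.0)
improved = cpu_time(1_000_000, cpi=1.25, t_clk_ns=1.0)  # same program, lower CPI
speedup = baseline / improved
print(f"baseline={baseline:.0f} ns, improved={improved:.0f} ns, speedup={speedup:.2f}x")
```

With the instruction count and clock period held fixed, the speedup is just the ratio of the two CPI values, which is why reducing CPI is a natural first target.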
3. Why ILP?
- Requirements
- Parallelism
- Large window
- Limited control deps
- Eliminate false deps
- Find run-time deps
4. How Much ILP is There?
5. How Large Must the Window Be?
6. ALU Operation GOOD, Branch BAD
- Expected number of branches between mispredicts: $E(X) = 1/(1-p)$
- E.g., $p = 95\%$: $E(X) = 20$ branches, 100-ish instructions
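The expectation above can be checked numerically. This is a minimal sketch: the function name is hypothetical, and the figure of five instructions per branch is an assumption chosen to match the "20 branches, 100-ish instructions" example.

```python
def branches_between_mispredicts(p: float) -> float:
    """Expected branches up to and including the next mispredict: 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return 1.0 / (1.0 - p)

INSTS_PER_BRANCH = 5  # assumption: roughly one branch every five instructions

for p in (0.90, 0.95, 0.99):
    brs = branches_between_mispredicts(p)
    print(f"p={p:.2f}: {brs:.0f} branches, about {brs * INSTS_PER_BRANCH:.0f} instructions")
```

The window of useful instructions grows quickly with accuracy: under these assumptions, going from 95% to 99% accuracy moves the expected distance between mispredicts from about 100 to about 500 instructions.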
7. How Accurate are Branch Predictors?
8. Impact of Physical Storage Limitations
- Each instruction in flight must have storage for its result
- Really worse than this because of misspeculation
9. Registers GOOD, Memory BAD
- Benefits of registers
- Well described deps
- Fast access
- Finite resource
- Memory loses these benefits for flexibility
[Figure: memory accesses through pointers p, q, and p again, with "?" marking the unknown dependence]
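To illustrate why memory dependences are harder to describe than register dependences, here is a small sketch. The exact access pattern on the slide did not survive extraction, so the store/store/load sequence below is an assumption; `run`, `p`, and `q` are illustrative names, with `p` and `q` standing in for pointer values.

```python
def run(memory: list, p: int, q: int) -> int:
    """Store through p, store through q, then load through p."""
    memory[p] = 1        # store via p
    memory[q] = 2        # store via q: conflicts with p only if q == p
    return memory[p]     # load via p: which store does it depend on?

print(run([0] * 4, p=0, q=1))  # distinct addresses: load sees the store via p
print(run([0] * 4, p=0, q=0))  # aliased addresses: load sees the store via q
```

Register names are fixed in the instruction encoding, so their dependences are known at decode. Whether `p` and `q` refer to the same location is only known once both addresses have been computed at run time.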
10. Bottom Line for an Ambitious Design
11. First Optimization: Out-of-Order Writeback
12. Playing by the Rules: In-order Writeback
DIV.D  IF   ID   D1   D2   D3   D4   D5   MEM  WB
ADD         IF   ID   EX   MEM  WB
14. Playing by the Rules: In-order Writeback
Divide by Zero!
DIV.D  IF   ID   D1   D2   D3   D4   D5   MEM  WB
ADD         IF   ID   EX   MEM  WB
What's wrong with this picture?
DIV.D  IF   ID   D1   D2   D3   D4   D5   MEM  WB
ADD         IF   ID   EX   MEM  WB
ADD: stall, stall, stall, stall (4 stall cycles, so that WB of ADD follows WB of DIV.D)
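The four stall cycles can be reproduced with a small timing sketch. It rests on simplifying assumptions that are not from the slides: one instruction enters IF per cycle, at most one writeback happens per cycle, and structural hazards are ignored. The function name `writeback_schedule` is hypothetical.

```python
def writeback_schedule(stage_counts, in_order=True):
    """Return (wb_cycle, stalls) per instruction.

    stage_counts[i] is the number of pipeline stages of instruction i,
    counting IF through WB. Instruction i enters IF in cycle i + 1.
    """
    schedule = []
    prev_wb = 0
    for i, stages in enumerate(stage_counts):
        natural_wb = i + stages                # WB cycle with no stalls
        wb = natural_wb
        if in_order:
            wb = max(natural_wb, prev_wb + 1)  # wait for the older instruction's WB
        schedule.append((wb, wb - natural_wb))
        prev_wb = max(prev_wb, wb)
    return schedule

# DIV.D: IF ID D1..D5 MEM WB = 9 stages; ADD: IF ID EX MEM WB = 5 stages.
print(writeback_schedule([9, 5], in_order=False))  # [(9, 0), (6, 0)]
print(writeback_schedule([9, 5], in_order=True))   # [(9, 0), (10, 4)]
```

Without the ordering constraint, ADD writes back in cycle 6, three cycles before DIV.D can report its divide-by-zero. Forcing in-order writeback pushes ADD to cycle 10 at a cost of four stalls.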
15. Another Way to Get in the Same Mess
- Many systems use microcode
- Simplifies mapping of complex instructions to CPU resources
- IA-32 add-with-carry: ADC (EAX), EBX
- tmp = MEM[EAX]
- tmp = tmp + EBX + CF, update CF (Side Effect!)
- MEM[EAX] = tmp (Potential Fault!)
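The hazard in the micro-op sequence can be sketched as follows. This is an illustrative model, not real microcode: the `Fault` class, the `writable` set standing in for page protection, and the 32-bit masking are all assumptions made for the example.

```python
class Fault(Exception):
    pass

def adc_mem(state, mem, writable):
    """Micro-ops for ADC (EAX), EBX, updating state as each micro-op runs."""
    addr = state["EAX"]
    tmp = mem[addr]                                  # tmp = MEM[EAX]
    total = tmp + state["EBX"] + state["CF"]
    state["CF"] = 1 if total > 0xFFFFFFFF else 0     # side effect: CF updated
    tmp = total & 0xFFFFFFFF
    if addr not in writable:                         # the store may fault
        raise Fault(f"write fault at {addr:#x}")
    mem[addr] = tmp                                  # MEM[EAX] = tmp

state = {"EAX": 0x10, "EBX": 1, "CF": 0}
mem = {0x10: 0xFFFFFFFF}
try:
    adc_mem(state, mem, writable=set())   # hypothetical read-only location
except Fault:
    pass
print(state["CF"], hex(mem[0x10]))  # CF changed although the instruction did not complete
```

If the handler fixes the fault and restarts the instruction, the add now sees a carry-in of 1 instead of 0, so the architectural state at the fault was not precise.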
16. Exceptions and Interrupts
Exception Type     Sync/Async  Maskable?  Restartable?
I/O request        Async       Yes        Yes
System call        Sync        No         Yes
Breakpoint         Sync        Yes        Yes
Overflow           Sync        Yes        Yes
Page fault         Sync        No         Yes
Misaligned access  Sync        No         Yes
Memory Protect     Sync        No         Yes
Machine Check      Async/Sync  No         No
Power failure      Async       No         No
17. Solution: Precise Interrupts
- Implementation approaches
- Don't
- E.g., Cray-1
- Force in-order WB
- E.g., ARM SA-1
- Force in-order checks
- E.g., Alpha 21064
- Buffer speculative results
- E.g., P4, Alpha 21264
- History buffer
- Future file/Reorder buffer
[Figure labels: Instructions Completely Finished | Precise State | PC | Speculative State | No Instruction Has Executed At All]
18. Precise Interrupts via the Reorder Buffer
- @ Alloc
- Allocate result storage at Tail
- @ Sched
- Get inputs (search ROB Tail-to-Head, then ARF)
- Wait until all inputs ready
- @ WB
- Write results/fault to ROB
- Indicate result is ready
- @ CT
- Wait until inst @ Head is done
- If fault, initiate handler
- Else, write results to ARF
- Deallocate entry from ROB
[Figure: pipeline IF, ID, Alloc (in-order) -> Sched, EX, MEM (any order) -> CT (in-order) -> ARF]
ROB entry fields: PC | Dst regID | Dst value | Except? (entries lie between Head and Tail)
- Reorder Buffer (ROB)
- Circular queue of spec state
- May contain multiple definitions of same register
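Because the ROB may hold several definitions of the same register, the operand read at Sched has to pick the youngest one. The sketch below shows one way to express the Tail-to-Head search with ARF fallback; `RobEntry` and `read_operand` are hypothetical names, and the ROB is modelled as a plain list rather than a circular queue.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RobEntry:
    dst: str                 # destination register ID
    value: Optional[int]     # None until the result is written back
    ready: bool = False

def read_operand(reg, rob, arf):
    """Return (ready, value) for reg.

    rob is ordered oldest (Head) to youngest (Tail); the youngest matching
    entry holds the most recent definition of reg.
    """
    for entry in reversed(rob):          # Tail-to-Head search
        if entry.dst == reg:
            return entry.ready, entry.value
    return True, arf[reg]                # no in-flight definition: use ARF

arf = {"r2": 6, "r3": 5}
rob = [RobEntry("r3", 11, ready=True),   # older definition of r3
       RobEntry("r3", None)]             # younger definition, still executing
print(read_operand("r3", rob, arf))      # (False, None): must wait for youngest def
print(read_operand("r2", rob, arf))      # (True, 6): falls through to ARF
```

Searching from Head instead would return the stale value 11 for `r3`, which is why the search direction matters.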
19. Reorder Buffer Example
Code Sequence:
f1 = f2 / f3
r3 = r2 + r3
r4 = r3 - r2
Initial Conditions:
- reorder buffer empty
- f2 = 3.0
- f3 = 2.0
- r2 = 6
- r3 = 5
ROB contents over time (H and T mark Head and Tail in the original figure):
regID  result  Except?
f1     ?       ?
r8     2       n

regID  result  Except?
r8     2       n
f1     ?       ?
r3     ?       ?

regID  result  Except?
r4     ?       ?
r8     2       n
f1     ?       ?
r3     11      n
20. Reorder Buffer Example
ROB contents over time (H and T mark Head and Tail in the original figure):
regID  result  Except?
r4     5       n
r8     2       n
f1     ?       ?
r3     11      n

regID  result  Except?
r4     5       n
f1     ?       y
r3     11      n
r8     2       n

regID  result  Except?
r4     5       n
f1     ?       y
r3     11      n
21. Reorder Buffer Example
ROB contents over time (H and T mark Head and Tail in the original figure):
(empty)

first inst of fault handler
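The example can be replayed with a toy reorder buffer. This is a simplified sketch under stated assumptions: a `deque` stands in for the circular queue, results are supplied directly instead of being computed, the r8 entry from the figure is left out, and the class and method names are hypothetical.

```python
from collections import deque

class ReorderBuffer:
    def __init__(self, arf):
        self.arf = dict(arf)       # architectural register file (precise state)
        self.entries = deque()     # left end = Head (oldest), right end = Tail

    def alloc(self, dst):
        entry = {"dst": dst, "value": None, "fault": False, "done": False}
        self.entries.append(entry)             # allocate at Tail
        return entry

    def writeback(self, entry, value=None, fault=False):
        entry.update(value=value, fault=fault, done=True)   # any order

    def commit(self):
        """Retire finished entries from Head; return 'fault' if the head faulted."""
        while self.entries and self.entries[0]["done"]:
            head = self.entries.popleft()
            if head["fault"]:
                self.entries.clear()           # squash all younger speculative state
                return "fault"
            self.arf[head["dst"]] = head["value"]
        return "ok"

rob = ReorderBuffer({"f2": 3.0, "f3": 2.0, "r2": 6, "r3": 5})
f1 = rob.alloc("f1")   # f1 = f2 / f3
r3 = rob.alloc("r3")   # r3 = r2 + r3
r4 = rob.alloc("r4")   # r4 = r3 - r2
rob.writeback(r3, value=11)          # finishes before the divide
rob.writeback(r4, value=5)
print(rob.commit(), rob.arf["r3"])   # ok 5: head (f1) not done, nothing retires
rob.writeback(f1, fault=True)        # divide reports an exception
print(rob.commit(), rob.arf["r3"], "r4" in rob.arf, len(rob.entries))
```

After the faulting divide reaches Head, the younger results for r3 and r4 are discarded without ever touching the ARF, so the handler starts from the state that existed just before the divide.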