Title: In-Line Interrupt Handling for Software Managed TLBs
- Aamer Jaleel and Bruce Jacob
- Electrical and Computer Engineering
- University of Maryland at College Park
- ajaleel, blj _at_ eng.umd.edu
- International Conference on Computer Design (ICCD) 2001, September 24-26
- Austin, Texas
Outline
- Reorder buffers
- Interrupt handling
- Traditional method
- In-lined method (novel solution)
- Performance of in-lining TLB interrupts
- Conclusions
Reorder Buffer (ROB)
- Hardware data structure (queue)
- Holding tank for instructions and their pipeline state
- New instructions are queued at the tail and retired from the head
- Allows interrupts to be handled in order
[Figure: Reorder Buffer (ROB), a hardware data structure. Entries ROB0 to ROB15; new instructions are queued at the TAIL pointer, instructions retire from the HEAD pointer, with empty slots between them.]
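The queue behaviour described above can be sketched in Python. This is a minimal sketch only. It assumes a 16-entry ROB like the one in the figure and a retire width of 2, and both values are illustrative. The class and method names (`ReorderBuffer`, `enqueue`, `mark_done`, `retire`) are hypothetical. The point of the sketch is that retirement stops at the first unfinished or excepting entry, and that is what lets interrupts be taken in program order.

```python
class ReorderBuffer:
    """Fixed-size circular queue: instructions enter at the tail, retire from the head."""

    def __init__(self, size=16):
        self.slots = [None] * size
        self.head = 0    # oldest instruction (next to retire)
        self.tail = 0    # next free slot (where new instructions are queued)
        self.count = 0

    def free_slots(self):
        return len(self.slots) - self.count

    def enqueue(self, instr):
        """Queue a new instruction at the tail; return False if the ROB is full."""
        if self.count == len(self.slots):
            return False
        self.slots[self.tail] = {"instr": instr, "done": False, "has_exception": False}
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return True

    def mark_done(self, offset, has_exception=False):
        """Mark the entry `offset` positions behind the head as finished executing."""
        entry = self.slots[(self.head + offset) % len(self.slots)]
        entry["done"] = True
        entry["has_exception"] = has_exception

    def retire(self, width=2):
        """Retire up to `width` finished instructions from the head, in program order.

        Stops at the first unfinished or excepting entry, so an interrupt is
        always recognized when its instruction reaches the head.
        """
        retired = []
        while len(retired) < width and self.count > 0:
            entry = self.slots[self.head]
            if not entry["done"] or entry["has_exception"]:
                break
            retired.append(entry["instr"])
            self.slots[self.head] = None
            self.head = (self.head + 1) % len(self.slots)
            self.count -= 1
        return retired


rob = ReorderBuffer(size=16)
for name in ["i0", "i1", "i2"]:
    rob.enqueue(name)
rob.mark_done(1)       # i1 finishes first (out of order)
print(rob.retire())    # [] because i0 at the head has not finished
rob.mark_done(0)
print(rob.retire())    # ['i0', 'i1']
```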
Handling an Interrupt (Traditional Method)
- Interrupts handled at retire stage
- If ROB head.has_exception is true:
- Save state (exceptional PC)
- Flush ROB
- Set PC to appropriate handler
- Handle exception with privileges enabled
- Restore exceptional PC and continue executing user code
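The traditional sequence can be sketched as follows. This is a hypothetical model, not the authors' simulator. The ROB is a Python list with the head at index 0, and the handler address `0x8000` is made up. Every in-flight instruction counts as flushed, including the excepting one, which is re-fetched after the handler returns.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    pc: int
    has_exception: bool = False


@dataclass
class TraditionalCore:
    rob: List[Entry] = field(default_factory=list)  # rob[0] is the ROB head
    fetch_pc: int = 0
    privileged: bool = False
    exceptional_pc: Optional[int] = None
    flushed: int = 0                                # instructions discarded so far

    def check_head(self, handler_pc: int) -> bool:
        """At retire: if the head has an exception, take the interrupt."""
        if not self.rob or not self.rob[0].has_exception:
            return False
        self.exceptional_pc = self.rob[0].pc   # save state (exceptional PC)
        self.flushed += len(self.rob)          # all in-flight work is thrown away
        self.rob.clear()                       # flush ROB
        self.fetch_pc = handler_pc             # set PC to the handler
        self.privileged = True                 # handler runs with privileges enabled
        return True

    def return_from_handler(self) -> None:
        """Restore the exceptional PC; user code is re-fetched from there."""
        self.privileged = False
        self.fetch_pc = self.exceptional_pc
        self.exceptional_pc = None


core = TraditionalCore(rob=[Entry(0x100, has_exception=True), Entry(0x104), Entry(0x108)])
core.check_head(handler_pc=0x8000)       # hypothetical handler address
core.return_from_handler()
print(hex(core.fetch_pc), core.flushed)  # 0x100 3
```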
A Novel Approach: In-lining
- Hardware knows the length of the handler
- If ROB head.has_exception is true:
- If empty slots in ROB > LHC (enough resources):
- Save current tail pointer and nextPC (PC of next instruction to fetch)
- Set mode to INLINE and reset head and tail pointers
- Fetch handler; once done, unset INLINE mode, restore tail pointer, and continue fetching user code
- When TLB is updated, undo all TLB misses in ROB
- Else, handle interrupt by traditional method
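Here is a hedged sketch of the in-lining decision and its three phases (check space, fetch handler, undo TLB misses). All names are hypothetical. Because the ROB is modeled as a Python list, head and tail pointer bookkeeping is implicit. The sketch also assumes fixed 4-byte instructions and a made-up handler address. It follows the slide's rule literally: in-line only when empty slots > LHC.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    pc: int
    tlb_miss: bool = False     # instruction is waiting on a missing translation
    privileged: bool = False   # set for in-lined handler instructions


@dataclass
class InliningCore:
    capacity: int              # ROB size
    lhc: int                   # handler length, known to the hardware
    rob: List[Entry] = field(default_factory=list)  # rob[0] is the head
    next_pc: int = 0           # PC of the next user instruction to fetch
    inline_mode: bool = False
    saved_next_pc: Optional[int] = None
    flushed: int = 0

    def on_tlb_miss_at_head(self, handler_pc: int) -> str:
        free = self.capacity - len(self.rob)
        if free > self.lhc:                    # enough empty slots: in-line
            self.saved_next_pc = self.next_pc  # where user fetch stopped
            self.inline_mode = True
            self.next_pc = handler_pc
            return "inlined"
        # Otherwise fall back to the traditional method: flush and trap.
        self.saved_next_pc = self.rob[0].pc    # exceptional PC
        self.flushed += len(self.rob)
        self.rob.clear()
        self.next_pc = handler_pc
        return "traditional"

    def fetch_handler(self) -> None:
        """Queue all LHC handler instructions behind the user code already in the ROB."""
        assert self.inline_mode
        for _ in range(self.lhc):
            self.rob.append(Entry(pc=self.next_pc, privileged=True))
            self.next_pc += 4                  # assumes fixed 4-byte instructions
        self.inline_mode = False               # handler fully fetched
        self.next_pc = self.saved_next_pc      # continue fetching user code

    def tlb_updated(self) -> int:
        """Handler wrote the TLB: undo every TLB miss so those instructions re-execute."""
        undone = 0
        for e in self.rob:
            if e.tlb_miss:
                e.tlb_miss = False             # now ready to execute again
                undone += 1
        return undone


core = InliningCore(capacity=16, lhc=6,
                    rob=[Entry(0x100, tlb_miss=True), Entry(0x104), Entry(0x108, tlb_miss=True)],
                    next_pc=0x10C)
print(core.on_tlb_miss_at_head(0x8000))   # inlined
core.fetch_handler()
print(len(core.rob), hex(core.next_pc), core.tlb_updated(), core.flushed)  # 9 0x10c 2 0
```

The fallback branch keeps the design choice visible. When the handler does not fit, nothing is in-lined and the core pays the full flush cost, exactly as in the traditional method.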
Interrupt In-lining (An Example)
[Figure: ROB with Head Ptr and Tail Ptr marked]
- Assuming length of handler LHC = 6, instruction fetch/retire per cycle = 2
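The arithmetic behind the example can be checked directly. `inline_budget` is a hypothetical helper. It applies the slide's rule (in-line only if empty ROB slots > LHC) and counts how many fetch cycles the handler occupies at a given fetch width. In the first call, the ROB size and occupancy are illustrative assumptions. The second call borrows figures from later slides (22-instruction handler, 80 instructions in flight, ROB 50-55% full) and assumes a fetch width of 4 for the 4-way core.

```python
import math


def inline_budget(lhc: int, fetch_width: int, rob_size: int, occupied: int):
    """Return (can_inline, handler_fetch_cycles) for one interrupt."""
    can_inline = (rob_size - occupied) > lhc       # empty slots > LHC
    fetch_cycles = math.ceil(lhc / fetch_width)    # cycles to fetch the whole handler
    return can_inline, fetch_cycles


# Slide example: LHC = 6, 2 instructions fetched per cycle.
# ROB size 16 and occupancy 9 are illustrative assumptions.
print(inline_budget(6, 2, 16, 9))    # (True, 3)

# Experimental setup: 22-instruction handler, 80 in-flight instructions,
# ROB about 55% full (44 entries), fetch width 4 assumed.
print(inline_budget(22, 4, 80, 44))  # (True, 6)
```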
Interrupt In-lining (An Example)
- CYCLE 2: Handler fetched
- CYCLE 3: User and handler instructions execute (ROB 2, 7, 8)
Interrupt In-lining (An Example)
- Restore tail pointer
- CYCLE 3: User and handler instructions execute
- CYCLE 5: Fetch user code from where it last stopped
Interrupt In-lining (An Example)
- Restore tail pointer
- CYCLE 5: Fetch user code from where it last stopped
- CYCLE 6: Fetch and execute more user code
Interrupt In-lining (An Example)
- TLB updated: undo all TLB interrupts
- CYCLE 5: Fetch user code from where it last stopped
- CYCLE 6: Fetch and execute more user code
Interrupt In-lining (An Example)
- CYCLE 6: Fetch and execute more user code
Issues With Interrupt In-lining
- Hardware must know the handler length
- There should be a privilege bit per ROB entry
- When done fetching the handler, fetch nextPC
- Save nextPC, NOT exceptionalPC (add a MUX)
- When done updating the TLB:
- Undo all instructions with a TLB miss and set them as ready to execute
- Branch mispredictions must be handled
- If a mispredict occurs while in-lining, replace nextPC
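The per-entry privilege bit and the nextPC replacement can be sketched together. This is a hedged illustration, not the paper's hardware. The slide says only "replace nextPC". The rule that wrong-path user entries are squashed while in-lined handler entries are kept is an assumption of this sketch. It rests on the fact that the excepting instruction at the head is older than any mispredicted user branch, so the handler is still needed. `InlineWindow` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    pc: int
    privileged: bool   # per-ROB-entry privilege bit


@dataclass
class InlineWindow:
    rob: List[Entry] = field(default_factory=list)   # program order, rob[0] is the head
    saved_next_pc: int = 0   # user PC to resume at after the handler is fetched

    def may_do_privileged_op(self, idx: int) -> bool:
        """Only in-lined handler entries may perform privileged work (e.g. a TLB write)."""
        return self.rob[idx].privileged

    def user_mispredict(self, idx: int, correct_target: int) -> None:
        """A user branch at `idx` mispredicted while the handler is in-lined.

        Assumption of this sketch: younger user (wrong-path) entries are squashed,
        in-lined handler entries are kept, and the saved nextPC is replaced so
        user fetch resumes on the correct path.
        """
        assert not self.rob[idx].privileged
        younger = self.rob[idx + 1:]
        self.rob = self.rob[:idx + 1] + [e for e in younger if e.privileged]
        self.saved_next_pc = correct_target


w = InlineWindow(rob=[Entry(0x100, False), Entry(0x104, False), Entry(0x108, False),
                      Entry(0x8000, True), Entry(0x8004, True)],
                 saved_next_pc=0x10C)
w.user_mispredict(1, correct_target=0x200)
print([hex(e.pc) for e in w.rob], hex(w.saved_next_pc))
# ['0x100', '0x104', '0x8000', '0x8004'] 0x200
```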
Experimental Methodology
- Simulation Tool
- Alpha 21264
- 4-way out-of-order
- 80 instructions in flight
- Fully associative I/D TLBs, 128-entry, NMRU replacement policy
- 8 KB page size
- 150 renaming registers
- 22-instruction D-TLB miss handler
- Benchmarks
- Small scientific kernels
- Red-Black
- Jacobi
- Matrix Multiply
- Quicksort
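The TLB configuration above can be modeled roughly as follows. This is a hedged sketch, not the simulator's code. The parameters come from the slide: fully associative, 128 entries, 8 KB pages (a 13-bit page offset). For NMRU, the sketch uses one common reading, which is to evict a random entry other than the most recently used one. The dictionary stands in for full associativity, and the class name is hypothetical.

```python
import random

PAGE_SHIFT = 13                      # 8 KB pages -> 13-bit page offset
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class NMRUTLB:
    """Fully associative TLB with not-most-recently-used (NMRU) replacement."""

    def __init__(self, entries: int = 128, seed: int = 0):
        self.entries = entries
        self.table = {}               # virtual page number -> physical frame number
        self.mru = None               # VPN of the most recently used entry
        self.rng = random.Random(seed)
        self.misses = 0

    def lookup(self, vaddr: int):
        """Return the physical address, or None on a miss (software must refill)."""
        vpn = vaddr >> PAGE_SHIFT
        if vpn not in self.table:     # fully associative: any entry may match
            self.misses += 1
            return None
        self.mru = vpn
        return (self.table[vpn] << PAGE_SHIFT) | (vaddr & OFFSET_MASK)

    def refill(self, vpn: int, pfn: int) -> None:
        """Insert a translation; if full, evict a random entry other than the MRU one."""
        if vpn not in self.table and len(self.table) >= self.entries:
            victims = [v for v in self.table if v != self.mru] or list(self.table)
            del self.table[self.rng.choice(victims)]
        self.table[vpn] = pfn
        self.mru = vpn


tlb = NMRUTLB(entries=4)
for vpn in range(4):
    tlb.refill(vpn, 100 + vpn)
tlb.lookup((2 << PAGE_SHIFT) | 5)       # hit; VPN 2 becomes MRU
tlb.refill(10, 200)                     # full: evicts one of VPNs 0, 1, 3
print(2 in tlb.table, len(tlb.table))   # True 4
```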
Why not SPEC2000?
- SPEC2000 TLB miss rates are not realistic
[Figure: TLB miss rates, Real Life Apps vs. SPEC FP 2000]
Results: In-lining Limitations
- 80-90% of TLB miss interrupts benefit from in-lining
- Cannot in-line because:
- Not enough space in ROB (< 2%)
- Pipeline already stalled due to lack of resources (free registers)
Percentage of User Instructions Flushed
- Reorder buffer is 50-55% full when an interrupt occurs
- In-lining reduces instructions flushed by 40-80%
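A rough worked estimate of the flush cost per interrupt under the traditional method. It assumes, without the slide saying so, that the 50-55% occupancy refers to the 80-instruction window of the simulated Alpha 21264:

```latex
N_{\text{flushed}} \approx f_{\text{full}} \times N_{\text{in-flight}}
  = (0.50 \text{ to } 0.55) \times 80
  \approx 40 \text{ to } 44 \ \text{instructions per interrupt}
```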
Interrupt Overhead
- Overhead: cost of re-fetching and re-executing flushed instructions
- In-lining reduces the cost of a TLB miss by 10-40%
Performance of Benchmarks
- Execution time reduced by 5-25% for the same TLB size
- Can get the performance of a traditional TLB with an in-lined TLB of ¼ the size
Speedup vs. Miss Rate
- As TLB management overhead increases, the benefit from in-lining increases
- Applications that will benefit from in-lining are those that need it the most
Conclusions
- In-lining can be used for ALL types of software-handled transparent interrupts
- Avoids unnecessary flushing of the pipeline
- In-line interrupt handling for TLB misses:
- Cuts the number of instructions flushed by 55-80%
- Reduces overhead by 10-40%
- Improves performance by 5-25%
Future Work
- Speculative in-lining
- No need to check for ROB space; check for deadlock instead
- Energy savings by not re-fetching and re-executing instructions
Related Work
- Save the entire internal state of the pipeline and restore it after the handler completes (Cyber 200, for VM interrupts)
- Save the instruction window (ROB) as part of machine state and restore it when done (Torng and Day)
- A new thread fetches handler code while the existing thread continues fetching user code (Zilles, Emer, Sohi)
Miscellaneous Slides
Interrupts
- Interrupt: an exceptional condition
- Performs behind-the-scenes work
- Transparent to the user application
- e.g., unaligned memory access, instruction emulation, TLB miss handling, etc.
- Two types:
- Software-handled (privileged code)
- Hardware-handled (special hardware)
Precise Interrupts
- Precise interrupt:
- Everything before the excepted instruction has finished execution and has committed
- Everything after the excepted instruction has NOT committed
- The excepted instruction may or may not have finished execution
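The three conditions above translate directly into a check over a window of instructions in program order. This is a minimal, hypothetical sketch (`Instr` and `is_precise` are illustrative names). As the slide states, it places no constraint on the excepted instruction itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instr:
    finished: bool    # has completed execution
    committed: bool   # has updated architectural state


def is_precise(window: List[Instr], excepted: int) -> bool:
    """Check the precise-interrupt conditions for window[excepted].

    `window` lists instructions in program order.
    """
    before = window[:excepted]
    after = window[excepted + 1:]
    all_before_done = all(i.finished and i.committed for i in before)
    none_after_committed = not any(i.committed for i in after)
    # window[excepted] is unconstrained: it may or may not have finished.
    return all_before_done and none_after_committed


print(is_precise([Instr(True, True), Instr(False, False), Instr(True, False)], 1))  # True
print(is_precise([Instr(True, True), Instr(False, False), Instr(True, True)], 1))   # False
```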
Software-Managed TLBs vs. Hardware-Managed TLBs
- Hardware-managed TLBs outperform software-managed TLBs
- Software-managed TLBs are used because of their flexibility
- Software-managed TLBs
- e.g., MIPS, Alpha, SPARC, PA-RISC
- Hardware-managed TLBs
- e.g., IA-32, PowerPC
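The software-managed case can be made concrete with a hedged Python sketch. Here the "hardware" only raises a miss. A privileged handler, modeled as a plain function, looks up the page table and writes the TLB, and then the faulting access is retried. The flat dictionary page table, the 8 KB page size borrowed from the experimental setup, and all names are illustrative assumptions. Real MIPS or Alpha handlers walk OS-defined page-table structures. In a hardware-managed design, a page-table walker in the MMU plays the role of `tlb_miss_handler`.

```python
PAGE_SHIFT = 13                      # 8 KB pages
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class TLBMiss(Exception):
    """Trap raised when the TLB holds no translation for a page."""

    def __init__(self, vpn: int):
        super().__init__(f"TLB miss on VPN {vpn:#x}")
        self.vpn = vpn


class SoftwareManagedTLB:
    def __init__(self):
        self.entries = {}            # VPN -> PFN, written only by software

    def translate(self, vaddr: int) -> int:
        vpn = vaddr >> PAGE_SHIFT
        if vpn not in self.entries:
            raise TLBMiss(vpn)       # hardware just traps to the OS
        return (self.entries[vpn] << PAGE_SHIFT) | (vaddr & OFFSET_MASK)


def tlb_miss_handler(tlb: SoftwareManagedTLB, page_table: dict, vpn: int) -> None:
    """Privileged software: read the page-table entry and write it into the TLB."""
    tlb.entries[vpn] = page_table[vpn]


def access(tlb: SoftwareManagedTLB, page_table: dict, vaddr: int) -> int:
    """Translate vaddr, taking a TLB-miss interrupt and retrying if necessary."""
    try:
        return tlb.translate(vaddr)
    except TLBMiss as miss:
        tlb_miss_handler(tlb, page_table, miss.vpn)
        return tlb.translate(vaddr)  # re-execute the faulting access


page_table = {0x1: 0x7}              # hypothetical single mapping
tlb = SoftwareManagedTLB()
print(hex(access(tlb, page_table, 0x2010)))   # 0xe010
```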
Disadvantages
- Two sources of performance loss
- No user code executes while the exception is handled
- Instructions are re-fetched and re-executed
- Solution: avoid pipeline/ROB flushes
- Why is the pipeline flushed?
- Ensure privileges?
- Attach a privilege bit to each ROB entry (Henry)
- Have enough space for the interrupt handler?
In-lining TLB Interrupts
- First-level handler length is short
- TLB miss handlers are the most commonly executed OS primitives
- TLB miss handling accounts for more than 40% of total run time and 80% of the kernel's computation time
- TLB miss interrupts occur once every 100-1,000 instructions in applications ranging from databases to engineering workloads
Issues With Interrupt In-lining
- In-lined instructions shouldn't affect the state of user registers
- Problem with the conventional method of register renaming
- The first handler instruction should receive a mapping of the current state of the register file
- A user instruction should receive the mapping of the previously mapped user instruction
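One way to meet both renaming rules is to keep a separate rename table for the in-lined handler. This is a hedged sketch of that idea, not the mechanism from the paper. `InlineRenamer`, its methods, and the choice to snapshot the user map when in-lining begins are all assumptions. Freeing physical registers at retire is not modeled. In the usage, a handler write to `r1` stays invisible to a later user read of `r1`.

```python
from typing import List, Optional, Tuple


class InlineRenamer:
    """Register renaming that keeps in-lined handler writes out of the user's map.

    Hypothetical design: the handler starts from a copy of the map at the
    moment in-lining begins, while user instructions always rename against
    the user map (the mapping left by the previous *user* instruction).
    """

    def __init__(self, num_arch_regs: int, num_phys_regs: int):
        self.user_map = list(range(num_arch_regs))        # arch reg -> phys reg
        self.handler_map: Optional[List[int]] = None
        self.free = list(range(num_arch_regs, num_phys_regs))

    def start_inline(self) -> None:
        self.handler_map = list(self.user_map)   # first handler instr sees current state

    def end_inline(self) -> None:
        self.handler_map = None

    def rename(self, srcs: List[int], dst: Optional[int],
               privileged: bool) -> Tuple[List[int], Optional[int]]:
        if privileged:
            assert self.handler_map is not None, "handler instruction outside in-lining"
            table = self.handler_map
        else:
            table = self.user_map
        phys_srcs = [table[r] for r in srcs]
        phys_dst = None
        if dst is not None:
            phys_dst = self.free.pop(0)           # allocate a new physical register
            table[dst] = phys_dst
        return phys_srcs, phys_dst


r = InlineRenamer(num_arch_regs=4, num_phys_regs=10)
r.rename([0], 1, privileged=False)             # user: r1 <- r0, r1 now maps to p4
r.start_inline()
print(r.rename([1], 1, privileged=True))       # ([4], 5): handler sees p4, writes p5
print(r.rename([1], None, privileged=False))   # ([4], None): user still sees p4
r.end_inline()
```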
Alpha 21264 Pipeline
- Fetch
- Decode/Map
- Execute
- Write back
- Retire