Title: Page-Faults in Linux
1 Page-Faults in Linux
- How can we study the handling of page-fault exceptions?
2 Why page-faults happen
- Trying to access a virtual memory-address
- Instruction-operand / instruction-address
- Read-data/write-data, or fetch-instruction
- Maybe page is not present
- Maybe page is not readable
- Maybe page is not writable
- Maybe page is not visible
3 Page-fault examples
- movl %eax, (%ebx)    writable?
- movl (%ebx), %eax    readable?
- jmp ahead            present?
- Everything depends on the entries in the current page-directory and page-tables
- and on the CPU's Current Privilege Level
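The three questions above (present? readable/writable? visible?) can be modeled in a few lines of C. This is only a simplified sketch: it assumes the standard x86 flag bits in a page-table entry (bit 0 present, bit 1 writable, bit 2 user-accessible), it looks at a single entry rather than the directory and table together, and it assumes kernel-mode writes are write-protected too. The name would_fault is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT  0x1u   /* bit 0: page is present in memory    */
#define PTE_WRITABLE 0x2u   /* bit 1: page may be written          */
#define PTE_USER     0x4u   /* bit 2: page is visible at CPL 3     */

/* Hypothetical helper: returns true if the access would page-fault. */
static bool would_fault(uint32_t entry, bool is_write, unsigned cpl)
{
    if (!(entry & PTE_PRESENT))
        return true;                  /* page is not present           */
    if (cpl == 3 && !(entry & PTE_USER))
        return true;                  /* page is not visible to user   */
    if (is_write && !(entry & PTE_WRITABLE))
        return true;                  /* page is not writable          */
    return false;                     /* access is allowed             */
}
```

For example, a user-mode write to a present, user-visible, read-only page is reported as a fault, while a read of the same page is not.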
4 Current Privilege Level (CPL)
Layout of segment-register contents (16 bits):
  bits 15..3   segment-selector index
  bit  2       TI  = Table-Indicator
  bits 1..0    RPL = Requested Privilege Level
CPL is determined by the value of the RPL field in CS and SS
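The selector layout can be checked with a small decoding routine. This is a sketch only; the struct and function names are illustrative and do not come from the kernel sources.

```c
#include <stdint.h>

/* Fields of a 16-bit segment-selector (illustrative names). */
struct selector_fields {
    unsigned index;   /* bits 15..3: index into the descriptor table */
    unsigned ti;      /* bit 2:      Table-Indicator (0=GDT, 1=LDT)  */
    unsigned rpl;     /* bits 1..0:  Requested Privilege Level       */
};

static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 1u;
    f.rpl   = sel & 3u;
    return f;
}
```

Applied to the value found in CS, the rpl field gives the CPL: 0 for kernel mode, 3 for user mode.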
5 What does the CPU do?
- Whenever the CPU detects a page-fault, its action depends on the Current Privilege Level
- If CPL = 0 (executing in kernel mode)
- 1) push EFLAGS register
- 2) push CS register
- 3) push EIP register
- 4) push error-code
- 5) jump to page-fault service-routine
6 Alternative action in user-mode
- If CPL = 3 (executing in user mode)
- the CPU will switch to its kernel-mode stack
- 0) push SS and ESP
- 1) push EFLAGS
- 2) push CS
- 3) push EIP
- 4) push error-code
- 5) jump to the page-fault service-routine
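The push sequence above fixes what the service-routine finds on its stack. One way to picture it is as a C struct, lowest address first. This is a sketch for the 32-bit user-mode case; the struct and helper names are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Kernel-stack image after a page-fault taken at CPL 3.
   The item pushed last (error-code) is at the lowest address. */
struct user_fault_frame {
    uint32_t error_code;  /* step 4: top of stack on entry        */
    uint32_t eip;         /* step 3                               */
    uint32_t cs;          /* step 2: selector is in low 16 bits   */
    uint32_t eflags;      /* step 1                               */
    uint32_t esp;         /* step 0: only pushed when CPL was 3   */
    uint32_t ss;          /* step 0: only pushed when CPL was 3   */
};

/* The RPL bits of the saved CS tell us where the fault came from. */
static int came_from_user(const struct user_fault_frame *f)
{
    return (f->cs & 3u) == 3u;
}
```

When the fault occurs in kernel mode, the last two fields are absent, so a handler should test the saved CS before touching them.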
7 How CPU finds new stack
- Special CPU segment-register TR
- TR is the Task Register
- TR holds selector for a GDT descriptor
- Descriptor is for a Task State Segment
- So TR points indirectly to current TSS
- TSS stores address of kernel-mode stack
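The part of the TSS that matters here is small. The sketch below shows only the leading fields of the 32-bit Task State Segment, enough to see where the kernel-mode stack address lives; the struct name is illustrative and the remaining TSS fields are omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Leading fields of a 32-bit Task State Segment (rest omitted). */
struct tss_head {
    uint16_t prev_task_link;  /* offset 0                              */
    uint16_t reserved0;
    uint32_t esp0;            /* offset 4: kernel-mode stack pointer   */
    uint16_t ss0;             /* offset 8: kernel-mode stack segment   */
    uint16_t reserved1;
};
```

On a switch from CPL 3 to CPL 0 the CPU loads SS and ESP from the ss0 and esp0 fields before pushing anything.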
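8 Stack Switching mechanism
[Diagram: in user-space, the user code (CS:EIP) and the user stack (SS:ESP). In kernel-space, IDTR locates the INTERRUPT DESCRIPTOR TABLE, whose gate descriptor leads to the kernel code; GDTR locates the GLOBAL DESCRIPTOR TABLE, whose TSS descriptor (selected by TR) locates the TASK STATE SEGMENT, which supplies SS0:ESP0 for the kernel stack.]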
9 Let's intercept page-faults
- Use our systems programming knowledge
- We build a new Interrupt Descriptor Table
- With our own customized interrupt-gates
- Use a new gate for page-fault exceptions
- Other existing gates we can simply copy
- Why not just modify the existing IDT?
- It's write-protected in some Linux kernels
- But we can still read it (i.e., for copying)
10 Very delicate to implement
- Will need to use some assembly language
- Using C language doesn't give full control
- C Compiler designers didn't plan for this!
- (except they did allow for using assembly)
- Assembly requires us to be very precise
- So try keeping assembly to a minimum
- We can use a mixture of assembly and C
11 Allocate a mapped page
- Device interrupts are asynchronous
- CPU requires instant access to the IDT
- We must ensure CPU can find new IDT
- Cannot risk putting it in high memory
- We can use get_free_page() function
- With flags GFP_KERNEL and GFP_DMA
- (This ensures page will always be mapped)
- No memory available? Cannot continue.
12 Must find address of current IDT
- We'll need it for copying the existing gates
- We'll need it for restoring old IDT upon exit
- We can use the sidt instruction to find it
- But sidt needs a 48-bit memory-operand
- No such type is directly supported in C
- We could use a 64-bit type (e.g., long long)
- Better to use array of three 16-bit values
13 Getting hold of current IDT
- We need to declare a global variable
- Because init_module() needs it
- And also cleanup_module() needs it
- Use 'static' to make it private
- Use 'short' to get 16-bit array-entries
- Use 'unsigned' to avoid sign-extensions
- static unsigned short oldidtr[ 3 ];
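Once sidt has filled the three-element array, the table's size and address have to be picked back out of it. A minimal sketch, assuming the 32-bit layout (16-bit limit first, then the 32-bit base address); the helper names are made up for illustration.

```c
#include <stdint.h>

/* Word 0 holds the table limit (size of the IDT in bytes, minus 1). */
static uint16_t idtr_limit(const unsigned short idtr[3])
{
    return idtr[0];
}

/* Words 1 and 2 hold the low and high halves of the base address. */
static uint32_t idtr_base(const unsigned short idtr[3])
{
    return (uint32_t)idtr[1] | ((uint32_t)idtr[2] << 16);
}
```

Using unsigned 16-bit entries matters here: a signed entry with its top bit set would be sign-extended and corrupt the upper half of the address.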
14 Activating a new IDT
- When we're ready, we can use lidt
- Instruction will change the IDTR register
- Instruction needs 48-bit memory operand
- So again we will declare a suitable array
- static unsigned short newidtr[ 3 ];
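Filling the new array is the reverse of reading the old one. The sketch below assumes a full table of 256 eight-byte gates and a 32-bit base address; the function name and constants are illustrative, and a module could equally well copy the limit from the old register image.

```c
#include <stdint.h>

#define IDT_ENTRIES 256u   /* x86 defines 256 interrupt vectors  */
#define GATE_SIZE     8u   /* each gate descriptor is 8 bytes    */

/* Build the 48-bit register image for a table located at 'base'. */
static void make_idtr(unsigned short idtr[3], uint32_t base)
{
    idtr[0] = (unsigned short)(IDT_ENTRIES * GATE_SIZE - 1u); /* limit       */
    idtr[1] = (unsigned short)(base & 0xFFFFu);               /* base 15..0  */
    idtr[2] = (unsigned short)(base >> 16);                   /* base 31..16 */
}
```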
15 Initializations
- We need to initialize our idtr array
- We need to initialize new Descriptor Table
- Use memcpy() for copying within kernel
- Page-fault's gate-descriptor must be built
- Must conform to CPU's expected layout
- Need to use a local 64-bit variable
- unsigned long long gate_desc;
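The copy-then-patch step can be sketched in ordinary C. The function name clone_idt is hypothetical, and the tables are treated as plain arrays of 64-bit descriptors; in the module the destination would be the freshly allocated page and the source would be the table located through the old idtr array.

```c
#include <stdint.h>
#include <string.h>

#define IDT_ENTRIES   256    /* number of gates in a full table       */
#define PAGE_FAULT_ID 0x0E   /* vector 14 is the page-fault exception */

/* Copy every existing gate, then substitute our page-fault gate. */
static void clone_idt(uint64_t *new_idt, const uint64_t *old_idt,
                      uint64_t gate_desc)
{
    memcpy(new_idt, old_idt, IDT_ENTRIES * sizeof(uint64_t));
    new_idt[PAGE_FAULT_ID] = gate_desc;
}
```

All other exceptions and device interrupts keep their original handlers, because their gates are copied unchanged.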
16 Format for a Gate Descriptor
Quadword (64 bits), from bit 63 down to bit 0:
  bits 63..48   offset 31..16
  bits 47..32   gate type
  bits 31..16   segment-selector
  bits 15..0    offset 15..0
The address of the fault-handler is split into a hiword and a loword
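Packing the descriptor is a matter of shifts and masks. The sketch below assumes a 32-bit interrupt-gate (type word 0x8E00: present, privilege level 0); the function name is illustrative, and the selector passed in must be whatever code-segment selector the running kernel uses.

```c
#include <stdint.h>

#define GATE_TYPE_INT32 0x8E00u  /* present, DPL 0, 32-bit interrupt-gate */

/* Split the handler address into hiword and loword and pack the gate. */
static uint64_t make_gate(uint32_t handler, uint16_t selector)
{
    uint64_t loword = handler & 0xFFFFu;   /* offset 15..0  */
    uint64_t hiword = handler >> 16;       /* offset 31..16 */

    return (hiword << 48)
         | ((uint64_t)GATE_TYPE_INT32 << 32)
         | ((uint64_t)selector << 16)
         | loword;
}
```

For a handler at 0xC0123456 and selector 0x0010, the resulting quadword is 0xC0128E0000103456.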
17 Declaring our fault-handler
- Tell the C compiler our handler's name
- asmlinkage void isr0x0E( void );
- Its type and value are set by assembler
- asm(" .text ");
- asm(" .type isr0x0E, @function ");
- asm("isr0x0E: ");
18 Save/Restore cpu registers
- Upon entering
- asm(" pushal ");
- asm(" pushl %ds ");
- asm(" pushl %es ");
- Upon leaving
- asm(" popl %es ");
- asm(" popl %ds ");
- asm(" popal ");
- asm(" jmp old_isr ");
19 Handler must access kernel data
- Registers CS and SS get set up by the CPU
- But it's our job to set up DS and ES registers
- Linux uses same segments for data and stack
- asm(" mov %ss, %eax ");
- asm(" mov %eax, %ds ");
- asm(" mov %eax, %es ");
- (Current kernel version doesn't use FS or GS)
20 Transfer to a C function
- Handler will need some info from the stack
- The error-code will be needed for sure
- So C function will need an argument
- So here's our C function prototype
- static void handler( unsigned long tos );
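To show what the C function might do with the stack, here is a sketch that views the top of stack as an array of 32-bit values. It assumes the entry sequence pushed ES, DS and the eight general registers after the CPU pushed the error-code, which would put the error-code at index 10. That index, the pointer-style parameter, and all names below are assumptions for illustration. The bit meanings are the standard x86 page-fault error-code bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the error-code above the top of stack:
   ES, DS, then EDI, ESI, EBP, ESP, EBX, EDX, ECX, EAX come first. */
#define ERRCODE_INDEX 10

#define PF_PROT  0x1u  /* 0 = page not present, 1 = protection violation */
#define PF_WRITE 0x2u  /* 0 = read access,      1 = write access         */
#define PF_USER  0x4u  /* 0 = supervisor mode,  1 = user mode (CPL 3)    */

static uint32_t fault_error_code(const uint32_t *tos)
{
    return tos[ERRCODE_INDEX];
}

static bool fault_not_present(uint32_t err) { return !(err & PF_PROT); }
static bool fault_on_write(uint32_t err)    { return (err & PF_WRITE) != 0; }
static bool fault_in_user(uint32_t err)     { return (err & PF_USER) != 0; }
```

An error-code of 6, for instance, describes a user-mode write to a page that is not present.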