Title: Kernel Entry / Kernel Exit
1 Kernel Entry / Kernel Exit
- Marcus Völp
- Universität Karlsruhe
2 System-calls
- What are system calls used for?
- Access operating system functionality
  - File system
  - Network
  - Devices
- Perform privileged operations
  - Process creation / manipulation
  - (Memory management)
  - Privileged instructions
3 System-calls
- Kernel executes the system call on behalf of the user thread
  - Enter supervisor mode
  - Perform operation
  - Leave supervisor mode
[Figure: a thread crosses from user to kernel and back during a system call]
4 Entering the kernel
- Exception
  - undefined opcode
  - div 0
- Software interrupt
  - int 42
- Special instruction
  - sysenter
Entry via exception:
  User: syscall(id = 42, param): set_syscall_params(id, param); div 0
  Kernel: exception handler: case id = 42: read_syscall_params; execute system call
5 Entering the kernel
- Exception
  - undefined opcode
  - div 0
- Software interrupt
  - int 42
- Special instruction
  - sysenter
Entry via software interrupt:
  User: syscall(id = 42, param): set_syscall_params(param); int 42
  Kernel: int 42 handler: read_syscall_params; execute system call
6 Entering the kernel
- Exception
  - undefined opcode
  - div 0
- Software interrupt
  - int 42
- Special instruction
  - sysenter
Entry via special instruction:
  User: syscall(id = 42, param): set_syscall_params(id, param); sysenter
  Kernel: syscall entry point: case id = 42: read_syscall_params; execute system call
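The three entry paths differ mainly in where the handler finds the system-call id. The following is a minimal user-space sketch of that difference, not kernel code: all names (`entry_kind`, `syscall_params`, `kernel_entry`, `syscall_42`) are hypothetical, and the "system call" simply doubles its argument. It assumes that with `int N` the vector number itself identifies the call, while the exception and special-instruction paths share one entry point and must read the id from the stored parameters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout: a user-space model of the three entry paths. */
enum entry_kind { ENTRY_EXCEPTION, ENTRY_SOFT_INT, ENTRY_SPECIAL_INSN };

/* Parameters the user side stored before trapping (set_syscall_params). */
struct syscall_params {
    int id;     /* unused for ENTRY_SOFT_INT: the vector carries the id */
    int param;
};

/* Stand-in for the system call with id 42: doubles its argument. */
static int syscall_42(int param) { return 2 * param; }

/* Model of the kernel side: find the id, read the params, run the call.
 * Returns -1 for an unknown id. */
int kernel_entry(enum entry_kind kind, int vector, const struct syscall_params *p)
{
    assert(p != NULL);

    /* int N has one handler per vector; the other paths share one entry
     * point and must read the id from the parameters. */
    int id = (kind == ENTRY_SOFT_INT) ? vector : p->id;

    switch (id) {
    case 42:
        return syscall_42(p->param);
    default:
        return -1;
    }
}
```

A caller modelling `int 42` would pass `ENTRY_SOFT_INT` with vector 42; a caller modelling `sysenter` would pass the id inside `syscall_params` instead.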
7 From C to C
User:
- User-level program in C
- User binding
Kernel:
- Kernel binding
- System call in C
8 User binding
- User-level interface between language and system call
  - prepare input parameters (calling convention)
  - enter supervisor mode
  - prepare output parameters (calling convention)
9 Kernel binding
- Kernel-level interface between kernel entry point and C syscall function
  - (select syscall function)
  - extract input parameters
  - call C function of syscall
  - prepare return parameters
  - return to user land
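The path from C to C through both bindings can be sketched in plain C. This is an illustration under stated assumptions, not an actual kernel interface: the message layout `syscall_msg`, the call number 1, and the functions `user_add`, `kernel_binding`, `sys_add`, and `enter_supervisor_mode` are all hypothetical, and the mode switch is replaced by an ordinary function call so the example runs in user space.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message format shared by both bindings. */
struct syscall_msg {
    int id;        /* which system call */
    int arg[2];    /* input parameters */
    int result;    /* return parameter */
};

/* The system call itself, written as an ordinary C function. */
static int sys_add(int a, int b) { return a + b; }

/* Kernel binding: select the function, extract inputs, call, store result. */
static void kernel_binding(struct syscall_msg *m)
{
    assert(m != NULL);
    switch (m->id) {
    case 1:  m->result = sys_add(m->arg[0], m->arg[1]); break;
    default: m->result = -1; break;   /* unknown system call */
    }
}

/* Stand-in for the mode switch; a real binding would trap here. */
static void enter_supervisor_mode(struct syscall_msg *m) { kernel_binding(m); }

/* User binding: looks like a normal C function to the caller. */
int user_add(int a, int b)
{
    struct syscall_msg m = { .id = 1, .arg = { a, b }, .result = 0 };
    enter_supervisor_mode(&m);   /* inputs are prepared, now enter the kernel */
    return m.result;             /* hand the output back per the C convention */
}
```

The user-level program only ever sees `user_add(2, 4)`; everything between that call and `sys_add` is binding code.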
10 Parameter passing
- User Stack / Memory
- Register Set
- UTCB
User Stack / Memory: the kernel reads user-level (UL) memory
- Stack per thread
- Memory per address space
11 Parameter passing
- User Stack / Memory
- Register Set
- UTCB
Register Set: R1, R2, ..., Rn
- Unmodified register content on kernel entry
- Convention: kernel overwrites R1, R3
- Private registers for kernel use
12 Parameter passing
- User Stack / Memory
- Register Set
- UTCB
[Figure: registers R1, R2, ..., Rn and their copies on the kernel stack]
- Limited number of registers available
13 Parameter passing
- User Stack / Memory
- Register Set
- UTCB
UTCB: no page faults, combined with a larger memory space
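The trade-off between the register set and the UTCB can be sketched as a marshalling convention. The sketch below is a hedged illustration only: the sizes `NUM_ARG_REGS` and `UTCB_WORDS`, the types `reg_set` and `utcb`, and the functions `marshal` and `fetch_arg` are assumptions made for this example, not the layout of any real kernel. It assumes the first few arguments travel in registers and the rest spill into the UTCB.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_ARG_REGS 3u     /* assumed: few registers are free for arguments */
#define UTCB_WORDS   16u    /* assumed size of the UTCB message area */

struct reg_set { int r[NUM_ARG_REGS]; };
struct utcb    { int word[UTCB_WORDS]; };

/* User side: first arguments go to registers, the rest spill to the UTCB.
 * Returns 0 on success, -1 if the arguments do not fit. */
int marshal(const int *args, size_t n, struct reg_set *regs, struct utcb *u)
{
    assert(args != NULL && regs != NULL && u != NULL);
    if (n > NUM_ARG_REGS + UTCB_WORDS)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (i < NUM_ARG_REGS)
            regs->r[i] = args[i];
        else
            u->word[i - NUM_ARG_REGS] = args[i];
    }
    return 0;
}

/* Kernel side: read argument i back from wherever the convention put it. */
int fetch_arg(size_t i, const struct reg_set *regs, const struct utcb *u)
{
    assert(regs != NULL && u != NULL);
    assert(i < NUM_ARG_REGS + UTCB_WORDS);
    return (i < NUM_ARG_REGS) ? regs->r[i] : u->word[i - NUM_ARG_REGS];
}
```

With five arguments, the first three land in `regs` and the last two in the UTCB; the kernel side does not need to know how many registers the caller had free beyond the shared constant.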
14 Example x86
void foo (int p1, int p2, int p3, int p4)
main() { int r1 = 2; int r2 = 4; foo (r1, r2, r1, r2); }
Stack as drawn (sp marks the stack pointer):
- ret _foo
- foo params: p1 = 2, p2 = 4, p3, p4
- main locals: old frame ptr, r1 = 2, r2 = 4
15 Example MIPS
void foo (int p1, int p2, int p3, int p4)
main() { int r1 = 2; int r2 = 4; foo (r1, r2, r1, r2); }
Register use:
- R0: hardwired 0
- R1
- R2, R3: return value
- R4: p1 = 2
- R5: p2 = 4
- R6: p3 = r1
- R7: p4 = r2
- R8 ... R30
- R31: ret _foo
16 Example x86
- Kernel entry with exception
- x86 translates the exception to a software interrupt
17 Example x86
- Kernel entry with software interrupt
User: int 23
Kernel: int_23_handler: pusha
[Figure: kernel stack holding EFlags, Err-No, and registers EAX-EBP]
19 Example x86
- Returning from software interrupt
User:
  int 23
  nop
Kernel:
  int_23_handler:
    pusha
    prepare input parameters
    call 23_handler_c_function
    prepare return parameters
    popa
    add esp, 4   // get rid of err-no
    iret
Kernel stack (top of stack first): registers EAX-EBP, Err-No, U-EIP, U-CS, EFlags, U-ESP, U-SS
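The kernel stack in this example can be described as a C structure, which is one way the kernel binding can reach the saved user registers. The sketch below is an illustration with hypothetical names (`trap_frame`, `set_return_value`); it assumes 32-bit protected mode, an entry from user mode, and a handler stub that leaves an err-no slot on the stack before `pusha`, as drawn on the slide. The field order follows what `pusha` and the CPU push, with the lowest address first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Kernel stack after "int 23" from user mode, an err-no slot, and pusha.
 * Lowest address (top of stack) first. Field names are hypothetical. */
struct trap_frame {
    /* pushed by pusha; popa restores them and skips the saved ESP */
    uint32_t edi, esi, ebp, esp_ignored, ebx, edx, ecx, eax;
    /* err-no slot, removed with "add esp, 4" before iret */
    uint32_t err_no;
    /* pushed by the CPU on the switch from PL 3 to PL 0, popped by iret */
    uint32_t u_eip, u_cs, eflags, u_esp, u_ss;
};

/* Kernel binding step: store the return value where popa will reload EAX. */
void set_return_value(struct trap_frame *f, uint32_t value)
{
    assert(f != NULL);
    f->eax = value;
}
```

Writing into the frame instead of into the live register is what lets the C function's result survive `popa` and arrive in the user's EAX.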
20 Example x86
- Kernel entry with sysenter
- MSRs: SYSENTER_CS, SYSENTER_EIP, SYSENTER_ESP
- GDT: Kernel CS, Kernel SS (flat 4 GB; cs: rx, ss: rw)
- no stack frame
- no user-level registers are saved
- switch to PL 0
21 Example x86
- Returning from sysenter with sysexit
- GDT: Kernel CS, Kernel SS, User CS, User SS (flat 4 GB; cs: rx, ss: rw)
- MSRs: SYSENTER_CS, SYSENTER_EIP, SYSENTER_ESP
- EDX => EIP
- ECX => ESP
- switch to PL 3
NOTE: Sysenter / Sysexit are asymmetric
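The asymmetry can be made concrete with a small software model of the two instructions in 32-bit mode. This is a sketch for illustration, not an emulator: the structures `msrs` and `cpu` and the functions `model_sysenter` and `model_sysexit` are hypothetical, and only the register loads are modelled. It assumes the GDT layout from the slide, with Kernel CS, Kernel SS, User CS, and User SS in four consecutive entries starting at the selector stored in SYSENTER_CS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct msrs { uint32_t sysenter_cs, sysenter_eip, sysenter_esp; };
struct cpu  { uint32_t cs, ss, eip, esp, ecx, edx; };

/* sysenter: everything comes from the MSRs; user EIP/ESP are not saved. */
void model_sysenter(struct cpu *c, const struct msrs *m)
{
    assert(c != NULL && m != NULL);
    c->cs  = m->sysenter_cs & ~3u;   /* kernel CS, PL 0 */
    c->ss  = c->cs + 8;              /* kernel SS is the next GDT entry */
    c->eip = m->sysenter_eip;
    c->esp = m->sysenter_esp;
}

/* sysexit: segments derive from SYSENTER_CS, EIP/ESP from EDX/ECX. */
void model_sysexit(struct cpu *c, const struct msrs *m)
{
    assert(c != NULL && m != NULL);
    c->cs  = (m->sysenter_cs + 16) | 3u;   /* user CS, PL 3 */
    c->ss  = (m->sysenter_cs + 24) | 3u;   /* user SS, PL 3 */
    c->eip = c->edx;
    c->esp = c->ecx;
}
```

Because `model_sysenter` overwrites EIP and ESP without saving them, the user binding has to pass its return address and stack pointer to the kernel by convention, and the kernel has to place them in EDX and ECX before leaving.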
22 Performance: int X / iret vs. sysenter / sysexit
23 Linux
arch/i386/entry.s
User: int 0x80
Kernel:
ENTRY(sys_call)
    push eax
    SAVE_ALL()
    GET_CURRENT(ebx)
    cmp (NR_syscalls), eax
    jae badsys
    testb 0x02, tsk_ptrace(ebx)
    jne tracesys
    call *SYMBOL_NAME(sys_call_table)(, eax, 4)
    mov eax, EAX(esp)
ENTRY(ret_from_syscall)
    cli
    cmp 0, need_resched(ebx)
    jne reschedule
    cmp 0, sigpending(ebx)
    jne signal_return
restore_all:
    RESTORE_ALL
TCB (via ebx): tsk_ptrace, need_resched, sigpending
Stack (top of stack first): USER-EBX, USER-ECX, USER-EDX, USER-ESI, USER-EDI, USER-EBP, USER-EAX, USER-DS, USER-ES, EAX = SYSCALL ID, USER-EIP, USER-CS, USER-EFLAGS, OLD ESP, OLD SS
ENTRY(sys_call_table)
    .long SYMBOL_NAME(sys_ni_syscall)   // 0
    .long SYMBOL_NAME(sys_exit)         // 1
    .long SYMBOL_NAME(sys_fork)         // 2
    .long SYMBOL_NAME(sys_read)         // 3
    ...
    .long SYMBOL_NAME(sys_fnctl16)      // 221
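The bounds check and indirect call in the entry path above correspond to a table lookup in C. The following is a minimal sketch of that pattern, not the Linux implementation: the table `demo_call_table`, its four entries, and the function `demo_dispatch` are hypothetical stand-ins. The number is taken as unsigned so that, like the unsigned `jae` comparison, a negative value is rejected together with values that are too large.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*syscall_fn)(long arg);

/* Hypothetical stand-ins for real system calls. */
static long demo_ni_syscall(long arg) { (void)arg; return -1; }
static long demo_exit(long arg)       { return arg; }
static long demo_read(long arg)       { return arg + 1; }

/* Table indexed by system-call number, in the style of sys_call_table. */
static const syscall_fn demo_call_table[] = {
    demo_ni_syscall,   /* 0 */
    demo_exit,         /* 1 */
    demo_ni_syscall,   /* 2: not implemented in this sketch */
    demo_read,         /* 3 */
};
#define DEMO_NR_SYSCALLS (sizeof demo_call_table / sizeof demo_call_table[0])

/* Mirrors "cmp NR_syscalls, eax; jae badsys; call *table(, eax, 4)". */
long demo_dispatch(unsigned long nr, long arg)
{
    if (nr >= DEMO_NR_SYSCALLS)
        return -1;                 /* badsys */
    assert(demo_call_table[nr] != NULL);
    return demo_call_table[nr](arg);
}
```

Filling unused slots with a "not implemented" function, as entry 0 of the table on the slide does, keeps the dispatch path free of per-entry null checks.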
24 L4 Small Spaces
- High address-space (AS) switch costs because of TLB flush
- Affinity of certain application sets
  - gcc, make, ...
  - quake, X, ...
25 L4 Small Spaces
- no tagged TLB
- => expensive address-space switches