Part 2: Advanced Static Analysis

Description: Chapter 4: A Crash Course in x86 Disassembly; Chapter 5: IDA Pro; Chapter 6: Recognizing C Code Constructs in Assembly

Provided by: Wuch7

1
Part 2: Advanced Static Analysis
  • Chapter 4: A Crash Course in x86 Disassembly
  • Chapter 5: IDA Pro
  • Chapter 6: Recognizing C Code Constructs in
    Assembly

2
How software works
  • gcc compiler driver pre-processes, compiles,
    assembles and links to generate executable
  • Links together object code (e.g. game.o) and
    static libraries (e.g. libc.a) to form final
    executable
  • Links in references to dynamic libraries for code
    loaded at load time (e.g. libc.so.1)
  • Executable may still load additional dynamic
    libraries at run-time

Build pipeline:
    hello.c (program source)
      -> pre-processor -> hello.i (modified source)
      -> compiler      -> hello.s (assembly code)
      -> assembler     -> hello.o (object code)
      -> linker        -> hello   (executable code)
3
Static libraries
  • Suppose you have utility code in x.c, y.c, and
    z.c that all of your programs use
  • Link together individual .o files
  • gcc -o hello hello.o x.o y.o z.o
  • Create a library libmyutil.a using ar and ranlib
    and link library in statically
  • libmyutil.a: x.o y.o z.o
  • ar rvu libmyutil.a x.o y.o z.o
  • ranlib libmyutil.a
  • gcc -o hello hello.c -L. -lmyutil
  • Note: library code is copied directly into the binary

4
Dynamic libraries
  • Avoid having multiple copies of common code on
    disk
  • Problem: libc
  • gcc program.c -lc creates an a.out with entire
    libc object code in it (libc.a)
  • Almost all programs use libc!
  • Solution: have binaries compiled with a reference
    to a library of shared objects instead of an entire
    copy of the library
  • Libraries loaded at run-time from file system
  • ldd <binary> to see which dynamic libraries a
    program relies upon
  • gcc flags -shared and -soname for handling
    and generating dynamic shared object files

5
The linking process (ld)
  • Merges object files
  • Merges multiple relocatable (.o) object files
    into a single executable program.
  • Resolves external references
  • References to symbols defined in another object
    file.
  • Relocates symbols
  • Relocates symbols from their relative locations
    in the .o files to new absolute positions in the
    executable.
  • Updates all references to these symbols to
    reflect their new positions.
  • References in both code and data
  • code: a();            /* reference to symbol a */
  • data: int *xp = &x;   /* reference to symbol x */

6
Executables
  • Various file formats
  • Linux: Executable and Linkable Format (ELF)
  • Windows: Portable Executable (PE)

7
ELF
  • Standard binary format for object files in Linux
  • One unified format for
  • Relocatable object files (.o),
  • Shared object files (.so)
  • Executable object files
  • Better support for shared libraries than old
    a.out formats.
  • More complete information for debuggers.

8
ELF Object File Format
  • ELF header
  • Magic number, type (.o, exec, .so), machine, byte
    ordering, etc.
  • Program header table
  • Page size, virtual addresses of memory segments
    (sections), segment sizes, entry point
  • .text section
  • Code
  • .data section
  • Initialized (static) data
  • .bss section
  • Uninitialized (static) data
  • Block Started by Symbol

File layout (from offset 0):
    ELF header
    Program header table (required for executables)
    .text section
    .data section
    .bss section
    .symtab
    .rel.text
    .rel.data
    .debug
    Section header table (required for relocatables)
9
ELF Object File Format (cont.)
  • .symtab section
  • Symbol table
  • Procedure and static variable names
  • Section names and locations
  • .rel.text section
  • Relocation info for .text section
  • Addresses of instructions that will need to be
    modified in the executable
  • Instructions for modifying.
  • .rel.data section
  • Relocation info for .data section
  • Addresses of pointer data that will need to be
    modified in the merged executable
  • .debug section
  • Info for symbolic debugging (gcc -g)

10
PE (Portable Executable) file format
  • Windows file format for executables
  • Based on COFF Format
  • Magic Numbers, Headers, Tables, Directories,
    Sections
  • Disassemblers
  • Overlay Data with C Structures
  • Load File as OS Loader Would
  • Identify Entry Points (default, exported)

11
Example C Program
m.c
    int e = 7;

    int main() {
        int r = a();
        exit(0);
    }

a.c
    extern int e;

    int *ep = &e;
    int x = 15;
    int y;

    int a() {
        return *ep + x + y;
    }
12
Merging Relocatable Object Files into an
Executable Object File
Relocatable object files:
    system objects:  .text: system code;  .data: system data
    m.o:             .text: main();       .data: int e = 7
    a.o:             .text: a();          .data: int *ep = &e, int x = 15;  .bss: int y
Executable object file (from offset 0):
    headers
    .text:  system code, main(), a(), more system code
    .data:  system data, int e = 7, int *ep = &e, int x = 15
    .bss:   uninitialized data (int y)
    .symtab, .debug
13
Program execution
  • Operating system provides
  • Protection and resource allocation
  • Abstract view of resources (files, system calls)
  • Virtual memory
  • Uniform memory space abstraction for each process
  • Gives the illusion that each process has entire
    memory space

14
How does a program get loaded?
  • The operating system creates a new process.
  • Including, among other things, a virtual memory
    space
  • Important: any hardware-based debugger must know
    OS state in page tables to map accesses to
    virtual addresses
  • System loader reads the executable file from the
    file system into the memory space.
  • Executable contains code and statically linked
    libraries
  • Done via DMA (direct memory access)
  • Executable in file system remains and can be
    executed again
  • Loads dynamic shared objects/libraries into
    memory
  • Resolves addresses in code given where code/data
    is loaded
  • Then it starts the thread of execution running

15
Loading Executable Binaries
Executable object file for example program p (from offset 0):
    ELF header
    Program header table (required for executables)
    .text section
    .data section
    .bss section
    .symtab
    .rel.text
    .rel.data
    .debug
    Section header table (required for relocatables)
Process image (virtual address of each segment):
    0x080483e0  init and shared lib segments
    0x08048494  .text segment (r/o)
    0x0804a010  .data segment (initialized r/w)
    0x0804a3b0  .bss segment (uninitialized r/w)
16
More on relocation
  • Assembly code with relative and absolute
    addresses
  • With VM abstraction, old linkers decide layout
    and can supply definitive addresses
  • Windows .com format
  • Linker can statically bind the program to virtual
    addresses
  • Now, they provide hints as to where they would
    like to be placed
  • But this could also be done at load time
    (address space layout randomization)
  • Windows .exe format
  • Loader rewrites addresses to proper offsets
  • System needs to force position-independent code
  • Force compiler to make all jumps and branches
    relative to current location or relative to a
    base register set at run-time
  • ELF uses Global Offset Table
  • Symbol addresses obtained from GOT before access
  • Can be targeted for hooks!
  • Implementation determines exploit

17
Program execution
[Diagram: the CPU (EIP, registers, condition codes) sends addresses to memory and exchanges data and instructions with it; memory holds object code, program data, OS data, and the stack]
  • Programmer-Visible State
  • EIP - Instruction Pointer
  • a.k.a. Program Counter
  • Address of next instruction
  • Register File
  • Heavily used program data
  • Condition Codes
  • Store status information about most recent
    arithmetic operation
  • Used for conditional branching
  • Memory
  • Byte addressable array
  • Code, user data, OS data
  • Includes stack used to support procedures

18
Run-time data structures
Process address space (high to low addresses):
    0xffffffff
        kernel virtual memory (code, data, heap, stack): memory invisible to user code
    0xc0000000
        user stack (created at runtime); %esp (stack pointer) marks its top
        memory mapped region for shared libraries (0x40000000)
        run-time heap (managed by malloc); brk marks its top
        read/write segment (.data, .bss), loaded from the executable file
        read-only segment (.init, .text, .rodata), loaded from the executable file
    0x08048000
        unused
    0
19
Registers
  • The processor operates on data in registers
    (usually)
  • movl (%eax), %ecx
  • Fetch data at address contained in %eax
  • Store in register %ecx
  • movl $array, %ecx
  • Move address of variable array into %ecx
  • Typically, data is loaded into registers,
    manipulated or used, and then written back to
    memory
  • The IA32 architecture is register poor
  • Few general purpose registers
  • Source or destination operand is often memory
    locations
  • Makes context-switching amongst processes easy
    (less register-state to store)

20
IA32 General Registers
    32-bit (bits 31-0)   16-bit (bits 15-0)   bits 15-8   bits 7-0
General purpose registers (mostly):
    eax                  ax                   ah          al
    ecx                  cx                   ch          cl
    edx                  dx                   dh          dl
    ebx                  bx                   bh          bl
    esi                  si
    edi                  di
Special purpose registers:
    esp                  sp                   (stack pointer)
    ebp                  bp                   (frame pointer)
21
Operand types
  • A typical instruction acts on 1 or more operands
  • addl %ecx, %edx adds the contents of %ecx to %edx
  • Three general types of operands
  • Immediate
  • Like a C constant, but preceded by $
  • e.g., $0x1F, $-533
  • Encoded with 1, 2, or 4 bytes based on
    instruction
  • Register: the value in one of the 8 integer
    registers
  • Memory: a memory address
  • There are many modes for addressing memory

22
Operand examples using mov
    Source   Destination   Instruction           C Analog
    Imm      Reg           movl $0x4,%eax        temp = 0x4;
    Imm      Mem           movl $-147,(%eax)     *p = -147;
    Reg      Reg           movl %eax,%edx        temp2 = temp1;
    Reg      Mem           movl %eax,(%edx)      *p = temp;
    Mem      Reg           movl (%eax),%edx      temp = *p;
  • Memory-memory transfers cannot be done with a
    single instruction

23
Addressing Modes
  • Immediate and registers have only one mode
  • Memory on the other hand
  • Absolute
  • specify the address of the data
  • Indirect
  • use register to calculate address
  • Base + displacement
  • use register plus absolute address to calculate
    address
  • Indexed
  • Add contents of an index register
  • Scaled index
  • Add contents of an index register scaled by a
    constant
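
The memory addressing modes listed on this slide can all be read as special cases of one sum: displacement + base + index * scale. The sketch below models that sum in C. The function name effective_address and its parameter order are assumptions made for illustration; they do not come from the slides.

```c
#include <stdint.h>

/* Hypothetical helper (not from the slides): computes the address named by
 * an IA32 memory operand written as disp(base, index, scale).
 * Unsigned 32-bit arithmetic wraps modulo 2^32, as on the real machine. */
uint32_t effective_address(uint32_t disp, uint32_t base,
                           uint32_t index, uint32_t scale)
{
    return disp + base + index * scale;
}
```

For example, 8(%ebp) with %ebp = 0x100 corresponds to effective_address(8, 0x100, 0, 1), and (%edx,%eax,4) corresponds to effective_address(0, edx, eax, 4). Absolute and indirect addressing fall out by passing zero for the unused parts.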

24
Summary of IA32 Operand Forms
25
x86 instructions
  • Rules
  • Source operand can be memory, register or
    constant
  • Destination can be memory or register
  • Only one of source and destination can be memory
  • Source and destination must be same size
  • Flags set on each instruction
  • EFLAGS
  • Conditional branches handled via EFLAGS

26
What's the l for on the end?
  • addl 8(%ebp),%eax
  • It stands for long and is 32-bits
  • It tells the size of the operand.
  • Baggage from the days of 16-bit processors
  • For x86, x86_64
  • 8 bits is a byte
  • 16 bits is a word
  • 32 bits is a double word
  • 64 bits is a quad word
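
The size names above map directly onto the one-letter suffixes used in AT&T syntax (b, w, l, q). A small lookup, written as a sketch with the hypothetical name suffix_bits, makes the mapping explicit:

```c
/* Hypothetical helper: operand width in bits for an AT&T size suffix.
 * Returns 0 for a suffix it does not recognise. */
int suffix_bits(char suffix)
{
    switch (suffix) {
    case 'b': return 8;    /* byte */
    case 'w': return 16;   /* word */
    case 'l': return 32;   /* long, i.e. double word */
    case 'q': return 64;   /* quad word */
    default:  return 0;
    }
}
```

So the l in addl means the instruction adds two 32-bit operands, while addb and addw would add 8-bit and 16-bit operands.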

27
IA32 Standard Data Types
28
Global vs. Local variables
  • Global variables stored in either .data or .bss
    section of process
  • Local variables stored on stack
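
A practical consequence of this split is lifetime: a global keeps its value between calls, while a local is re-created in each new stack frame. The following sketch illustrates that; the names global_count, bump_global, and bump_local are made up for this example.

```c
/* Global: placed in the .data or .bss section, so it persists across calls. */
int global_count = 0;

int bump_global(void)
{
    global_count = global_count + 1;
    return global_count;
}

/* Local: lives in the stack frame of the current call and is
 * initialised again every time the function is entered. */
int bump_local(void)
{
    int local_count = 0;
    local_count = local_count + 1;
    return local_count;
}
```

Calling bump_global twice returns 1 and then 2, whereas bump_local returns 1 both times. In a disassembly the global shows up as an absolute address and the local as an offset from the frame pointer.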

29
Global vs local example
Local variables:
    void a() {
        int x = 1;
        int y = 2;
        x = x+y;
        printf("Total = %d\n",x);
    }
    int main() { a(); }

Global variables:
    int x = 1;
    int y = 2;
    void a() {
        x = x+y;
        printf("Total = %d\n",x);
    }
    int main() { a(); }
30
Global vs local example
Local variables (x and y are addressed relative to %ebp):
    void a() {
        int x = 1;
        int y = 2;
        x = x+y;
        printf("Total = %d\n",x);
    }
    int main() { a(); }

    080483c4 <a>:
     80483c4:  push  %ebp
     80483c5:  mov   %esp,%ebp
     80483c7:  sub   $0x18,%esp
     80483ca:  movl  $0x1,-0x8(%ebp)
     80483d1:  movl  $0x2,-0x4(%ebp)
     80483d8:  mov   -0x4(%ebp),%eax
     80483db:  add   %eax,-0x8(%ebp)
     80483de:  mov   -0x8(%ebp),%eax
     80483e1:  mov   %eax,0x4(%esp)
     80483e5:  movl  $0x80484f0,(%esp)
     80483ec:  call  80482dc <printf@plt>
     80483f1:  leave
     80483f2:  ret

Global variables (x and y are addressed by absolute address):
    int x = 1;
    int y = 2;
    void a() {
        x = x+y;
        printf("Total = %d\n",x);
    }
    int main() { a(); }

    080483c4 <a>:
     80483c4:  push  %ebp
     80483c5:  mov   %esp,%ebp
     80483c7:  sub   $0x8,%esp
     80483ca:  mov   0x804966c,%edx
     80483d0:  mov   0x8049670,%eax
     80483d5:  lea   (%edx,%eax,1),%eax
     80483d8:  mov   %eax,0x804966c
     80483dd:  mov   0x804966c,%eax
     80483e2:  mov   %eax,0x4(%esp)
     80483e6:  movl  $0x80484f0,(%esp)
     80483ed:  call  80482dc <printf@plt>
     80483f2:  leave
     80483f3:  ret
31
Arithmetic operations
    void f() {
        int a = 0;
        int b = 1;
        a = a+11;
        a = a-b;
        a--;
        b++;
    }
    int main() { f(); }

    08048394 <f>:
     8048394:  push  %ebp
     8048395:  mov   %esp,%ebp
     8048397:  sub   $0x10,%esp
     804839a:  movl  $0x0,-0x8(%ebp)
     80483a1:  movl  $0x1,-0x4(%ebp)
     80483a8:  addl  $0xb,-0x8(%ebp)
     80483ac:  mov   -0x4(%ebp),%eax
     80483af:  sub   %eax,-0x8(%ebp)
     80483b2:  subl  $0x1,-0x8(%ebp)
     80483b6:  addl  $0x1,-0x4(%ebp)
     80483ba:  leave
     80483bb:  ret
32
Machine Instruction Example
  • C Code
  • Add two signed integers
  • Assembly
  • Add 2 4-byte integers
  • Long words in GCC parlance
  • Same instruction whether signed or unsigned
  • Operands
  • x: Register %eax
  • y: Memory M[%ebp+8]
  • t: Register %eax
  • Return function value in %eax
  • Object Code
  • 3-byte instruction
  • Stored at address 0x401046

C code:
    int sum(int x, int y) {
        int t = x+y;
        return t;
    }
Assembly:
    _sum:
        pushl %ebp
        movl  %esp,%ebp
        movl  12(%ebp),%eax
        addl  8(%ebp),%eax
        movl  %ebp,%esp
        popl  %ebp
        ret
Object code:
    0x401046:  03 45 08
33
Condition codes
  • The IA32 processor has a register called eflags
  • (extended flags)
  • Each bit is a flag, or condition code
  • CF: Carry Flag, SF: Sign Flag
  • ZF: Zero Flag, OF: Overflow Flag
  • As programmers, we don't write to this register
    and seldom read it directly
  • Flags are set or cleared by hardware depending on
    the result of an instruction

34
Condition Codes (cont.)
  • Setting condition codes via compare instruction
  • cmpl b,a
  • Computes a-b without setting destination
  • CF set if carry out from most significant bit
  • Used for unsigned comparisons
  • ZF set if a == b
  • SF set if (a-b) < 0
  • OF set if two's complement overflow
  • (a>0 && b<0 && (a-b)<0) || (a<0 && b>0 &&
    (a-b)>0)
  • Byte and word versions cmpb, cmpw
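
The flag rules on this slide can be checked by modelling the compare in C. The sketch below is a software model for study purposes, not how the hardware is built; the names struct flags and cmpl_flags are assumptions. The parameter order follows the AT&T operand order of cmpl b,a, and CF is modelled as the borrow out of the unsigned subtraction.

```c
#include <stdint.h>

/* Hypothetical container for the four condition codes discussed above. */
struct flags { int cf, zf, sf, of; };

/* Models "cmpl b,a": computes a-b, discards the result, keeps the flags. */
struct flags cmpl_flags(int32_t b, int32_t a)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t diff = ua - ub;               /* wraps modulo 2^32 */
    struct flags f;

    f.cf = ua < ub;                        /* unsigned borrow */
    f.zf = diff == 0;                      /* a == b */
    f.sf = (diff >> 31) & 1;               /* sign bit of a-b */
    /* Signed overflow: a and b differ in sign, and the sign of the
     * result differs from the sign of a. */
    f.of = (((ua ^ ub) & (ua ^ diff)) >> 31) & 1;
    return f;
}
```

For instance, cmpl_flags(2, 1) models comparing a = 1 against b = 2: the result is negative (SF set) and the unsigned subtraction borrows (CF set), which is what a following jl or jb would test.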

35
Condition Codes (cont.)
  • Setting condition codes via test instruction
  • testl b,a
  • Computes a&b without setting destination
  • Sets condition codes based on result
  • Useful to have one of the operands be a mask
  • Often used to test zero, positive
  • testl %eax, %eax
  • ZF set when a&b == 0
  • SF set when a&b < 0
  • Byte and word versions testb, testw
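
The same kind of software model works for test. This is a sketch under the assumption that only ZF and SF are of interest, as on the slide; struct test_flags and testl_flags are illustrative names.

```c
#include <stdint.h>

/* Hypothetical container for the two flags the slide discusses. */
struct test_flags { int zf, sf; };

/* Models "testl b,a": computes a&b, discards the result, keeps ZF and SF. */
struct test_flags testl_flags(int32_t b, int32_t a)
{
    uint32_t result = (uint32_t)a & (uint32_t)b;
    struct test_flags f;

    f.zf = result == 0;            /* no common bits set */
    f.sf = (result >> 31) & 1;     /* sign bit of a&b */
    return f;
}
```

Passing the same value twice, as testl %eax,%eax does, turns this into a test for zero or negative. Passing a mask such as 1 as b checks a single bit, which is how a compiler can test whether a value is odd.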

36
if statements
    void f() {
        int x = 1;
        int y = 2;
        if (x == y)
            printf("x equals y.\n");
        else
            printf("x is not equal to y.\n");
    }
    int main() { f(); }

    080483c4 <f>:
     80483c4:  push  %ebp
     80483c5:  mov   %esp,%ebp
     80483c7:  sub   $0x18,%esp
     80483ca:  movl  $0x1,-0x8(%ebp)
     80483d1:  movl  $0x2,-0x4(%ebp)
     80483d8:  mov   -0x8(%ebp),%eax
     80483db:  cmp   -0x4(%ebp),%eax
     80483de:  jne   80483ee <f+0x2a>
     80483e0:  movl  $0x80484f0,(%esp)
     80483e7:  call  80482d8 <puts@plt>
     80483ec:  jmp   80483fa <f+0x36>
     80483ee:  movl  $0x80484fc,(%esp)
     80483f5:  call  80482d8 <puts@plt>
     80483fa:  leave
     80483fb:  ret
37
if statements
  • int a = 1, b = 3, c;
  • if (a > b)
  • c = a;
  • else
  • c = b;
  • 00000018 C7 45 FC 01 00 00 00 mov dword ptr
    [ebp-4],1 ; store a = 1
  • 0000001F C7 45 F8 03 00 00 00 mov dword ptr
    [ebp-8],3 ; store b = 3
  • 00000026 8B 45 FC mov eax,dword ptr [ebp-4]
    ; move a into EAX register
  • 00000029 3B 45 F8 cmp eax,dword ptr [ebp-8]
    ; compare a with b (subtraction)
  • 0000002C 7E 08 jle 00000036 ; if (a<=b)
    jump to line 00000036
  • 0000002E 8B 4D FC mov ecx,dword ptr [ebp-4]
    ; else move a (1) into ECX register
  • 00000031 89 4D F4 mov dword ptr [ebp-0Ch],ecx
    ; move ECX into c (12 bytes down)
  • 00000034 EB 06 jmp 0000003C
    ; unconditional jump to 0000003C
  • 00000036 8B 55 F8 mov edx,dword ptr [ebp-8]
    ; move b (3) into EDX register
  • 00000039 89 55 F4 mov dword ptr [ebp-0Ch],edx
    ; move EDX into c (12 bytes down)

38
Loops
    int factorial_do(int x) {
        int result = 1;
        do {
            result *= x;
            x = x-1;
        } while (x > 1);
        return result;
    }

    factorial_do:
        pushl %ebp
        movl  %esp, %ebp
        movl  8(%ebp), %edx
        movl  $1, %eax
    .L2:
        imull %edx, %eax
        decl  %edx
        cmpl  $1, %edx
        jg    .L2
        leave
        ret
39
C switch statements
  • Implementation options
  • Series of conditionals
  • testl followed by je
  • Good if few cases
  • Slow if many cases
  • Jump table (example below)
  • Lookup branch target from a table
  • Possible with a small range of integer constants
  • GCC picks implementation based on structure
  • Example

Jump table at .L3 (entries for x = 0..5):
    .L2  .L0  .L1  .L1  .L2  .L0

    switch (x) {
        case 1: case 5:  code at L0
        case 2: case 3:  code at L1
        default:         code at L2
    }

1. init jump table at .L3
2. get address at .L3+4*x
3. jump to that address
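
The three steps above can be imitated in C with an array of function pointers. This is only a sketch of the idea: the handlers code_L0, code_L1, and code_L2 and the function dispatch are hypothetical stand-ins for the code at L0, L1, and L2, and the table order mirrors the six entries listed for .L3.

```c
/* Hypothetical handlers standing in for the code at L0, L1 and L2. */
static int code_L0(void) { return 0; }   /* case 1, case 5 */
static int code_L1(void) { return 1; }   /* case 2, case 3 */
static int code_L2(void) { return 2; }   /* default */

/* Step 1: the jump table, indexed by x = 0..5. */
static int (*const jump_table[6])(void) = {
    code_L2, code_L0, code_L1, code_L1, code_L2, code_L0
};

int dispatch(int x)
{
    /* One unsigned comparison rejects both x < 0 and x > 5. */
    if ((unsigned)x > 5u)
        return code_L2();
    /* Steps 2 and 3: fetch the table entry for x and jump through it. */
    return jump_table[x]();
}
```

The single unsigned range check followed by an indirect jump is the pattern to look for in a disassembly: cost stays constant no matter how many cases the switch has.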
40
Example
    int switch_eg(int x) {
        int result = x;
        switch (x) {
        case 100:
            result *= 13;
            break;
        case 102:
            result += 10;
            /* Fall through */
        case 103:
            result += 11;
            break;
        case 104:
        case 106:
            result *= result;
            break;
        default:
            result = 0;
        }
        return result;
    }
41
Assembly for switch_eg (x in %edx):
        leal  -100(%edx),%eax
        cmpl  $6,%eax
        ja    .L9
        jmp   *.L10(,%eax,4)
        .p2align 4,,7
    .section .rodata
        .align 4
        .align 4
    .L10:
        .long .L4
        .long .L9
        .long .L5
        .long .L6
        .long .L8
        .long .L9
        .long .L8
    .text
        .p2align 4,,7
    .L4:
        leal  (%edx,%edx,2),%eax
        leal  (%edx,%eax,4),%edx
        jmp   .L3
        .p2align 4,,7
    .L5:
        addl  $10,%edx
    .L6:
        addl  $11,%edx
        jmp   .L3
        .p2align 4,,7
    .L8:
        imull %edx,%edx
        jmp   .L3
        .p2align 4,,7
    .L9:
        xorl  %edx,%edx
    .L3:
        movl  %edx,%eax
Key is the jump table at .L10: an array of pointers to
jump locations
42
x86-64 conditionals
  • Modern CPUs with deep pipelines
  • Instructions fetched far in advance of execution
  • Mask the latency going to memory
  • Problem: What if you hit a conditional branch?
  • Must predict which branch to take!
  • Branch prediction in CPUs well-studied, fairly
    effective
  • But, best to avoid conditional branching
    altogether
  • x86-64 conditionals
  • Conditional instruction execution

43
Conditional Move
  • Conditional move instruction
  • cmovXX src, dest
  • Move value from src to dest if condition XX holds
  • No branching
  • Handled as operation within Execution Unit
  • Added with P6 microarchitecture (PentiumPro
    onward)
  • Example
  • Current version of GCC won't use this instruction
  • Thinks it's compiling for a 386
  • Performance
  • 14 cycles on all data
  • More efficient than conditional branching (simple
    control flow)
  • But overhead: both branches are evaluated

    movl   8(%ebp),%edx     # Get x
    movl   12(%ebp),%eax    # rval = y
    cmpl   %edx,%eax        # rval:x
    cmovll %edx,%eax        # If <, rval = x
44
x86-64 conditional example
    int absdiff(int x, int y) {
        int result;
        if (x > y)
            result = x-y;
        else
            result = y-x;
        return result;
    }

    absdiff:                  # x in %edi, y in %esi
        movl   %edi, %eax     # eax = x
        movl   %esi, %edx     # edx = y
        subl   %esi, %eax     # eax = x-y
        subl   %edi, %edx     # edx = y-x
        cmpl   %esi, %edi     # x:y
        cmovle %edx, %eax     # eax = edx if <=
        ret
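
The conditional-move version evaluates both differences and then selects one, with no branch. The same shape can be written in C using a mask, shown below as a sketch. The name absdiff_select is an assumption, and whether a given compiler turns this or the original if/else into cmov depends on its version and flags.

```c
/* Sketch of the cmov idea in C: evaluate both arms, then select one. */
int absdiff_select(int x, int y)
{
    int diff_xy = x - y;        /* corresponds to subl %esi, %eax */
    int diff_yx = y - x;        /* corresponds to subl %edi, %edx */
    /* mask is all ones when x <= y and all zeros otherwise */
    int mask = -(x <= y);
    return (diff_yx & mask) | (diff_xy & ~mask);
}
```

As the slides note for cmov, the price of avoiding the branch is that both subtractions always execute, so the approach only pays off when each arm is cheap and free of side effects.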
45
IA32 Stack
  • Region of memory managed with stack discipline
  • Grows toward lower addresses
  • Register esp indicates lowest stack address
  • address of top element

[Diagram: stack bottom at the highest address; the stack grows down toward lower addresses; %esp points at the stack top]
46
IA32 Stack Pushing
  • Pushing
  • pushl Src
  • Decrement %esp by 4
  • Fetch operand at Src
  • Write operand at address given by %esp
  • e.g. pushl %eax
  • subl $4, %esp
  • movl %eax,(%esp)

[Diagram: a push moves the stack top down by 4 bytes (%esp - 4)]
47
IA32 Stack Popping
  • Popping
  • popl Dest
  • Read operand at address given by %esp
  • Write to Dest
  • Increment %esp by 4
  • e.g. popl %eax
  • movl (%esp),%eax
  • addl $4,%esp

[Diagram: a pop moves the stack top up by 4 bytes (%esp + 4)]
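
The push and pop steps can be traced with a toy model in C. Everything here is an illustrative assumption: struct machine, word_at, pushl, and popl are invented names, the modelled memory covers only addresses 0x100 to 0x11c, and there is no bounds checking.

```c
#include <stdint.h>

#define STACK_BASE  0x100u   /* lowest address modelled */
#define STACK_WORDS 8        /* covers 0x100 .. 0x11c */

/* Hypothetical toy machine: a few words of memory and a stack pointer. */
struct machine {
    uint32_t mem[STACK_WORDS];   /* mem[i] is the word at STACK_BASE + 4*i */
    uint32_t esp;
};

/* Returns a pointer to the modelled word at the given address. */
uint32_t *word_at(struct machine *m, uint32_t addr)
{
    return &m->mem[(addr - STACK_BASE) / 4];
}

/* pushl src: decrement esp by 4, then write src at the new top. */
void pushl(struct machine *m, uint32_t src)
{
    m->esp -= 4;
    *word_at(m, m->esp) = src;
}

/* popl: read the word at the top, then increment esp by 4. */
uint32_t popl(struct machine *m)
{
    uint32_t value = *word_at(m, m->esp);
    m->esp += 4;
    return value;
}
```

Starting with esp = 0x108, pushl(&m, 213) leaves esp at 0x104 with 213 stored there, and a following popl(&m) returns 213 and restores esp to 0x108. Note that the popped word is still in memory afterwards; only the stack pointer moved.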
48
Stack Operation Examples
                 Initially    After pushl %eax    After popl %edx
    0x110
    0x10c
    0x108        123          123                 123
    0x104                     213                 213
    %eax         213          213                 213
    %edx         555          555                 213
    %esp         0x108        0x104               0x108
The top of the stack is the word at the address held in %esp.
49
Procedure Control Flow
  • Procedure call
  • call label
  • Push address of next instruction (after the call)
    on stack
  • Jump to label
  • Procedure return
  • ret: pop address from stack into %eip register
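
A toy model makes the effect of call and ret on %esp and %eip concrete. This is a sketch with invented names (struct cpu, stack_word, do_call, do_ret); it models only a few stack words starting at address 0x100 and does no bounds checking.

```c
#include <stdint.h>

/* Hypothetical toy CPU: a few stack words plus esp and eip. */
struct cpu {
    uint32_t stack[8];   /* stack[i] is the word at address 0x100 + 4*i */
    uint32_t esp;
    uint32_t eip;
};

/* Returns a pointer to the modelled stack word at the given address. */
uint32_t *stack_word(struct cpu *c, uint32_t addr)
{
    return &c->stack[(addr - 0x100u) / 4];
}

/* call label: push the address of the next instruction, then jump. */
void do_call(struct cpu *c, uint32_t next_insn, uint32_t label)
{
    c->esp -= 4;
    *stack_word(c, c->esp) = next_insn;
    c->eip = label;
}

/* ret: pop the return address from the stack into eip. */
void do_ret(struct cpu *c)
{
    c->eip = *stack_word(c, c->esp);
    c->esp += 4;
}
```

With esp = 0x108, do_call(&c, 0x8048553, 0x8048b90) stores 0x8048553 at 0x104 and sets eip to 0x8048b90; do_ret(&c) then puts 0x8048553 back into eip and returns esp to 0x108. Because ret trusts whatever word is on top of the stack, overwriting that word redirects execution.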

50
Procedure Call Example
    804854e:  e8 3d 06 00 00   call 8048b90 <main>
    8048553:  50               (next instruction)

                 Before call    After call 8048b90
    0x110
    0x10c
    0x108        123            123
    0x104                       0x8048553
    %esp         0x108          0x104
    %eip         0x804854e      0x8048b90
%eip is the program counter
51
Procedure Return Example
    8048e90:  c3   ret

                 Before ret     After ret
    0x110
    0x10c
    0x108        123            123
    0x104        0x8048553      0x8048553
    %esp         0x104          0x108
    %eip         0x8048e90      0x8048553
%eip is the program counter
52
Procedure Control Flow
  • When procedure foo calls who
  •  foo is the caller, who is the callee
  • Control is transferred to the callee
  • When procedure returns
  • Control is transferred back to the caller
  • Last-called, first-return (LIFO) order
  • Naturally implemented via the stack

    foo() { ... who(); ... }               foo calls who
    who() { ... amI(); ... amI(); ... }    who calls amI
    amI() { ... }                          amI returns (ret) to who; who returns (ret) to foo
53
Procedure calls and stack frames
  • How does the callee know where to return later?
  • Return address placed in a well-known location on
    stack within a stack frame
  • How are arguments passed to the callee?
  • Arguments placed in a well-known location on
    stack within a stack frame
  • Upon procedure invocation
  • Stack frame created for the procedure
  • Stack frame is pushed onto program stack
  • Upon procedure return
  • Its frame is popped off of stack
  • Caller's stack frame is recovered

Call chain: foo => who => amI
Stack (from the stack bottom at the highest address, in the direction of stack growth):
    foo's stack frame
    who's stack frame
    amI's stack frame
54
Keeping track of stack frames
  • The stack pointer (esp) moves around
  • Can be changed within procedure
  • Problem
  • How can we consistently find our parameters?
  • The base pointer (%ebp)
  • Points to the base of our current stack frame
  • Also called the frame pointer
  • Within each function, ebp stays constant
  • Most information on the stack is referenced
    relative to the base pointer
  • Base pointer setup is the programmer's job
  • Actually, usually the compiler's job

55
IA32/Linux Stack Frame
  • Current Stack Frame (Yellow) (From Top to
    Bottom)
  • Parameters for function about to be called
  • Argument build of caller
  • Local variables
  • If can't keep in registers
  • Saved register context
  • Old frame pointer
  • Caller Stack Frame (Pink)
  • Return address
  • Pushed by call instruction
  • Arguments for this call
  • Argument build of callee
  • etc

Stack layout (high to low addresses):
    Caller frame:   ..., Arguments, Return addr
    Current frame:  Old %ebp            <- frame pointer (%ebp)
                    Saved registers, local variables
                    Argument build      <- stack pointer (%esp)
56
swap
Calling swap from call_swap
    int zip1 = 15213;
    int zip2 = 91125;

    void call_swap() {
        swap(&zip1, &zip2);
    }

    call_swap:
        pushl $zip2    # Global Var
        pushl $zip1    # Global Var
        call  swap

    void swap(int *xp, int *yp) {
        int t0 = *xp;
        int t1 = *yp;
        *xp = t1;
        *yp = t0;
    }

Resulting stack (high to low addresses): &zip2, &zip1, Rtn adr <- %esp
57
swap
    void swap(int *xp, int *yp) {
        int t0 = *xp;
        int t1 = *yp;
        *xp = t1;
        *yp = t0;
    }

    swap:
        pushl %ebp              # Setup
        movl  %esp,%ebp
        pushl %ebx
        movl  12(%ebp),%ecx     # Body
        movl  8(%ebp),%edx
        movl  (%ecx),%eax
        movl  (%edx),%ebx
        movl  %eax,(%edx)
        movl  %ebx,(%ecx)
        movl  -4(%ebp),%ebx     # Finish
        movl  %ebp,%esp
        popl  %ebp
        ret
58
swap Setup 1
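Entering stack (high to low addresses): &zip2, &zip1, Rtn adr <- %esp (%ebp still points into the caller's frame)
Resulting stack after pushl %ebp: yp, xp, Rtn adr, Old %ebp <- %esp
    swap:
        pushl %ebp         <- this step
        movl  %esp,%ebp
        pushl %ebx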
59
swap Setup 2
Stack before instruction: yp, xp, Rtn adr, Old %ebp <- %esp
    swap:
        pushl %ebp
        movl  %esp,%ebp    <- this step
        pushl %ebx
60
swap Setup 3
Stack before instruction: yp, xp, Rtn adr, Old %ebp <- %ebp and %esp
    swap:
        pushl %ebp
        movl  %esp,%ebp
        pushl %ebx         <- this step
61
Effect of swap Setup
Entering stack (high to low addresses): &zip2, &zip1, Rtn adr <- %esp
Resulting stack:
    Offset (relative to %ebp)
        12    yp (&zip2)
         8    xp (&zip1)
         4    Rtn adr
         0    Old %ebp    <- %ebp
        -4    Old %ebx    <- %esp
Body:
    movl 12(%ebp),%ecx    # get yp
    movl 8(%ebp),%edx     # get xp
    . . .
62
swap Finish 1

swaps Stack

Offset
Offset
yp
12
yp
12
xp
8
xp
8
Rtn adr
4
Rtn adr
4
ebp
Old ebp
0
ebp
Old ebp
0
Old ebx
esp
-4
Old ebx
esp
-4
movl -4(ebp),ebx movl ebp,esp popl
ebp ret
  • Observation
  • Saved restored register ebx

63
swap Finish 2

swaps Stack

swaps Stack
Offset
Offset
yp
12
yp
12
xp
8
xp
8
Rtn adr
4
Rtn adr
4
ebp
Old ebp
0
ebp
Old ebp
0
Old ebx
esp
-4
esp
movl -4(ebp),ebx movl ebp,esp popl
ebp ret
64
swap Finish 3
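swap's stack before this step (offsets relative to %ebp):
        12    yp
         8    xp
         4    Rtn adr
         0    Old %ebp    <- %ebp and %esp
swap's stack after this step (%ebp again points into the caller's frame):
        12    yp
         8    xp
         4    Rtn adr     <- %esp
    movl -4(%ebp),%ebx
    movl %ebp,%esp
    popl %ebp             <- this step
    ret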
65
swap Finish 4
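swap's stack before this step (%ebp points into the caller's frame):
        12    yp
         8    xp
         4    Rtn adr     <- %esp
Exiting stack (high to low addresses): &zip2, &zip1 <- %esp
    movl -4(%ebp),%ebx
    movl %ebp,%esp
    popl %ebp
    ret                   <- this step
  • Observation
  • Saved and restored register %ebx
  • Didn't do so for %eax, %ecx, or %edx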

66
swap
    void swap(int *xp, int *yp) {
        int t0 = *xp;
        int t1 = *yp;
        *xp = t1;
        *yp = t0;
    }

    swap:
        # Setup
        pushl %ebp              # Save old %ebp of caller frame
        movl  %esp,%ebp         # Set new %ebp for callee (current) frame
        pushl %ebx              # Save state of %ebx register from caller
        # Body
        movl  12(%ebp),%ecx     # Retrieve parameter yp from caller frame
        movl  8(%ebp),%edx      # Retrieve parameter xp from caller frame
        movl  (%ecx),%eax       # Perform swap
        movl  (%edx),%ebx
        movl  %eax,(%edx)
        movl  %ebx,(%ecx)
        # Finish
        movl  -4(%ebp),%ebx     # Restore the state of caller's %ebx register
        movl  %ebp,%esp         # Set stack pointer to bottom of callee frame (%ebp)
        popl  %ebp              # Restore %ebp to original state
        ret                     # Pop return address from stack to %eip
The pair movl %ebp,%esp / popl %ebp is equivalent to a single leave instruction
67
Local variables
  • Where are they in relation to ebp?
  • Stored above %ebp (at lower addresses)
  • How are they preserved if the current function
    calls another function?
  • Compiler updates esp beyond local variables
    before issuing call
  • What happens to them when the current function
    returns?
  • They are lost (i.e. no longer valid)

68
Register Saving Conventions
  • When procedure foo calls who
  •  foo is the caller, who is the callee
  • Can Register be Used for Temporary Storage?
  • Conventions
  • Caller Save
  • Caller saves temporary in its frame before
    calling
  • Callee Save
  • Callee saves temporary in its frame before using

69
IA32 Register Usage
  • Integer Registers
  • Two have special uses
  • ebp, esp
  • Three managed as callee-save
  • ebx, esi, edi
  • Old values saved on stack prior to using
  • Three managed as caller-save
  • eax, edx, ecx
  • Do what you please, but expect any callee to do
    so, as well
  • Return value in eax

    %eax, %edx, %ecx    Caller-save temporaries
    %ebx, %esi, %edi    Callee-save temporaries
    %esp, %ebp          Special
70
simple.c
gcc -O2 -c simple.c
    int simple(int *xp, int y) {
        int t = *xp + y;
        *xp = t;
        return t;
    }

    _simple:
        pushl %ebp              # Setup stack frame pointer
        movl  %esp, %ebp
        movl  8(%ebp), %edx     # get xp
        movl  12(%ebp), %ecx    # get y
        movl  (%edx), %eax      # move *xp to t
        addl  %ecx, %eax        # add y to t
        movl  %eax, (%edx)      # store t at *xp
        popl  %ebp              # restore frame pointer
        ret                     # return to caller
71
Function pointers
  • Pointers in C can also point to code locations
  • Function pointers
  • Store and pass references to code
  • Some uses
  • Dynamic late-binding of functions
  • Dynamically set a random number generator
  • Replace large switch statements for implementing
    dynamic event handlers
  • Example dynamically setting behavior of GUI
    buttons
  • Emulating virtual functions and polymorphism
    from OOP
  • qsort() with user-supplied callback function for
    comparison
  • man qsort
  • Operating on lists of elements
  • multiplication, addition, min/max, etc.
  • Malware leverages this to execute its own code

72
Using pointers to functions
    // function prototypes
    int doEcho(char*);
    int doExit(char*);
    int doHelp(char*);
    int setPrompt(char*);

    // dispatch table section
    typedef int (*func)(char*);
    typedef struct {
        char* name;
        func function;
    } func_t;

    func_t func_table[] = {
        { "echo",   doEcho },
        { "exit",   doExit },
        { "quit",   doExit },
        { "help",   doHelp },
        { "prompt", setPrompt },
    };
    #define cntFuncs (sizeof(func_table) / sizeof(func_table[0]))

    // find the function and dispatch it
    for (i = 0; i < cntFuncs; i++) {
        if (strcmp(command, func_table[i].name) == 0) {
            done = func_table[i].function(argument);
            break;
        }
    }
    if (i == cntFuncs)
        printf("invalid command\n");
73
Function pointers example
    #include <sys/time.h>
    #include <stdio.h>

    void fp1(int i) { printf("Even %d\n",i); }
    void fp2(int i) { printf("Odd %d\n",i); }

    main(int argc, char **argv) {
        void (*fp)(int);
        int i = argc;
        if (argc%2)
            fp = fp2;
        else
            fp = fp1;
        fp(i);
    }

    main:
        leal  4(%esp), %ecx
        andl  $-16, %esp
        pushl -4(%ecx)
        pushl %ebp
        movl  %esp, %ebp
        pushl %ecx
        subl  $4, %esp
        movl  (%ecx), %eax
        movl  $fp2, %edx
        testb $1, %al
        jne   .L4
        movl  $fp1, %edx
    .L4:
        movl  %eax, (%esp)
        call  *%edx
        addl  $4, %esp
        popl  %ecx
        popl  %ebp
        leal  -4(%ecx), %esp
        ret

Running ./funcp a on host mashimaro prints: Even 2

74
Uses in operating system
  • Interrupt descriptor table
  • Pointers to interrupt handler functions
  • IDTR points to IDT
  • System services descriptor table
  • Pointers to system call functions
  • Import address table
  • Pointers to imported library calls
  • Malware attacks all of these

75
More disassembly
  • Code patterns in assembly
  • Calling conventions (fast vs. standard vs.
    cdecl)
  • ebp omission
  • ecx use as C++ this pointer
  • C++ vtables (virtual function table)
  • WinXP SP2 prologue with patching support
  • For detours
  • Exception handlers (FS register)
  • Linked list of functions stored in exception
    frames on stack

76
Advanced disassembly
  • Windows examples
  • Largely the same with small modifications
  • Size of operands (e.g. dword) specified explicitly
    (not in operator suffix)
  • Reverse ordering of operands

77
Disassembly example
    0000  mov  ecx, 5
    0003  push aHello
    0009  call printf
    000E  loop 00000003h
    0014  ...

    for(int i=0;i<5;i++)
        printf("Hello");

    0000  cmp  ecx, 100h
    0003  jnz  001Bh
    0009  push aYes
    000F  call printf
    0015  jmp  0027h
    001B  push aNo
    0021  call printf
    0027  ...

    if(x == 256)
        printf("Yes");
    else
        printf("No");
78
Disassembly example
    push ebp
    mov  ebp, esp
    sub  esp, 2A8h
    lea  eax, [ebp+0FFFFFE70h]
    push eax
    push 101h
    call 4012BEh
    test eax, eax
    jz   401028h
    mov  eax, 1
    jmp  40116Fh
    push 0
    push 1
    push 2
    call 4012B8h
    mov  dword ptr [ebp+0FFFFFE6Ch], eax
    cmp  dword ptr [ebp+0FFFFFE6Ch], byte 0FFh
    jnz  401047h
    jmp  401165h
    mov  word ptr [ebp+0FFFFFE5Ch], 2
    push 800h
    call 4012B2h
    mov  word ptr [ebp+0FFFFFE5Eh], ax
    push 0
    call 4012ACh
    mov  dword ptr [ebp+0FFFFFE60h], eax
    push 10h
    lea  ecx, [ebp+0FFFFFE5Ch]
    push ecx
    mov  edx, [ebp+0FFFFFE6Ch]
    push edx
    call 4012A6h
    cmp  eax, byte 0FFh
    jnz  40108Dh
    jmp  401165h
    push 1
    mov  eax, [ebp+0FFFFFE6Ch]
    push eax
    call 4012A0h
    cmp  eax, byte 0FFh
    jnz  4010A5h
    jmp  401165h
  • int main(int argc, char **argv)
  • WSADATA wsa;
  • SOCKET s;
  • struct sockaddr_in name;
  • unsigned char buf[256];
  • // Initialize Winsock
  • if(WSAStartup(MAKEWORD(1,1),&wsa))
  • return 1;
  • // Create Socket
  • s = socket(AF_INET,SOCK_STREAM,0);
  • if(INVALID_SOCKET == s)
  • goto Error_Cleanup;
  • name.sin_family = AF_INET;
  • name.sin_port = htons(PORT_NUMBER);

79
Tools for disassembling
  • IDA Pro, IDA Pro Free
  • Disassembler
  • Execution graph
  • Cross-referencing
  • Searching
  • Function analysis
  • Function and variable labeling

80
Tools for disassembling
  • objdump
  • objdump -d <object_file>
  • Analyzes bit pattern of series of instructions
  • Produces approximate rendition of assembly code
  • Can be run on either executable or relocatable
    (.o) file
  • gdb Debugger
  • gdb p
  • disassemble sum
  • Disassemble procedure
  • x/13b sum
  • Examine the 13 bytes starting at sum

81
In-class exercise
  • Lab 5-1 (Steps 1-17)
  • Use IDA Pro to bring up the code of DllMain
  • Bring up Figures 5-1L, the equivalent of 5-2L,
    and 5-3L
  • Find the remote shell routine in which memcmp is
    used to compare command strings received over the
    network
  • Show the code for the function called if the
    command robotwork is invoked
  • Show IDA Pro graphs of DLLMain and sub_10004E79
  • Explain what the assembly code on p. 499 does
  • Find the socket call referred to in Table 5-1L
    and change its integer constants to symbolic ones
  • Show the assembly on p. 500. Find the routine
    that calls this assembly which shows that it is
    an anti-VM check.

82
In-class exercise
  • Lab 6-1
  • Show the imported network functions in any tool
  • Show the output of executing the binary
  • Load binary in IDA Pro to generate Figure 6-1L
  • Lab 6-2
  • Generate Listing 6-1L and 6-2L using a tool of
    your choice. What calls hint at this code's
    function?
  • Using either Wireshark or netcat with Apate DNS,
    execute the malware to generate Listing 6-3L
  • In IDA Pro, show the functions called by main.
    What does each one do?
  • In IDA Pro, show the order that the WinINet calls
    are used and explain what each one does.
  • Generate Listing 6-5L and explain what each cmp
    does.

83
Windows
  • Chapter 7 Analyzing Malicious Windows Programs

84
Types
  • Hungarian notation
  • word (w): 16-bit value
  • double word (dw): DWORD, 32-bit value
  • dwSize: a variable whose type is a 32-bit value
  • Handles (H)
  • HWND: a handle to a window
  • Long Pointer (LP)
  • Callback
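
Hungarian prefixes can be recognized mechanically when reading disassembly or API documentation. The following is a rough sketch, not an exhaustive rule set: the table holds only the prefixes listed on this slide, and the function names hungarian_prefix and describe are hypothetical.

```python
# Prefixes from the slide; real Windows code uses many more.
HUNGARIAN_PREFIXES = {
    "lp": "long pointer",
    "dw": "double word (32-bit value)",
    "w": "word (16-bit value)",
    "h": "handle",
}


def hungarian_prefix(name):
    """Return the Hungarian prefix of an identifier, or None.

    Longer prefixes are tried first, and the prefix must be followed
    by an uppercase letter (as in dwSize or hWnd).
    """
    for prefix in sorted(HUNGARIAN_PREFIXES, key=len, reverse=True):
        rest = name[len(prefix):]
        if name.startswith(prefix) and rest[:1].isupper():
            return prefix
    return None


def describe(name):
    """Describe what the prefix says about the identifier's type."""
    return HUNGARIAN_PREFIXES.get(hungarian_prefix(name), "unknown")


for identifier in ("dwSize", "hWnd", "lpBuffer", "wParam", "width"):
    print(identifier, "->", describe(identifier))
```

The uppercase check keeps ordinary words such as "width" or "handle" from being read as prefixed names.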

85
File system functions
  • Malware often accesses the file system
  • CreateFile, ReadFile, WriteFile
  • Memory-mapping calls: CreateFileMapping,
    MapViewOfFile
  • Trickiness
  • Alternate Data Streams (extra data attached to a
    file)
  • \Device\PhysicalMemory (accesses physical memory)
  • \\.\ prefix (accesses a device directly)
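
When triaging strings extracted from a sample, the path forms listed above can be flagged with a few string checks. This is a heuristic sketch only: the function name classify_path and its category labels are hypothetical, and the rules will misclassify unusual paths.

```python
import re


def classify_path(path):
    """Label a Windows path string by the special forms named on the slide."""
    if path.startswith("\\\\.\\"):
        return "device namespace"
    if path.lower().startswith("\\device\\"):
        return "NT device object"
    # Drop a leading drive letter such as "C:" so that its colon is not
    # mistaken for the colon that introduces a stream name.
    remainder = re.sub(r"^[A-Za-z]:", "", path)
    filename = remainder.rsplit("\\", 1)[-1]
    if ":" in filename:
        return "alternate data stream"
    return "ordinary file"


print(classify_path("C:\\temp\\notes.txt:hidden.exe"))
```

A hit on any category other than "ordinary file" is only a prompt to look closer at the code that uses the string.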

86
Registry functions
  • Malware often accesses the registry
  • Registry stores OS and program configuration
    information
  • HKEY_LOCAL_MACHINE (HKLM): settings global to
    the machine
  • HKEY_CURRENT_USER (HKCU): settings for the
    current user
  • Regedit: tool for examining values
  • Functions: RegOpenKeyEx, RegSetValueEx,
    RegGetValue (Listing 7-1)
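
For inspecting a key from a script instead of Regedit, Python's standard-library winreg module can be used on Windows. The sketch below is an assumption-laden illustration: split_registry_path and read_value are hypothetical helper names, and read_value only works on a Windows host.

```python
HIVE_ABBREVIATIONS = {
    "HKLM": "HKEY_LOCAL_MACHINE",
    "HKCU": "HKEY_CURRENT_USER",
}


def split_registry_path(path):
    """Split 'HKLM\\Software\\...' into (full hive name, subkey)."""
    hive, _, subkey = path.partition("\\")
    return HIVE_ABBREVIATIONS.get(hive.upper(), hive), subkey


def read_value(path, value_name):
    """Read one registry value; requires Windows."""
    import winreg  # standard library, but only importable on Windows

    hive_name, subkey = split_registry_path(path)
    hive = getattr(winreg, hive_name)
    with winreg.OpenKey(hive, subkey) as key:
        value, _value_type = winreg.QueryValueEx(key, value_name)
    return value


print(split_registry_path("HKCU\\Software\\Microsoft"))
```

The path-splitting helper runs anywhere, which makes it usable for normalizing key names found in strings output on a non-Windows analysis machine.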

87
Networking APIs
  • Berkeley sockets API
  • socket, bind, listen, accept, connect, recv, send
  • Listing 7-3
  • WinINet API
  • InternetOpen, InternetOpenUrl, InternetReadFile
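
The Berkeley call sequence is the same in most languages, so it helps to see it end to end once. The following sketch uses Python's socket module, whose methods mirror the calls listed above; it is an illustration written for these notes (not Listing 7-3), the function names are hypothetical, and it assumes the loopback interface is available.

```python
import socket
import threading


def echo_once(server_sock):
    """Server side: accept one connection, then recv and send the data back."""
    conn, _addr = server_sock.accept()
    with conn:
        conn.settimeout(5)
        data = conn.recv(1024)
        conn.sendall(data)


def loopback_roundtrip(message):
    """Run socket/bind/listen/accept against connect/send/recv on 127.0.0.1."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.settimeout(5)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    worker = threading.Thread(target=echo_once, args=(server,))
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5)
    client.connect(("127.0.0.1", port))
    client.sendall(message)
    reply = client.recv(1024)
    client.close()

    worker.join()
    server.close()
    return reply
```

In a disassembly, a bind/listen/accept sequence suggests the sample acts as a server, while connect followed by send and recv suggests a client.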

88
DLLs
  • Dynamic link libraries
  • Store code that is re-used amongst applications
    including malware
  • Can be used to store malicious code for injection
    into a process
  • Malware uses standard Windows DLLs to interact
    with OS
  • Malware uses third-party DLLs (e.g. Firefox DLL)
    to avoid re-implementing functions

89
Processes
  • Execute code outside of current process
  • CreateProcess
  • Listing 7-4
  • Hijack execution of current process
  • Injecting code via debugger or DLLs
  • Companion execution
  • Store executable in resource section of PE
  • Program extracts executable and writes it to disk
    upon execution

90
Threads
  • Windows threads share same memory space but have
    separate registers and stack
  • Used by malware to insert a malicious DLL into a
    process's address space
  • CreateThread with address of LoadLibrary as start
    address

91
Services
  • Processes that run in the background
  • Scheduled and run by Windows service manager
    without user input
  • OpenSCManager, CreateService, StartService
  • Allows malware to maintain persistence on a
    machine
  • Types
  • WIN32_SHARE_PROCESS: service shares one process
    with other services (e.g. svchost.exe)
  • WIN32_OWN_PROCESS: service runs in its own
    independent process
  • KERNEL_DRIVER: loads code into the kernel
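
The service type appears in disassembly as an integer argument to CreateService, so decoding it is a bit-mask lookup. The sketch below assumes the Windows SDK values for the three types on this slide (in the SDK the names carry a SERVICE_ prefix); the function name decode_service_type is hypothetical and other flag bits are ignored.

```python
# Windows SDK values for the three service types named on the slide.
SERVICE_TYPES = {
    0x00000001: "SERVICE_KERNEL_DRIVER",
    0x00000010: "SERVICE_WIN32_OWN_PROCESS",
    0x00000020: "SERVICE_WIN32_SHARE_PROCESS",
}


def decode_service_type(dw_service_type):
    """Return the names of the known type bits set in dwServiceType."""
    return [
        name
        for bit, name in sorted(SERVICE_TYPES.items())
        if dw_service_type & bit
    ]


print(decode_service_type(0x20))
```

A push of 0x20 before a call to CreateService would therefore indicate a service hosted in a shared process.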

92
COM
  • Microsoft Component Object Model
  • Interface standard that allows software
    components to call each other
  • OleInitialize, CoInitializeEx
  • CLSID: class identifier; IID: interface
    identifier
  • Navigate function in IWebBrowser2 interface
  • Used by malware to launch browser
  • Listing 7-11
  • Malware implemented as COM server
  • Browser helper objects
  • Detect COM servers by the functions they export
  • DllCanUnloadNow, DllGetClassObject, DllInstall,
    DllRegisterServer, DllUnregisterServer
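
CLSIDs and IIDs are stored in the binary as 16-byte GUID structures whose first three fields are little-endian, so the bytes seen in a hex view do not read in the same order as the printed GUID. The sketch below converts raw bytes to the usual text form; guid_from_memory is a hypothetical helper, and the sample bytes encode the GUID commonly documented as the Internet Explorer CLSID.

```python
import uuid


def guid_from_memory(raw):
    """Format 16 raw bytes of a GUID structure as {XXXXXXXX-XXXX-...}."""
    if len(raw) != 16:
        raise ValueError("a GUID is exactly 16 bytes")
    # bytes_le treats the first three fields as little-endian, matching
    # the in-memory layout of the Windows GUID structure.
    return "{" + str(uuid.UUID(bytes_le=bytes(raw))).upper() + "}"


sample = bytes.fromhex("01df020000000000c000000000000046")
print(guid_from_memory(sample))
```

The resulting string can then be searched for under HKEY_CLASSES_ROOT\CLSID or in public documentation to identify the COM class being created.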

93
Exceptions
  • Allow program to handle exceptional conditions
    during program execution
  • Windows Structured Exception Handling
  • Exception handling information stored on stack
  • Listing 7-13
  • Not all handlers respond to all exceptions
  • Thrown to caller's frame if not handled
  • Used by malware to hijack execution
  • Handler address replaced by the address of
    injected malicious code
  • Adversary then triggers exception

94
Kernel-mode malware
  • Windows API calls (Kernel32.dll)
  • Typically call into underlying Native API
    (Ntdll.dll)
  • Code in Ntdll then transfers to kernel
    (Ntoskrnl.exe) via INT 0x2E, SYSENTER, SYSCALL
  • Figure 7-3
  • Malware often calls Ntdll directly to avoid
    detection via interposition of security programs
    between Kernel32.dll and Ntdll.dll
  • Example: Windows API (ReadFile, WriteFile) versus
    Native API (NtReadFile, NtWriteFile)
  • Figure 7-4

95
Kernel-mode malware
  • Other Native API calls
  • NtQuerySystemInformation,
    NtQueryInformationProcess,
    NtQueryInformationThread, NtQueryInformationFile,
    NtQueryInformationKey
  • Can also carry Zw prefix
  • NtContinue
  • Used to return from an exception
  • Location to return is specified in exception
    context, but can be modified to transfer
    execution in nefarious ways
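
Because Native API functions follow the Nt/Zw naming pattern, direct use of Ntdll can be spotted by filtering a sample's import list. This is a minimal sketch: native_api_imports is a hypothetical helper, and the import names are assumed to come from another tool such as a PE viewer.

```python
import re

# Native API exports start with Nt or Zw followed by an uppercase letter.
_NATIVE_NAME = re.compile(r"^(Nt|Zw)[A-Z]")


def native_api_imports(import_names):
    """Return the imports that look like Native API functions."""
    return [name for name in import_names if _NATIVE_NAME.match(name)]


imports = ["ReadFile", "NtReadFile", "ZwQuerySystemInformation", "ntohs"]
print(native_api_imports(imports))
```

A name match is only a hint; the imports should still be confirmed as coming from Ntdll.dll before drawing conclusions.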

96
Kernel-mode malware
  • Legitimate programs typically do not use the
    Native API exclusively
  • Programs that are native applications (as
    specified in the subsystem field of the PE
    header) are likely malicious
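
The subsystem is a 16-bit field in the PE optional header and can be read with a few fixed offsets. The sketch below is a simplified reader, not a full PE parser: pe_subsystem and is_native_application are hypothetical names, and the code assumes a well-formed header with the standard layout (e_lfanew at 0x3C, a 20-byte COFF header, Subsystem at offset 68 of the optional header).

```python
import struct

# Subset of the IMAGE_SUBSYSTEM_* values.
IMAGE_SUBSYSTEM_NAMES = {1: "NATIVE", 2: "WINDOWS_GUI", 3: "WINDOWS_CUI"}


def pe_subsystem(data):
    """Return the Subsystem field from the PE optional header."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # Skip the 4-byte signature and the 20-byte COFF file header.
    optional_header = e_lfanew + 4 + 20
    (subsystem,) = struct.unpack_from("<H", data, optional_header + 68)
    return subsystem


def is_native_application(data):
    """True when the subsystem is IMAGE_SUBSYSTEM_NATIVE (value 1)."""
    return pe_subsystem(data) == 1
```

Reading a file with open(path, "rb").read() and passing the bytes to is_native_application gives a quick first check. Kernel drivers also carry the native subsystem, so the result needs context.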

97
In-class exercise
  • Lab 7-2
  • Using strings, identify the network resource
    being used by the malware
  • What imports give away the mechanism this malware
    uses to launch the browser?
  • Go to the code snippet shown on p. 518. Follow
    the references to show the values of rclsid and
    riid in memory.
  • Debug the program and break at the call shown on
    p. 519. Run the call to show the browser being
    launched with the embedded URL

98
Extra
99
Run-time data structures
100
More code snippets
  • Registry modifications for disabling task manager
    and changing browser default page

HKEY_CURRENT_USER\Software\Policies\Microsoft\Internet Explorer\Control Panel,Homepage
HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Policies\System DisableRegistryTools
HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\Main Start Page
HKEY_CURRENT_USER\Software\Yahoo\pager\View\YMSGR_buzz content url
HKEY_CURRENT_USER\Software\Yahoo\pager\View\YMSGR_Launchcast DisableTaskMgr
101
More code snippets
  • Kills anti-virus, zone-alarm, firewall processes

102
More code snippets
  • New variants
  • Download worm update files and register them as
    services
  • regsvr32 MSINET.OCX
  • Internet Transfer ActiveX Control
  • Check for updates