Title: Post-Attack Analysis of Unknown Vulnerabilities
1 Post-Attack Analysis of Unknown Vulnerabilities
- Peng Ning
- With Emre C. Sezer, Chongkyung Kil, and Jun Xu
2 Motivation
- Vulnerability analysis
- Essential for
- Patching
- Vulnerability-based signature generation
- Painstakingly slow
- Depends on human effort
- Existing approaches
- Static analysis (e.g., Chen et al. 04, Feng et al. 04, Larochelle & Evans 01)
- False positives
- Dynamic analysis (e.g., Minos [Crandall et al. 04], TaintCheck [Newsome & Song 05], DIRA [Smirnov & Chiueh 05])
- Used for detection; inadequate vulnerability information
- Symbolic execution (e.g., EXE [Cadar et al. 06], DACODA [Crandall et al. 05])
- Scalability issues
- Recovery (e.g., STEM [Sidiroglou et al. 05], SEAD [Locasto et al. 07])
- Change of application semantics
3 MemSherlock
- MemSherlock is an automated debugger
- Automated analysis of unknown memory corruption vulnerabilities
- Appeared in ACM CCS 07
- MemSherlock provides
- Statement that causes the memory corruption
- Dynamic program slice leading to the corruption
- Program variables involved in the vulnerability
- All presented at programming language level
- Implications
- Generating vulnerability conditions
- Improves signature or patch generation speed
4 General Framework: Web Application Example
[Figure: general framework illustrated with a web application; the only surviving label is "Traffic"]
5 MemSherlock Overview
- Goal is to provide vulnerability information
- Intuitive, easy to understand for the programmer
- Not only the corruption point
- Slice of program involved in the vulnerability
- Effects of user inputs
- Program variables involved
- Variable relationships (e.g., pointer aliasing)
- Type of vulnerability (e.g., stack buffer overflow)
- MemSherlock performs two important tasks
- Finding the corruption point
- Tracking program state
6 MemSherlock: Finding Corruption Point
- Observation: a memory object is modified by a small set of statements (inspired by AccMon)
- For a memory object m, the write set of m, WS(m), is the set of statements that legitimately modify m
- Security condition: memory object m should only be updated by statements in WS(m)
7 MemSherlock Assembly Line
- Pre-Debugging Phase
- Instruments the program for debugging phase
- Extracts program information via static analysis
- Needs to be performed once
- Debugging Phase
- Tracks program state
- Monitors memory writes and checks for violation of the security condition
- Tracks tainted data and its propagation
8 MemSherlock Architecture
9 Pre-debugging: Generating Write Sets
- MemSherlock analyzes source code to determine write sets
- For a program variable v, WS(v) includes
- Assignment statements (i.e., v = expr)
- Library function calls where v is passed as an argument that can be modified (e.g., memcpy(v, src, n))
- MemSherlock treats DLLs as black boxes
- Assumption: a DLL is internally secure, but externally insecure
- e.g., no stack overflows in the library functions
- Sound for common, well-tested libraries (e.g., the C library)
- Requires library specifications
- For each DLL, a list of functions and the arguments they might modify
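A library specification of this kind can be pictured as a table that maps each function to the arguments it may write through. The sketch below is only an illustration: the struct layout, the lib_spec table, and may_modify_arg are hypothetical names, not MemSherlock's actual specification format. The entries follow the standard C and POSIX signatures of the listed functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical spec entry: a library function and a bitmask of the
 * 0-based argument positions it may write through. */
struct lib_spec_entry {
    const char *name;
    unsigned    modified_args; /* bit i set => argument i may be modified */
};

/* Signatures assumed: memcpy(dest, src, n), memset(s, c, n),
 * strcpy(dest, src), recv(sockfd, buf, len, flags). */
static const struct lib_spec_entry lib_spec[] = {
    { "memcpy", 1u << 0 },
    { "memset", 1u << 0 },
    { "strcpy", 1u << 0 },
    { "recv",   1u << 1 },
};

/* Returns 1 if the spec says func may modify argument arg_index,
 * 0 if it does not or if func is missing from the table. */
static int may_modify_arg(const char *func, unsigned arg_index)
{
    size_t i;

    if (arg_index >= 32)
        return 0;
    for (i = 0; i < sizeof lib_spec / sizeof lib_spec[0]; i++) {
        if (strcmp(lib_spec[i].name, func) == 0)
            return (int)((lib_spec[i].modified_args >> arg_index) & 1u);
    }
    return 0;
}
```

With such a table, a call like memcpy(v, src, n) would add its statement to WS(v) because argument 0 is marked as modifiable. A function missing from the table contributes nothing, which is one way an incomplete specification could later surface as a false positive.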
10 Dealing with Pointers
- For a pointer variable p, two write sets are kept
- WS(p): statements that modify p
- WS(ref(p)): statements that modify the referent (e.g., *p = 5)
- ref(p) is resolved at runtime (during debugging)
- The same analysis is performed for pointer-type function arguments at function calls
- Removes the requirement for inter-procedural static analysis
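The split between the two write sets can be seen on a few lines of C. In the sketch below, the comments mark which set each statement would fall into; the function and variable names are made up for illustration. Static analysis can classify every statement without knowing what p points to, and the referent is only known once the program runs.

```c
#include <assert.h>

/* Each statement is labelled with the write set it would belong to. */
static int pointer_write_sets_demo(void)
{
    int a = 0;
    int b = 0;
    int *p;

    p = &a;   /* modifies p itself: WS(p)                      */
    *p = 5;   /* modifies the referent: WS(ref(p)), ref(p) = a */
    p = &b;   /* WS(p); from here on ref(p) = b                */
    *p = 7;   /* WS(ref(p)); the referent is now b             */

    return a * 10 + b;
}
```

The two dereferencing statements are the same kind of write, yet they land in different variables, which is why the referent has to be resolved during the debugging phase.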
11 Chained Dereferences
Before rewriting:
    int z;
    int *y = &z;
    int **x = &y;
    **x = 10;
After rewriting:
    int z;
    int *y = &z;
    int **x = &y;
    int *temp = *x;
    *temp = 10;
- The earlier technique can only handle simple dereferences
- Source code rewriting is used to convert all chained dereferences to simple dereferences
- Any other dereference that is not simple is converted in the same manner
12 Output of Pre-debugging Phase
- Simplified program
- Simplified pointer dereferences
- Compiled with debugging options
- Input file for the debugger
- Program variables and their write sets
- Addresses of global symbols
- Frame pointer offsets of local variables
- Other flags that help the debugger
13 MemSherlock Architecture: Debugging
14 Debugging: Dynamic Monitoring
- Runtime monitoring
- State Maintenance
- Incorporates taint analysis from TaintCheck
- Produces a dynamic slice of the program leading to the vulnerability
- Write Checking
- Monitors and validates memory writes
- Write sets are file name and line number pairs <f,l>
- Instruction pointer IP is translated into <f,l>
- Write sets are associated with program variables
- A destination address is translated into a program variable
15 Keeping Program State
[Figure: the virtual address space in two program states (Program State 1 and Program State 2). Each stack lists main and fnc A below the stack base; one state continues with fnc B and the other with fnc C. A memory write to 0xABABABAB is shown in each state.]
- A given memory region may correspond to different program variables depending on program state
- Dynamic monitor keeps track of memory mapping
16 Debugging: Key Data Structures
- Keeps two lists of memory regions
- ActiveMemoryRegions
- Memory corresponding to program variables or their referent memory regions
- NonWritableRegions
- Saved registers, return addresses, metadata encapsulating dynamically allocated memory regions
17 Debugging: State Maintenance
- Function calls/returns (memory)
- Local variable addresses are calculated and added to ActiveMemoryRegions
- Locations of the return address and saved registers are added to the NonWritableRegions list
- Heap memory (memory)
- malloc/free calls are intercepted
- Allocated memory is added to ActiveMemoryRegions
- The metadata encapsulating the buffer is added to NonWritableRegions
- Pointer value updates (write sets)
- Searches ActiveMemoryRegions to find the referent and updates its WS
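The function call and return handling can be sketched with two small region lists. This is a simplified model under stated assumptions, not MemSherlock's implementation: the struct and function names, the frame layout (a 4-byte return address with one local buffer directly below it), and all addresses are invented for illustration. Heap handling and write-set updates are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_REGIONS 16

/* One tracked memory region and the name of what currently lives there. */
struct region {
    unsigned long start;
    unsigned long size;
    const char   *owner;
};

struct region_list {
    struct region items[MAX_REGIONS];
    size_t        count;
};

static void region_push(struct region_list *list, unsigned long start,
                        unsigned long size, const char *owner)
{
    if (list->count < MAX_REGIONS) {
        list->items[list->count].start = start;
        list->items[list->count].size  = size;
        list->items[list->count].owner = owner;
        list->count++;
    }
}

static void region_pop(struct region_list *list)
{
    if (list->count > 0)
        list->count--;
}

/* Owner of the most recently added region containing addr, or NULL. */
static const char *region_owner(const struct region_list *list,
                                unsigned long addr)
{
    size_t i = list->count;

    while (i-- > 0) {
        const struct region *r = &list->items[i];
        if (addr >= r->start && addr - r->start < r->size)
            return r->owner;
    }
    return NULL;
}

/* Call: the local buffer becomes active, the return address slot
 * becomes non-writable (assumed layout, see lead-in). */
static void on_call(struct region_list *active,
                    struct region_list *non_writable,
                    unsigned long frame_top, unsigned long local_size,
                    const char *local_name)
{
    region_push(non_writable, frame_top, 4, "return address");
    region_push(active, frame_top - local_size, local_size, local_name);
}

/* Return: the callee's regions stop being valid. */
static void on_return(struct region_list *active,
                      struct region_list *non_writable)
{
    region_pop(active);
    region_pop(non_writable);
}

/* main -> fnc A -> fnc B; optionally B returns and A calls fnc C.
 * Reports who owns one fixed stack address in the resulting state. */
static const char *owner_in_state(int after_calling_c)
{
    struct region_list active, non_writable;

    active.count = 0;
    non_writable.count = 0;
    on_call(&active, &non_writable, 0xBFFFF000UL, 64, "main: buf");
    on_call(&active, &non_writable, 0xBFFFEF00UL, 32, "fnc A: tmp");
    on_call(&active, &non_writable, 0xBFFFEE00UL, 16, "fnc B: name");
    if (after_calling_c) {
        on_return(&active, &non_writable);
        on_call(&active, &non_writable, 0xBFFFEE00UL, 16, "fnc C: count");
    }
    return region_owner(&active, 0xBFFFEDF8UL);
}
```

The same address resolves to a local of fnc B in one state and a local of fnc C in the other, which is the situation the two program states on the earlier slide depict.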
18 Debugging: Write Checking
- When instruction IP modifies memory m
- If m is in ActiveMemoryRegions
- Determines the variable v it belongs to
- Converts IP into <f,l>
- Checks if <f,l> is in WS(v)
- If the memory write check fails or m is in NonWritableRegions
- Marks the operation as a memory corruption
- Displays the vulnerability information
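The decision procedure above can be sketched as a single function. The code is a minimal model with several assumptions: the type and function names are hypothetical, the IP is taken as already translated into <f,l>, the tables are static rather than maintained at runtime, and a write that hits neither list is reported as untracked, a case the slides do not address. The buffer address and size are taken from the Null HTTP report; the write set entry and the 8-byte metadata region placed right after the buffer are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COUNT_OF(a) (sizeof (a) / sizeof (a)[0])

enum write_verdict { WRITE_OK, WRITE_CORRUPTION, WRITE_UNTRACKED };

struct src_loc { const char *file; int line; };   /* the pair <f,l> */

struct active_region {
    unsigned long         start, size;
    const char           *variable;
    const struct src_loc *ws;      /* write set of the variable */
    size_t                ws_len;
};

struct protected_region { unsigned long start, size; };

/* Assumed write set: the recv() through pPostData at http.c:108. */
static const struct src_loc heap_var_ws[] = { { "http.c", 108 } };

static const struct active_region active_regions[] = {
    { 0x3AB3E280UL, 224, "heap var", heap_var_ws, 1 },
};

/* Assumed: allocator metadata directly after the 224-byte buffer. */
static const struct protected_region non_writable_regions[] = {
    { 0x3AB3E360UL, 8 },
};

static int contains(unsigned long start, unsigned long size,
                    unsigned long addr)
{
    return addr >= start && addr - start < size;
}

/* Classifies a write to addr made by the statement at <file,line>. */
static enum write_verdict check_write(const char *file, int line,
                                      unsigned long addr)
{
    size_t i, j;

    for (i = 0; i < COUNT_OF(non_writable_regions); i++) {
        if (contains(non_writable_regions[i].start,
                     non_writable_regions[i].size, addr))
            return WRITE_CORRUPTION;
    }
    for (i = 0; i < COUNT_OF(active_regions); i++) {
        const struct active_region *r = &active_regions[i];

        if (!contains(r->start, r->size, addr))
            continue;
        for (j = 0; j < r->ws_len; j++) {
            if (r->ws[j].line == line && strcmp(r->ws[j].file, file) == 0)
                return WRITE_OK;
        }
        return WRITE_CORRUPTION;   /* statement not in WS(v) */
    }
    return WRITE_UNTRACKED;
}
```

Under these assumptions, a write by http.c:108 inside the buffer passes, the same statement writing at 3AB3E360 is flagged because it lands in a non-writable region, and a write into the buffer by a statement outside its write set is flagged as well.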
19 Generating Vulnerability Information
- The slice of program contributing to the vulnerability
- Statements that have propagated tainted values
- Statements that have modified related memory regions
- Dependency between memory objects involved in the vulnerability
- Points-to analysis shows memory regions and how they were accessed
- Program state
- Call stack information
- Write set information
20 Example Test Case: Null HTTP
http.c
91  void ReadPOSTData(int sid) {
        ...
100     conn[sid].PostData=calloc(conn[sid].dat->in_ContentLength+1024, sizeof(char));
101     if (conn[sid].PostData==NULL) ...
107     do {
108         rc=recv(conn[sid].socket, pPostData, 1024, 0);
        ...

--20361-- Error type: Heap Buffer Overflow
--20361-- Dest Addr: 3AB3E360
--20361-- IP: 0x804E5C7: ReadPOSTData (http.c:108)
--20361-- Dest address resolved to:
--20361-- Global variable "heap var" @ 3AB3E280 (size 224)
--20361--
--20361-- Memory allocated by 0x804E531: ReadPOSTData (http.c:100)
--20361-- TAINTED destination: 3AB3E360
--20361-- Fully tainted from:
--20361-- 0x804E5C7: ReadPOSTData (http.c:108)
--20361--
--20361-- TAINTED size used during allocation
--20361-- Tainted from:
--20361-- 0x804E456: ReadPOSTData (http.c:100)
--20361-- 0x804FBB5: read_header (http.c:153)
--20361-- 0x805121B: sgets (server.c:211)
21 Vulnerability Analysis Example
http.c
91  void ReadPOSTData(int sid) {
92      char *pPostData;
        ...
100     conn[sid].PostData=calloc(conn[sid].dat->in_ContentLength+1024, sizeof(char));
        ...
107     do {
108         rc=recv(conn[sid].socket, pPostData, 1024, 0);
        ...
[Figure callout: Create Heap Object]
22 Vulnerability Analysis Example
http.c
119 int read_header(int sid) {
121     char line[2048];
        ...
127     do {
128         memset(line, 0, sizeof(line));
129         sgets(line, sizeof(line)-1, conn[sid].socket);
        ...
153     conn[sid].dat->in_ContentLength=atoi((char *)&line+16);
        ...
169     if (conn[sid].dat->in_ContentLength<MAX_POSTSIZE)
170         ReadPOSTData(sid);
[Figure callout: Taint Object]
http.c, ReadPOSTData (lines 91-108, same listing as on the previous slide)
[Figure callout: Use Object]
23 Vulnerability Analysis Example
http.c, read_header (lines 119-170, same listing as on the previous slide)
[Figure callout: Create]
server.c
202 int sgets(char *buffer, int max, int fd) {
203     ...
209     conn[sid].atime=time((time_t *)0);
210     while (n<max) {
211         if ((rc=recv(conn[sid].socket, buffer, 1, 0))<0) ...
[Figure callouts: Taint Object (shown twice)]
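The numbers in the report fit the allocation at http.c line 100 when that line is read as in_ContentLength + 1024. A 224-byte buffer then corresponds to a Content-Length of -800, and the flagged destination 3AB3E360 lies exactly 224 bytes past the buffer start 3AB3E280. The value -800 is inferred from the report, not stated in the slides, and the helper names below are hypothetical. A short sketch of the arithmetic:

```c
#include <assert.h>

#define RECV_CHUNK 1024L  /* third argument of recv() at http.c:108 */

/* Element count requested at http.c:100 for a given Content-Length,
 * reading the expression as in_ContentLength + 1024. */
static long post_buffer_size(long content_length)
{
    return content_length + 1024L;
}

/* 1 if a single recv() of up to RECV_CHUNK bytes can write past the
 * end of the buffer. Non-positive sizes are not modelled here. */
static int first_recv_can_overflow(long content_length)
{
    long size = post_buffer_size(content_length);

    return size > 0 && size < RECV_CHUNK;
}
```

A negative Content-Length also passes the comparison against MAX_POSTSIZE at line 169, so ReadPOSTData is still reached with the undersized buffer.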
24 Implementation
- Source code is rewritten using CIL (C Intermediate Language)
- CodeSurfer was used to extract program variables and their write sets
- A commercial static analysis tool
- objdump and dwarfdump were used to extract global symbol information
- Dynamic monitoring is implemented in Valgrind
- An open source emulator
25 Evaluation
- Tested 11 real-world applications with known memory corruption vulnerabilities
- Test cases included
- Stack/heap buffer overflow, format string
- Both control flow and non-control data attacks
- Testing methodology
- Programs were run under MemSherlock
- Exploit programs were used to attack the applications
- Log and replay was not used
26 Evaluation Results
Application Name  Vuln. Type  Description                                 Captured?  FP
GHTTP             S           A small HTTP server                         Yes        7
Icecast           S           An mp3 broadcast server                     Yes        0
Sumus             S           A game server for mus                       Yes        0
Monit             S           Multi-purpose anomaly detector              Yes        0
Newspost          S           Automatic news posting                      Yes        2
Prozilla          S           A download accelerator for Linux            No         0
NullHTTP          H           An HTTP server                              Yes        0
Xtelnet           H           A telnet server                             Yes        4
Wsmp3             H           Web server with mp3 broadcasting            Yes        0
OpenVMPS          F           Open source VLAN management policy server   Yes        2
Power             F           UPS monitoring utility                      Yes        10
- Type abbreviations: (S)tack overflow, (H)eap overflow, and (F)ormat string
- FP: number of false positives
27 False Negatives
- Prozilla
- memcpy uses a kernel function to manipulate page tables when copying entire pages
- Valgrind cannot trace into the kernel
- Can be prevented by function wrappers
- Other false negatives are theoretically possible
- structs within unions or arrays
- Current implementation does not support unions
- Current implementation does not differentiate between elements of an array
- Memory corruption errors inside DLLs
28 False Positives
- Embedded assembly
- Incomplete library specification
- Library functions keeping internal state (e.g., strtok(NULL, delim))
- Library functions that modify global variables as side effects (e.g., optarg, errno)
- Pointers that point to hidden global structures (e.g., localtime() in time.h)
- struct pointers
- void pointers that are type-cast to modify struct variables
- Since the pointer is not of type struct, MemSherlock fails to update accordingly
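The void pointer case can be illustrated with a few lines of C. The struct, the function names, and the values are invented for this sketch; it shows the code pattern only, not how MemSherlock processes it. The write is legitimate, but the pointer's declared type carries no struct information, so only the cast reveals which variable is being modified.

```c
#include <assert.h>

struct request {
    int  length;
    char tag[8];
};

/* Writes to a struct field through a void pointer. Nothing in the
 * parameter's type says that a struct request is the target. */
static void set_length(void *opaque, int value)
{
    ((struct request *)opaque)->length = value;
}

static int void_cast_demo(void)
{
    struct request r = { 0, "req" };

    set_length(&r, 42);   /* legitimate update of r.length */
    return r.length;
}
```

A monitor that derives referent write sets from declared pointer types would have no reason to associate the statement in set_length with r, which matches the false positive described above.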
29 Conclusion
- Fully automated vulnerability analysis
- The analysis output is intuitive and human-readable
- Future challenges
- Automated, long-term fix of vulnerabilities
- Semantic consistency is a great challenge
- Automated, temporary fix of vulnerabilities
- Generating vulnerability condition
- Improving signature generation
30 Thank You