Title: Characterizing and Reasoning about Security Vulnerabilities
- Shuo Chen
- Center for Reliable and High-Performance Computing, Coordinated Science Laboratory
- University of Illinois at Urbana-Champaign
- Preliminary Examination, May 4th, 2004
- Committee Chair: Prof. Ravishankar K. Iyer
- Committee: Prof. Vikram Adve, Prof. Jose Meseguer, Prof. David Nicol
2. Significance of Software Implementation Errors
- Bugtraq: 70% of security vulnerabilities are due to implementation errors.
3. What I Have Done
- Analyzed CERT and Bugtraq reports and the corresponding application source code.
- Developed a new FSM representation that decomposes each security vulnerability into a series of elementary activities (primitive FSMs), each indicating a simple predicate.
- The FSM analysis showed that:
  - Many vulnerabilities (~66%) are due to pointer taintedness: a user input value is used as a pointer value (which should be transparent to users).
  - A significant portion of vulnerabilities (~33.6%) are due to errors in library functions or incorrect invocations of library functions.
- The FSM modeling led to a formal reasoning approach to examine pointer taintedness in applications.
4. Formal Analysis of Pointer Taintedness
- Pointer taintedness: a pointer value, including a return address, is derived directly or indirectly from user input (formally defined using equational logic).
- Provides a unifying perspective for reasoning about a significant number of security vulnerabilities.
- The notion of pointer taintedness enables:
  - Static analysis: reasoning about the possibility of pointer taintedness by source code analysis.
  - Runtime checking: inserting assertions in object code to check pointer taintedness at runtime.
  - Hardware architecture-based support to detect pointer taintedness.
- Current focus: extraction of security specifications of library functions based on pointer taintedness semantics.
5. Publications of My Research
- Papers
  - J. Xu, S. Chen, Z. Kalbarczyk, R. K. Iyer. "An Experimental Study of Security Vulnerabilities Caused by Errors". DSN 2001.
  - S. Chen, J. Xu, R. K. Iyer, K. Whisnant. "Modeling and Analyzing the Security Threat of Firewall Data Corruption Caused by Instruction Transient Errors". DSN 2002.
  - S. Chen, Z. Kalbarczyk, J. Xu, R. K. Iyer. "A Data-Driven Finite State Machine Model for Analyzing Security Vulnerabilities". DSN 2003.
  - S. Chen, K. Pattabiraman, Z. Kalbarczyk, R. K. Iyer. "Formal Reasoning of Various Categories of Widely Exploited Security Vulnerabilities Using Pointer Taintedness Semantics". IFIP Information Security Conference, 2004.
- Security Vulnerability Report
  - S. Chen and J. Xu. "NULL HTTPD Heap Corruption Vulnerability", Bugtraq ID 6255. The Bugtraq List.
6. A Finite State Machine Approach for Analyzing Security Vulnerabilities
7. Overview of the Study
- An analysis of security vulnerability databases (CERT and Bugtraq).
- Examination of security vulnerabilities at the application source-code level.
- A security vulnerability usually involves a series of vulnerable operations spread over multiple elementary activities. Each elementary activity can be represented by a primitive FSM that indicates a simple predicate.
- Provides a formalism for reasoning about and describing security vulnerabilities.
- Usefulness of the formalism: discovery of the NULL HTTPD heap overflow vulnerability.
8. Observation from Data Analysis
The same vulnerability can be classified in different categories. Why? Because exploiting it involves multiple elementary activities.
9. Primitive FSM
- We use a primitive FSM (pFSM) to depict an elementary activity. A pFSM specifies a predicate (SPEC) that should be guaranteed in order to ensure security.
[Figure: pFSM diagram with the transitions SPEC_ACCEPT, SPEC_REJECT, IMPL_ACCEPT and IMPL_REJECT]
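One way to make the pFSM concrete is to model SPEC and IMPL as two predicates over the same input. The C sketch below assumes exactly that; the type, enum and function names are hypothetical, and the sample predicates encode the debug-level check of the Sendmail example. In this reading, an input is exploitable when SPEC_REJECT and IMPL_ACCEPT coincide, i.e., the implementation accepts what the specification forbids.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of a primitive FSM: one predicate for the
   specification (SPEC) and one for the check the code performs (IMPL). */
typedef struct {
    const char *name;
    bool (*spec_accepts)(long input);
    bool (*impl_accepts)(long input);
} pfsm;

typedef enum {
    SAFE_ACCEPT,   /* SPEC_ACCEPT and IMPL_ACCEPT */
    SAFE_REJECT,   /* SPEC_REJECT and IMPL_REJECT */
    HIDDEN_PATH,   /* SPEC_REJECT but IMPL_ACCEPT: exploitable */
    OVER_STRICT    /* SPEC_ACCEPT but IMPL_REJECT: not a security hole */
} pfsm_outcome;

static pfsm_outcome pfsm_classify(const pfsm *m, long input) {
    assert(m && m->spec_accepts && m->impl_accepts);
    bool spec = m->spec_accepts(input);
    bool impl = m->impl_accepts(input);
    if (spec)
        return impl ? SAFE_ACCEPT : OVER_STRICT;
    return impl ? HIDDEN_PATH : SAFE_REJECT;
}

/* Example predicates (assumed): SPEC requires 0 <= x <= 100, while the
   implementation tests only the upper bound. */
static bool debug_spec(long x) { return 0 <= x && x <= 100; }
static bool debug_impl(long x) { return x <= 100; }
```

For instance, classifying x = -5 with a pfsm built from debug_spec and debug_impl yields HIDDEN_PATH, while 50 and 101 are handled consistently by SPEC and IMPL.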
10. NULL HTTPD Heap Corruption Vulnerabilities (Bugtraq 5774, 6255)
11. Operation 1: Read User Data from a Socket into a Heap Buffer
    /* get contentLen: negative?? */
    PostData = calloc(contentLen + 1024, sizeof(char)); x = 0; rc = 0;
    pPostData = PostData;
    do {
        rc = recv(sock, pPostData, 1024, 0);
        if (rc == -1) {
            closeconnect(sid, 1);
            return;
        }
        pPostData += rc;
        x += rc;
    } while ((rc == 1024) || (x < contentLen));
- pFSM1: get (contentLen, input), where contentLen is an integer and input is a string to be read from a socket
  - Reject: contentLen < 0
  - Accept: contentLen > 0, then Calloc PostData[1024 + contentLen]
- pFSM2: copy input from the socket to PostData by the recv() call
  - Reject: length(input) > Size(PostData)
  - Accept: length(input) < Size(PostData)
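The interaction between the calloc() size and the do-while condition can be checked without a network by replaying the values recv() returns. The sketch below is a simulation written for illustration: overflow_bytes and CHUNK are hypothetical names, and the scripted rcs[] array stands in for the socket. The loop condition is copied from the code above.

```c
#include <assert.h>

#define CHUNK 1024  /* recv() request size used by the NULL HTTPD loop */

/* Hypothetical simulation, not NULL HTTPD code: rcs[0..n-1] are the values
   successive recv() calls return. The extra "i < n" test only stops the
   simulation when the scripted input runs out. Returns how many bytes land
   beyond a buffer allocated with calloc(contentLen + 1024, sizeof(char)). */
static long overflow_bytes(long contentLen, const int *rcs, int n) {
    assert(n > 0 && contentLen > -CHUNK);  /* keep the modeled size positive */
    long size = contentLen + CHUNK;        /* bytes passed to calloc() */
    long x = 0;                            /* bytes received so far */
    int i = 0, rc;
    do {
        rc = rcs[i++];                     /* rc = recv(sock, pPostData, 1024, 0) */
        x += rc;                           /* pPostData += rc; x += rc; */
    } while (((rc == CHUNK) || (x < contentLen)) && i < n);
    return x > size ? x - size : 0;
}
```

With contentLen = -800 and reads {1024, 500}, the buffer holds 224 bytes but 1524 arrive, so 1300 bytes overflow (pFSM1's check is missing). With contentLen = 2048 and four 1024-byte reads, the rc == 1024 clause keeps the loop running past the declared length and 1024 bytes overflow (pFSM2's check is missing). A request with contentLen = 2000 and reads {1024, 976} stays in bounds.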
12. Sendmail Debugging Function Signed Integer Overflow (Bugtraq 3163)
- Operation 1: write integer i to tTvect[x]
  - pFSM1: get text strings str_x and str_i
    - Reject: (integer represented by str_x) > 2^31
    - Accept: (integer represented by str_x) <= 2^31, then convert str_i and str_x to integers i and x
  - pFSM2: assign the debug level
    - Reject: SPEC rejects x < 0 or x > 100; the implementation rejects only x > 100
    - Accept: SPEC accepts 0 <= x <= 100; the implementation accepts x <= 100; then tTvect[x] = i
  - Result: the function pointer is tainted
- Operation 2: manipulate the function pointer
  - pFSM3: load the function pointer
    - Reject: addr_setuid changed
    - Accept: addr_setuid unchanged, then execute the code referred to by addr_setuid
    - Changed addr_setuid used anyway: execute malicious code
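The path through pFSM1 and pFSM2 can be reproduced arithmetically. The sketch below assumes a 32-bit int with two's-complement wraparound and a naive digit-by-digit conversion; naive_atoi32, spec_accepts and impl_accepts are hypothetical names, not Sendmail's code. A string larger than 2^31, such as "4294967295", violates pFSM1's specification, yet the conversion accepts it and produces x = -1, which the implementation's x <= 100 test lets through although the specification 0 <= x <= 100 rejects it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a naive string-to-int conversion on a 32-bit int.
   Digits accumulate in uint32_t, which wraps modulo 2^32 without undefined
   behavior; the result is then reinterpreted as two's complement, the
   wraparound an overflowing signed conversion typically produces. */
static int32_t naive_atoi32(const char *s) {
    assert(s != NULL);
    uint32_t v = 0;
    for (; *s >= '0' && *s <= '9'; s++)
        v = v * 10u + (uint32_t)(*s - '0');
    return v > INT32_MAX ? (int32_t)((int64_t)v - 4294967296LL)
                         : (int32_t)v;
}

/* pFSM2 predicates from the slide: x is the debug-table index. */
static int spec_accepts(int32_t x) { return 0 <= x && x <= 100; }
static int impl_accepts(int32_t x) { return x <= 100; }  /* no lower bound */
```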
13. Modeled Vulnerabilities
- Signed Integer Overflow
- Heap Corruption
- Stack Overflow
- Format String Vulnerabilities
- File Race Conditions
- Some Input Validation Vulnerabilities
14. Formal Reasoning of Security Vulnerabilities by Pointer Taintedness Semantics
15. Vulnerabilities Caused by Pointer Taintedness
- Format string vulnerability
  - Taints an argument pointer of functions such as printf, fprintf, sprintf and syslog.
- Stack smashing
  - Taints a return address.
- Heap corruption
  - Taints the free-chunk doubly-linked list of the heap.
- Glibc globbing vulnerabilities
  - User input resides in a location that is used as a pointer by the parent function of glob().
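The heap-corruption case can be illustrated with the classic unlink operation on a doubly-linked free list. The following is a toy model, not the glibc allocator: MEM_WORDS, unlink_chunk and the two-word chunk layout are assumptions. Once user input overwrites a free chunk's forward and backward links, those links are tainted pointers, and unlinking the chunk writes an attacker-chosen value to an attacker-chosen address.

```c
#include <assert.h>

/* Toy heap (hypothetical): mem[] is a small address space of words. A free
   chunk at index p keeps its forward link in mem[p] and its backward link
   in mem[p + 1]; "FD->bk" is therefore mem[fd + 1] and "BK->fd" is mem[bk]. */
enum { MEM_WORDS = 16 };

/* Classic doubly-linked-list unlink: FD->bk = BK; BK->fd = FD. */
static void unlink_chunk(unsigned mem[MEM_WORDS], unsigned p) {
    assert(p + 1 < MEM_WORDS);
    unsigned fd = mem[p], bk = mem[p + 1];
    assert(fd + 1 < MEM_WORDS && bk < MEM_WORDS);  /* the toy stays in bounds */
    mem[fd + 1] = bk;   /* write to an address derived from the fd link */
    mem[bk] = fd;       /* write to an address taken from the bk link */
}
```

Setting mem[4] = 10 and mem[5] = 2, as an overflow from a neighboring chunk might, and then unlinking the chunk at index 4 stores 2 into mem[11] and 10 into mem[2].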
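16. Example of Format String Vulnerability
Vulnerable code:
    recv(buf);
    printf(buf);    /* should be printf("%s", buf); */
[Figure: stack layout, with the stack growing from high to low addresses. The user input \xdd\xcc\xbb\xaa%d%d%d%n is stored on the stack; its first four bytes form the word 0xaabbccdd at the low end, below the %d %d %d %n characters, and the three %d conversions advance ap up to that word.]
In vfprintf(), if (fmt points to "%n") then **ap = (character count), where *ap is a tainted value.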
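The **ap = (character count) step can be mimicked with a toy interpreter. The sketch below is a simplified model, not the real vfprintf(): toy_vfprintf, args[] (the stack words ap walks over) and mem[] (a tiny address space) are hypothetical. Because the format comes from the user, the user decides how many words the %d conversions skip and therefore which word %n treats as a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Toy model, not the real vfprintf(): only %d and %n are interpreted and
   every other character counts as one printed character. %d prints the
   next stack word in decimal; %n stores the count at the address given by
   the next stack word. Returns the number of characters "printed". */
static int toy_vfprintf(const char *fmt, const uint32_t *args, size_t nargs,
                        uint32_t *mem, size_t memsize) {
    char digits[16];
    size_t ap = 0;                       /* index of the next stack word */
    int count = 0;
    for (const char *p = fmt; *p != '\0'; p++) {
        if (p[0] != '%' || (p[1] != 'd' && p[1] != 'n')) {
            count++;                     /* ordinary character */
            continue;
        }
        assert(ap < nargs);              /* the toy never reads past args[] */
        uint32_t word = args[ap++];      /* fetch *ap, then advance ap */
        if (p[1] == 'd')
            count += snprintf(digits, sizeof digits, "%lu", (unsigned long)word);
        else if (word < memsize)
            mem[word] = (uint32_t)count; /* %n: **ap = character count */
        p++;                             /* skip the conversion letter */
    }
    return count;
}
```

With args = {7, 8, 9, 5}, where 5 plays the role of the attacker-supplied word 0xaabbccdd, the format "ABCD%d%d%d%n" prints 7 characters and stores 7 into mem[5]. Passing the input as data, as in printf("%s", buf), keeps it out of the format entirely.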
17. Taintedness Semantics (Memory Model)
- A store represents a snapshot of the memory state at a point in the program execution.
- For each memory location, we can evaluate two properties: content and taintedness (true/false).
- Operations on memory locations:
  - The fetch operation Ftch(S, A) gives the content of memory address A in store S.
  - The location-taintedness operation LocT(S, A) gives the taintedness of location A in store S.
- Operations on expressions:
  - The evaluation operation Eval(S, E) evaluates expression E in store S.
  - The expression-taintedness operation ExpT(S, E) computes the taintedness of expression E in store S.
18. Axioms of Eval and ExpT Operations
    Eval(S, I) = I                          // I is an integer constant
    Eval(S, *E1) = Ftch(S, Eval(S, E1))
    Eval(S, E1 + E2) = Eval(S, E1) + Eval(S, E2)
    Eval(S, E1 - E2) = Eval(S, E1) - Eval(S, E2)
    ExpT(S, I) = false
    ExpT(S, *E1) = LocT(S, Eval(S, E1))
    ExpT(S, E1 + E2) = ExpT(S, E1) or ExpT(S, E2)
    ExpT(S, E1 - E2) = ExpT(S, E1) or ExpT(S, E2)
E.g., is the expression (*100) + 2 tainted?
    ExpT(S, (*100) + 2) = ExpT(S, *100) or ExpT(S, 2)
                        = LocT(S, 100) or false
                        = LocT(S, 100)
Note: * is the dereference operator; *100 gives the content of location 100.
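The axioms translate almost one-to-one into a recursive evaluator. The C sketch below assumes a 256-location store and a hand-built expression tree; store, expr, Ftch, LocT, Eval and ExpT are illustrative encodings of the memory-model operations, not code from the original work. Evaluating (*100) + 2 reproduces the example: its taintedness is exactly LocT(S, 100).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { STORE_SIZE = 256 };

/* A store: the content and the taintedness of every location. */
typedef struct {
    int content[STORE_SIZE];
    bool tainted[STORE_SIZE];
} store;

/* Expressions used by the axioms: I, *E1, E1 + E2 and E1 - E2. */
typedef enum { E_CONST, E_DEREF, E_ADD, E_SUB } expr_kind;
typedef struct expr {
    expr_kind kind;
    int value;                   /* the constant I when kind == E_CONST */
    const struct expr *e1, *e2;  /* operands; e2 is unused for E_DEREF */
} expr;

/* Ftch(S, A): content of location A. */
static int Ftch(const store *S, int A) {
    assert(0 <= A && A < STORE_SIZE);
    return S->content[A];
}

/* LocT(S, A): taintedness of location A. */
static bool LocT(const store *S, int A) {
    assert(0 <= A && A < STORE_SIZE);
    return S->tainted[A];
}

static int Eval(const store *S, const expr *E) {
    assert(S != NULL && E != NULL);
    switch (E->kind) {
    case E_CONST: return E->value;                        /* Eval(S, I) = I */
    case E_DEREF: return Ftch(S, Eval(S, E->e1));         /* Ftch(S, Eval(S, E1)) */
    case E_ADD:   return Eval(S, E->e1) + Eval(S, E->e2);
    case E_SUB:   return Eval(S, E->e1) - Eval(S, E->e2);
    }
    assert(!"unknown expression kind");
    return 0;
}

static bool ExpT(const store *S, const expr *E) {
    assert(S != NULL && E != NULL);
    switch (E->kind) {
    case E_CONST: return false;                           /* ExpT(S, I) = false */
    case E_DEREF: return LocT(S, Eval(S, E->e1));         /* LocT(S, Eval(S, E1)) */
    case E_ADD:
    case E_SUB:   return ExpT(S, E->e1) || ExpT(S, E->e2);
    }
    assert(!"unknown expression kind");
    return false;
}
```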
19. Semantics of Language L
- Extends the semantics proposed by Goguen and Malcolm.
- The following (arithmetic/logic) operations are defined: +, -, *, /, %, !, &&, ||, !=, ==
- The following instructions are defined:
  - mov Exp1 <- Exp2
  - branch (Condition) Label
  - call FuncName(Exp1, Exp2, ...)
- Axioms defining the semantics of the mov instruction:
  - Specify the effects of applying a mov instruction on a store.
  - Allow taintedness to propagate from Exp2 to Exp1.
- Axioms defining the semantics of recv (and similarly scanf, recvfrom):
  - Specify the memory locations tainted by the recv call.
20. Extracting Function Specifications by Theorem Prover
- The C source code of a library function is automatically translated to code in Language L.
- Theorem generation: the critical instructions are indirect writes. For each mov E1 <- E2, generate the theorems:
  a) E1 should not be tainted;
  b) the mov instruction should not taint any location outside the buffer pointed to by E1.
- The ITP theorem prover yields a set of sufficient conditions that imply the validity of the theorems. These conditions are the security specifications of the analyzed function.
21. Example: strcpy()
C source:
        char *strcpy(char *dst, char *src) {
            char *res;
    L0      res = dst;
            while (*src != 0) {
    L1          *dst = *src;
                dst++; src++;
            }
    L2      *dst = 0;
            return res;
        }
Translated to Language L:
    L0      mov res <- *dst
            lbl(while6)
            branch (**src is 0) exwhile6
    L1      mov *dst <- **src
            mov dst <- (*dst) + 1
            mov src <- (*src) + 1
            branch true while6
            lbl(exwhile6)
    L2      mov *dst <- 0
            mov ret <- *res
Theorem generation:
a) Suppose S1 is the store before line L1; then LocT(S1, dst) = false.
b) If S0 is the store before line L0 and S2 is the store after line L1, then
   (I < Eval(S0, *dst) or Eval(S0, *dst + dstsize) <= I) => LocT(S2, I) = LocT(S0, I).
c) Suppose S3 is the store before line L2; then LocT(S3, dst) = false.
The generated theorems are passed to the theorem prover.
22. Specifications Suggested by the Theorem Prover
- Suppose that when strcpy() is called, the size of the destination buffer (dst) is dstsize and the length of the user input string (src) is srclen.
- Specifications extracted by the theorem proving approach:
  - Documented in the Linux man page:
    - srclen < dstsize
    - The buffers src and dst do not overlap in such a way that the buffer dst covers the null terminator of the src string.
  - Not documented:
    - The buffers dst and src do not cover the function frame of strcpy.
    - Initially, dst is not tainted.
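Specifications of this kind can also be enforced at runtime, in the spirit of inserting guarding code for properties that are not proved statically. The sketch below is an illustration, not part of the original work: checked_strcpy is a hypothetical wrapper whose caller must pass dstsize explicitly, and it checks only the first two specifications, since the stack-frame and initial-taintedness conditions cannot be tested portably from C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical runtime guard for two extracted strcpy() specifications.
   Returns dst on success, or NULL (copying nothing) if a check fails. */
static char *checked_strcpy(char *dst, size_t dstsize, const char *src) {
    assert(dst != NULL && src != NULL);
    size_t srclen = strlen(src);

    /* Spec: srclen < dstsize (room for the string and its terminator). */
    if (srclen >= dstsize)
        return NULL;

    /* Spec: dst[0..srclen-1] must not cover the null terminator of src.
       Addresses are compared as integers because relational comparison
       of pointers into unrelated objects is undefined in C. */
    uintptr_t d = (uintptr_t)dst, nul = (uintptr_t)(src + srclen);
    if (nul >= d && nul < d + srclen)
        return NULL;

    /* memmove() tolerates any remaining overlap, which strcpy() does not. */
    memmove(dst, src, srclen + 1);
    return dst;
}
```

On an 8-byte buf, checked_strcpy(buf, sizeof buf, "12345678") returns NULL; with s = "abcdef", copying s onto s + 3 is rejected too, because the destination would overwrite the terminator before it is read.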
23. Example Scenario
Can the extracted specifications be violated in application code? Consider the specification "the destination buffer should not cover the function frame of strcpy":
    char input[240];
    void foo() {
        int offset;
        char buf[200];
        scanf("%s", input);
        offset = 200 - strlen(input);
        strcpy(buf + offset, input);
    }
If strlen(input) exceeds 200, offset is negative and buf + offset points below buf.
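[Figure: stack layout from high to low addresses (the stack grows downward): foo's frame containing buf, then strcpy's frame with the return address, frame pointer, src, dst and res. buf + offset points into strcpy's frame.]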
24. Other Examples
- A simplified version of printf()
  - 55 lines of C code
  - Four security specifications are extracted, including one indicating the format string vulnerability.
- Function free() of a heap management system
  - 36 lines of C code
  - Seven security specifications are extracted, including several indicating heap corruption vulnerabilities.
- Socket read functions of Apache HTTPD and NULL HTTPD
  - The Apache function is proved to be free of pointer taintedness.
  - Two (known) vulnerabilities are exposed in the theorem proving process.
25. Summary
- FSM representation: decomposes each vulnerability into multiple simple predicates (with real vulnerability examples).
- A common characteristic of many predicates: their violations result in pointer taintedness.
- Defined a memory model to reason about pointer taintedness.
- Developed a theorem proving approach to extract security specifications from library functions.
26. Future Directions
- Develop a VCGen (verification condition generator) to facilitate theorem proving (in progress).
- Apply the pointer taintedness analysis to a substantial number of commonly used library functions to extract their security specifications.
- Compiler techniques for inserting guarding code to check unproved properties at runtime.
- Explore the possibility of building the taintedness notion into virtual machines.
- Architectural support for pointer taintedness detection: a module working with the RSE (Reliability and Security Engine).
27. Backup Slides
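28. Format String Vulnerability
    /* simplified C library excerpt */
    int vfprintf(FILE *s, const char *format, va_list ap) {
        char *p;
        /* ... when p reaches the conversion "%n": */
        *(int *) va_arg(ap, void *) = count;
    }
    int printf(const char *format, ...) {
        /* ... */
        count = vfprintf(stdout, format, arg);
    }

    /* attack program (assumes 32-bit x86, where an unsigned int holds a pointer) */
    #include <stdio.h>
    #include <string.h>
    int i, j;
    int main(void) {
        char buf[100];
        *(unsigned int *)buf = (unsigned int)&i;  /* first 4 bytes: address of i */
        *(buf + 4) = 0;                           /* terminate after the address */
        strcat(buf, "%d%d%d12345%n");
        printf(buf);                              /* intended: %n writes into i */
        return 0;
    }
buf: \x78 \x99 \x04 \x08 % d % d % d 1 2 3 4 5 % n  (the first four bytes are the address of i)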
29. Elementary Activity 1 of the Sendmail Vulnerability
Elementary Activity 1: get user input. Get strings str_x and str_i and convert them to integers x and i.
- pFSM1: get str_x and str_i
  - Reject: (integer represented by str_x) > 2^31
  - Accept: (integer represented by str_x) <= 2^31, then convert str_x and str_i to integers x and i
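30. Elementary Activity 2 of the Sendmail Vulnerability
Elementary Activity 2: assign the debug level (x and i come from converting str_x and str_i).
- pFSM2:
  - Reject: SPEC rejects x < 0 or x > 100; the implementation rejects only x > 100
  - Accept: SPEC accepts 0 <= x <= 100; the implementation accepts x <= 100; then tTvect[x] = i
  - Consequence of the unchecked case x < 0: a function pointer (psetuid) is corrupted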
31. Elementary Activity 3 of the Sendmail Vulnerability
Elementary Activity 3: manipulation of the function pointer psetuid (precondition: a function pointer (psetuid) is corrupted).
- pFSM3 (starting the sendmail program): load psetuid into memory
  - Reject: psetuid is changed
  - Accept: psetuid is unchanged, then execute the code referred to by psetuid
  - Changed psetuid used anyway: execute malicious code
32. Appropriateness of Dereference
- A data value x is appropriate to be dereferenced if and only if one of the following conditions is true, assuming Y and Z are integer constants:
  - x is foo (foo is a program variable)
  - x is malloc(Y)
  - There exist values a, b and c that are appropriate to dereference (recursive definition), and x a b c Z
- Theorems to prove for an indirect write mov E1 <- E2:
  - E1 should be appropriate to dereference.
  - If E2 is not appropriate to dereference, then E1 should not be appropriate to dereference.
33. About Equational Logic
A logic defined by equations. Equations are used to rewrite symbolic terms, by replacing the term on the left of an equation with the term on its right. The emphasis is on executability.
Defining the natural numbers (NAT):
- Operators
    0 : a constant of NAT
    s_ : NAT -> NAT          (successor operator)
    _+_ : NAT NAT -> NAT     (addition operator)
- Equations
    0 + N = N
    (s M) + N = M + (s N)
- Example
    (s s s 0) + (s s 0)
    = (s s 0) + (s s s 0)
    = (s 0) + (s s s s 0)
    = 0 + (s s s s s 0)
    = s s s s s 0
  Intuitively, this represents 3 + 2 = 5.
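Since the equations are oriented from left to right, they can be executed as rewrite rules. The C sketch below is a hand-written stand-in for a term-rewriting engine (term, mk, num, step, normalize and value are hypothetical names): each step applies one equation, and normalizing (s s s 0) + (s s 0) takes the four steps of the example and ends in s s s s s 0.

```c
#include <assert.h>
#include <stddef.h>

/* Terms over the NAT signature: the constant 0, s_ and _+_. */
typedef enum { T_ZERO, T_SUCC, T_PLUS } term_kind;
typedef struct term {
    term_kind kind;
    const struct term *l, *r;    /* T_SUCC uses l; T_PLUS uses l and r */
} term;

enum { POOL_SIZE = 64 };
static term pool[POOL_SIZE];     /* tiny arena: terms are never freed */
static int used;

static const term *mk(term_kind kind, const term *l, const term *r) {
    assert(used < POOL_SIZE);
    pool[used] = (term){kind, l, r};
    return &pool[used++];
}

/* s s ... s 0 with n successors. */
static const term *num(int n) {
    const term *t = mk(T_ZERO, NULL, NULL);
    while (n-- > 0)
        t = mk(T_SUCC, t, NULL);
    return t;
}

/* One rewrite step with the equations oriented left to right:
     0 + N     -> N
     (s M) + N -> M + (s N)
   Returns NULL only when t contains no _+_ (normal form). */
static const term *step(const term *t) {
    if (t->kind == T_ZERO)
        return NULL;
    if (t->kind == T_SUCC) {
        const term *a = step(t->l);
        return a ? mk(T_SUCC, a, NULL) : NULL;
    }
    if (t->l->kind == T_ZERO)
        return t->r;
    if (t->l->kind == T_SUCC)
        return mk(T_PLUS, t->l->l, mk(T_SUCC, t->r, NULL));
    return mk(T_PLUS, step(t->l), t->r);  /* left operand is itself a sum */
}

/* Rewrite until no equation applies; *steps counts the rewrites. */
static const term *normalize(const term *t, int *steps) {
    const term *next;
    *steps = 0;
    while ((next = step(t)) != NULL) {
        t = next;
        ++*steps;
    }
    return t;
}

/* Value of a numeral s^n 0, or -1 if t is not a numeral. */
static int value(const term *t) {
    int n = 0;
    while (t->kind == T_SUCC) {
        n++;
        t = t->l;
    }
    return t->kind == T_ZERO ? n : -1;
}
```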
34. Semantics of mov and recv
- Axioms of the mov instruction:
    Ftch((S mov E1 <- E2), X) = Eval(S, E2)  if (Eval(S, E1) is X) .
    Ftch((S mov E1 <- E2), X) = Ftch(S, X)   if not (Eval(S, E1) is X) .
    LocT((S mov E1 <- E2), X) = ExpT(S, E2)  if (Eval(S, E1) is X) .
    LocT((S mov E1 <- E2), X) = LocT(S, X)   if not (Eval(S, E1) is X) .
- Semantics of recv (similarly, scanf and recvfrom):
    LocT(S call recv(sock, buf, len, flag), A) = true  if Eval(S, buf) <= A and A < Eval(S, buf + len) .
    LocT(S call recv(sock, buf, len, flag), A) = LocT(S, A)  otherwise .
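These axioms describe how a single instruction transforms a store, which can be sketched as in-place updates. The C code below is a sketch under simplifying assumptions: a fixed 64-location store, mov operands already evaluated by the caller (Eval(S, E1), Eval(S, E2) and ExpT(S, E2) are passed in), and recv modeled only through its effect on taintedness, as in the axioms. The names store, apply_mov and apply_recv are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { STORE_SIZE = 64 };

/* Hypothetical store: content and taintedness of each location. */
typedef struct {
    int content[STORE_SIZE];
    bool tainted[STORE_SIZE];
} store;

/* mov E1 <- E2 with X = Eval(S, E1), value = Eval(S, E2) and
   value_tainted = ExpT(S, E2). Location X takes the new content and
   taintedness; every other location keeps its Ftch and LocT values. */
static void apply_mov(store *S, int X, int value, bool value_tainted) {
    assert(0 <= X && X < STORE_SIZE);
    S->content[X] = value;
    S->tainted[X] = value_tainted;
}

/* call recv(sock, buf, len, flag) with buf = Eval(S, buf): every A with
   buf <= A < buf + len becomes tainted; other locations keep LocT(S, A).
   Contents are left unchanged because the axioms only describe LocT. */
static void apply_recv(store *S, int buf, int len) {
    assert(0 <= buf && 0 <= len && buf + len <= STORE_SIZE);
    for (int A = buf; A < buf + len; A++)
        S->tainted[A] = true;
}
```

Copying a word out of a received buffer with apply_mov carries its taintedness along, while storing a constant clears it, mirroring the propagation from Exp2 to Exp1.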
35. Related Work
- Security modeling
  - Sheyner and Wing: attack graphs
  - Ortalo and Deswarte: Markov models
- Static code analysis
  - Buffer overflow detection: Wagner, and many others
  - Format string detection: CQUAL, SPLINT
  - Assembly code verification: Proof-Carrying Code
  - Generic (annotation-based): SPLINT, Eau Claire
- Taintedness analysis
  - Perl runtime
  - CQUAL and SPLINT: taintedness of program variables
    - A symbol gets tainted only if an explicit C statement passes a tainted value to it by assignment, argument passing or function return. There is no underlying memory model.
    - Not sufficient to detect real pointer taintedness vulnerabilities.
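36. Positioning My Work
[Figure: existing static analysis tools check application code against the security specifications of library functions; my work extracts these security specifications from the library functions.]
Examples of security specifications:
- src_len < dst_size (strcpy)
- src and dst do not overlap (strcpy)
- Do not free a stack buffer
- Do not double-free a buffer
- The first argument of printf cannot come from the user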
37. Presentation Outline
- A Brief Description of the FSM Approach to Modeling and Analyzing Security Vulnerabilities
- Real Examples of Pointer Taintedness
- Definition of Pointer Taintedness in Equational Logic
- Extraction of Function Specifications by Theorem Proving
- Summary and Future Directions
38. Extraction of Security Specs of Library Functions Using Pointer Taintedness
- A formal approach to reason about potential vulnerabilities in library source code.
- Reasoning based on a hypothetical memory model: a boolean property, taintedness, is associated with each memory location.
- The semantics of pointer taintedness are defined in equational logic.
- A theorem prover is employed to extract security specifications of library functions.
- The security specifications extracted by the analysis:
  - expose different classes of known security vulnerabilities, such as format string, heap corruption and buffer overflow vulnerabilities;
  - indicate function invocation scenarios that may expose new vulnerabilities.
39. Observations from Data Analysis (cont.)
- Exploiting a vulnerability involves multiple vulnerable operations on several objects.
- Exploits must pass through multiple elementary activities, each providing an opportunity for performing a security check.
- For each elementary activity, the vulnerability data and corresponding code inspections allow us to define a predicate which, if violated, results in a security vulnerability.