1
Characterizing and Reasoning about Security
Vulnerabilities
  • Shuo Chen
  • Center for Reliable and High-Performance
    Computing
  • Coordinated Science Laboratory
  • University of Illinois at Urbana-Champaign
  • Preliminary Examination, May 4th, 2004
  • Committee Chair: Prof. Ravishankar K. Iyer
  • Committee: Prof. Vikram Adve, Prof. Jose Meseguer,
    Prof. David Nicol

2
Significance of Software Implementation Errors
  • Bugtraq: 70% of security vulnerabilities are due to
    implementation errors.

3
What I Have Done
  • Analyzed CERT and Bugtraq reports and the
    corresponding application source code.
  • Developed a new FSM representation that decomposes
    each security vulnerability into a series of
    elementary activities (primitive FSMs), each
    indicating a simple predicate.
  • The FSM analysis showed:
      • Many vulnerabilities (about 66%) are due to pointer
        taintedness: a user input value is used as a pointer
        value (which should be transparent to users).
      • A significant portion of vulnerabilities (about
        33.6%) are due to errors in library functions or
        incorrect invocations of library functions.
  • The FSM modeling led to a formal reasoning
    approach to examine pointer taintedness in
    applications.

4
Formal Analysis of Pointer Taintedness
  • Pointer taintedness: a pointer value, including a
    return address, is derived directly or indirectly
    from user input (formally defined using
    equational logic).
  • Provides a unifying perspective for reasoning
    about a significant number of security
    vulnerabilities.
  • The notion of pointer taintedness enables:
      • Static analysis: reasoning about the possibility
        of pointer taintedness by source code analysis.
      • Runtime checking: inserting assertions in object
        code to check pointer taintedness at runtime.
      • Hardware architecture-based support to detect
        pointer taintedness.
  • Current focus: extraction of security
    specifications of library functions based on
    pointer taintedness semantics.

5
Publications of My Research
  • Papers:
      • J. Xu, S. Chen, Z. Kalbarczyk, R. K. Iyer. "An
        Experimental Study of Security Vulnerabilities
        Caused by Errors". DSN 2001.
      • S. Chen, J. Xu, R. K. Iyer, K. Whisnant.
        "Modeling and Analyzing the Security Threat of
        Firewall Data Corruption Caused by Instruction
        Transient Errors". DSN 2002.
      • S. Chen, Z. Kalbarczyk, J. Xu, R. K. Iyer. "A
        Data-Driven Finite State Machine Model for
        Analyzing Security Vulnerabilities". DSN 2003.
      • S. Chen, K. Pattabiraman, Z. Kalbarczyk, R. K.
        Iyer. "Formal Reasoning of Various Categories of
        Widely Exploited Security Vulnerabilities Using
        Pointer Taintedness Semantics". IFIP Information
        Security Conference, 2004.
  • Security vulnerability report:
      • S. Chen and J. Xu. Bugtraq ID 6255: NULL HTTPD
        Heap Corruption Vulnerability. The Bugtraq list.

6
A Finite State Machine Approach for Analyzing
Security Vulnerabilities

7
Overview of the Study
  • An analysis of security vulnerability databases
    (CERT and Bugtraq)
  • Examination of security vulnerabilities at the
    application source-code level
  • A security vulnerability usually consists of a
    series of vulnerabilities in multiple elementary
    activities. Each can be represented by a
    primitive FSM, indicating a simple predicate.
  • Provides a formalism for reasoning about and
    describing security vulnerabilities.
  • Usefulness of the formalism: discovery of the
    HTTP daemon heap overflow vulnerability.

8
Observation from Data Analysis
The same vulnerability can be classified into
different categories. Why? Because it involves
multiple elementary activities.

9
Primitive FSM
  • We use a primitive FSM (pFSM) to depict an
    elementary activity, which specifies a predicate
    (SPEC) that should be guaranteed in order to
    ensure security.

[Figure: a pFSM with four transitions: SPEC_ACCEPT, SPEC_REJECT, IMPL_ACCEPT and IMPL_REJECT.]
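
A minimal C sketch of how one pFSM could be evaluated for a single input object. It assumes a particular reading of the four transition names: an object that violates SPEC takes the SPEC_REJECT branch and is then either stopped by the implementation (IMPL_REJECT) or let through (IMPL_ACCEPT, the vulnerable path). The names `pfsm_classify`, `spec_index` and `impl_index` are hypothetical, and the index predicate is only an illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Outcome of pushing one object through a pFSM (assumed reading of the slide). */
typedef enum { SPEC_ACCEPT, IMPL_REJECT, IMPL_ACCEPT } PfsmOutcome;

/* A predicate returns true when it accepts the object. */
typedef bool (*Predicate)(long object);

/* spec: the predicate that should hold for security.
 * impl_check: the check the implementation actually performs,
 *             or NULL when the implementation performs no check. */
PfsmOutcome pfsm_classify(Predicate spec, Predicate impl_check, long object)
{
    if (spec(object))
        return SPEC_ACCEPT;          /* object satisfies SPEC */
    /* From here on the object took the SPEC_REJECT branch. */
    if (impl_check != NULL && !impl_check(object))
        return IMPL_REJECT;          /* implementation also rejects: safe */
    return IMPL_ACCEPT;              /* implementation lets it through: vulnerable */
}

/* Illustrative predicates: an array index that must lie in 0..100. */
bool spec_index(long x) { return x >= 0 && x <= 100; }
bool impl_index(long x) { return x <= 100; }   /* forgets the lower bound */
```

With these predicates, an index of 500 is stopped by the implementation, while an index of -3 reaches IMPL_ACCEPT, which is the kind of path the pFSM is meant to expose.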

10
NULL HTTPD Heap Corruption Vulnerabilities
(Bugtraq #5774, #6255)

11
Op 1: Read User Data from a Socket to a Heap Buffer

    /* Get contentLen. Negative?? */
    PostData = calloc(contentLen + 1024, sizeof(char));
    x = 0; rc = 0;
    pPostData = PostData;
    do {
        rc = recv(sock, pPostData, 1024, 0);
        if (rc == -1) {
            closeconnect(sid, 1);
            return;
        }
        pPostData += rc;
        x += rc;
    } while ((rc == 1024) || (x < contentLen));

pFSM1: get (contentLen, input); contentLen is an
integer, input is the string to be read from a socket
  • contentLen < 0: rejected by the specification
  • contentLen >= 0: accepted by the specification
  • ?: transition with no check in the implementation
  • Then: Calloc PostData[1024 + contentLen]
pFSM2: copy input from the socket to PostData by the
recv() call
  • length(input) > Size(PostData): rejected by the
    specification
  • length(input) <= Size(PostData): accepted by the
    specification
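
The two predicates can be turned into explicit checks. The following is a hedged sketch, not the NULL HTTPD fix: `content_len_ok` and `bounded_copy` are hypothetical helpers, and an in-memory array stands in for the socket so the example stays self-contained. As an illustrative value, a contentLen of -800 would make the calloc() call above allocate only 224 bytes while each recv() call may still write up to 1024.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 1024   /* same chunk size as the recv() call in the slide */

/* pFSM1 predicate: a declared content length must be non-negative. */
int content_len_ok(long content_len)
{
    return content_len >= 0;
}

/* pFSM2 predicate, enforced while copying: consume the input in
 * CHUNK-sized pieces (standing in for successive recv() calls) and
 * refuse to write past the end of the buffer.
 * Returns the number of bytes copied, or -1 if the input does not fit.
 * Bytes copied before the failure is detected remain in buf. */
long bounded_copy(char *buf, size_t cap, const char *input, size_t input_len)
{
    size_t copied = 0;

    while (copied < input_len) {
        size_t n = input_len - copied;
        if (n > CHUNK)
            n = CHUNK;
        if (n > cap - copied)      /* copied <= cap always holds here */
            return -1;
        memcpy(buf + copied, input + copied, n);
        copied += n;
    }
    return (long)copied;
}
```

The design choice is to bound every chunk by the remaining capacity rather than by the declared content length, so the loop stays safe even when the declared length is wrong.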

12
Sendmail Debugging Function Signed Integer
Overflow (Bugtraq #3163)

Operation 1: Write integer i to tTvect[x]
  • pFSM1: get text strings str_x and str_i; convert
    str_i and str_x to integers i and x
      • (integer represented by str_x) > 2^31: rejected
        by the specification
      • (integer represented by str_x) ≤ 2^31: accepted
        by the specification
      • ?: transition with no check in the implementation
  • pFSM2: tTvect[x] = i
      • SPEC_REJECT: x < 0 or x > 100
      • SPEC_ACCEPT: 0 ≤ x ≤ 100
      • IMPL_REJECT: x > 100
      • IMPL_ACCEPT: x ≤ 100
  • Result: the function pointer is tainted

Operation 2: Manipulate the function pointer
  • pFSM3: load the function pointer
      • addr_setuid unchanged: execute the code referred
        to by addr_setuid
      • addr_setuid changed (? transition): execute
        malicious code
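
A sketch of what checking both predicates of Operation 1 might look like in C. This is not sendmail's code: `parse_int`, `set_debug_level` and `TT_SIZE` are hypothetical names, and the 0..100 index range is taken from the slide. The point is that the conversion must detect overflow and the index check must include the lower bound.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define TT_SIZE 101   /* valid indices 0..100, as in the slide's predicate */

/* Convert a decimal string to int, rejecting overflow and trailing junk. */
int parse_int(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0')
        return -1;                              /* not a number */
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -1;                              /* does not fit in int */
    *out = (int)v;
    return 0;
}

/* Write debug level i to vect[x] only if both pFSM predicates hold. */
int set_debug_level(unsigned char vect[TT_SIZE],
                    const char *str_x, const char *str_i)
{
    int x, i;

    if (parse_int(str_x, &x) != 0 || parse_int(str_i, &i) != 0)
        return -1;
    if (x < 0 || x >= TT_SIZE)                  /* both bounds, not just x <= 100 */
        return -1;
    if (i < 0 || i > UCHAR_MAX)
        return -1;
    vect[x] = (unsigned char)i;
    return 0;
}
```

For example, `set_debug_level(v, "4294967295", "7")` is rejected instead of wrapping to a negative index.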

13
Modeled Vulnerabilities
  • Signed Integer Overflow
  • Heap Corruption
  • Stack Overflow
  • Format String Vulnerabilities
  • File Race Conditions
  • Some Input Validation Vulnerabilities

14
Formal Reasoning of Security Vulnerabilities by
Pointer Taintedness Semantics

15
Pointer Taintedness Caused Vulnerabilities
  • Format string vulnerability
      • Taint an argument pointer of functions such as
        printf, fprintf, sprintf and syslog.
  • Stack smashing
      • Taint a return address.
  • Heap corruption
      • Taint the free-chunk doubly-linked list of the
        heap.
  • Glibc globbing vulnerabilities
      • User input resides in a location that is used as
        a pointer by the parent function of glob().

16
Example of Format String Vulnerability
Vulnerable code:
    recv(buf);
    printf(buf);    /* should be printf("%s", buf) */
Input in buf: \xdd \xcc \xbb \xaa %d %d %d %n
[Figure: stack layout from high to low addresses (the stack grows toward low addresses), holding %n %d %d %d 0xaabbccdd.]
In vfprintf(), if (fmt points to %n) then
    **ap = (character count)
*ap is a tainted value.
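
The fix mentioned in the slide comment can be illustrated with a small C sketch. `format_untrusted` is a hypothetical helper; it uses the standard `snprintf` so that the user-controlled bytes are treated as data and never as a format string.

```c
#include <stdio.h>

/* Copy untrusted input into out using a constant format string.
 * Any conversion specifiers inside user_input (such as %d or %n)
 * are copied literally and are not interpreted. */
int format_untrusted(char *out, size_t out_size, const char *user_input)
{
    return snprintf(out, out_size, "%s", user_input);
}
```

Because the format argument is a constant, no argument pointer consumed by the formatting routine is derived from user input.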

17
Taintedness Semantics (Memory Model)
  • A store represents a snapshot of the memory
    state at a point in the program execution.
  • For each memory location, we can evaluate two
    properties: content and taintedness (true/false).
  • Operations on memory locations:
      • The fetch operation Ftch(S,A) gives the content
        of the memory address A in store S.
      • The location-taintedness operation LocT(S,A)
        gives the taintedness of the location A in store
        S.
  • Operations on expressions:
      • The evaluation operation Eval(S,E) evaluates
        expression E in store S.
      • The expression-taintedness operation ExpT(S,E)
        computes the taintedness of expression E in store
        S.

18
Axioms of Eval and ExpT operations
    Eval(S, I) = I        // I is an integer constant
    Eval(S, ^ E1) = Ftch(S, Eval(S, E1))
    Eval(S, E1 + E2) = Eval(S, E1) + Eval(S, E2)
    Eval(S, E1 - E2) = Eval(S, E1) - Eval(S, E2)
    ExpT(S, I) = false
    ExpT(S, ^ E1) = LocT(S, Eval(S, E1))
    ExpT(S, E1 + E2) = ExpT(S, E1) or ExpT(S, E2)
    ExpT(S, E1 - E2) = ExpT(S, E1) or ExpT(S, E2)
E.g., is the expression (^ 100) + 2 tainted?
    ExpT(S, (^ 100) + 2) = ExpT(S, (^ 100)) or ExpT(S, 2)
                         = LocT(S, 100) or false
                         = LocT(S, 100)
Note: ^ is the dereference operator; ^ 100 gives the
content in the location 100.
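
The axioms can be read as a pair of recursive functions over an expression tree. The C sketch below is one possible encoding under stated assumptions: the store is a small fixed-size array (`STORE_SIZE` is arbitrary), addresses are assumed to stay within it, and the type and function names (`Store`, `Expr`, `eval`, `exp_t`) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define STORE_SIZE 256   /* arbitrary; addresses must be in 0..STORE_SIZE-1 */

/* A store: each location has a content and a taintedness flag. */
typedef struct {
    int  content[STORE_SIZE];
    bool tainted[STORE_SIZE];
} Store;

typedef enum { E_CONST, E_DEREF, E_ADD, E_SUB } ExprKind;

typedef struct Expr {
    ExprKind kind;
    int value;                     /* used by E_CONST only */
    const struct Expr *lhs, *rhs;  /* E_DEREF uses lhs only */
} Expr;

int  ftch(const Store *s, int addr)  { return s->content[addr]; }
bool loc_t(const Store *s, int addr) { return s->tainted[addr]; }

/* Eval(S, E): one case per axiom. */
int eval(const Store *s, const Expr *e)
{
    switch (e->kind) {
    case E_CONST: return e->value;
    case E_DEREF: return ftch(s, eval(s, e->lhs));
    case E_ADD:   return eval(s, e->lhs) + eval(s, e->rhs);
    case E_SUB:   return eval(s, e->lhs) - eval(s, e->rhs);
    }
    return 0;
}

/* ExpT(S, E): constants are untainted, a dereference takes the
 * taintedness of the location it reads, and arithmetic is tainted
 * if either operand is. */
bool exp_t(const Store *s, const Expr *e)
{
    switch (e->kind) {
    case E_CONST: return false;
    case E_DEREF: return loc_t(s, eval(s, e->lhs));
    case E_ADD:
    case E_SUB:   return exp_t(s, e->lhs) || exp_t(s, e->rhs);
    }
    return false;
}
```

Building the tree for "content of location 100, plus 2" and calling `exp_t` on it returns exactly the taintedness flag of location 100, matching the worked example.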

19
Semantics of Language L
  • Extend the semantics proposed by Goguen and
    Malcolm
  • The following operations (arithmetic/logic) are
    defined
  • , -, , /, , !, , , !, ,
  • The following instructions are defined:
      • mov Exp1 <- Exp2
      • branch (Condition) Label
      • call FuncName(Exp1, Exp2, ...)
  • Axioms defining mov instruction semantics:
      • Specify the effects of applying a mov instruction
        to a store.
      • Allow taintedness to propagate from Exp2 to
        Exp1.
  • Axioms defining the semantics of recv (similarly,
    scanf, recvfrom):
      • Specify the memory locations tainted by the recv
        call.

20
Extracting Function Specifications by Theorem
Prover
  • Input: C source code of a library function,
    automatically translated to code in Language L.
  • Theorem generation. Critical instructions are
    indirect writes. For each mov E1 <- E2, generate
    theorems:
      a) E1 should not be tainted.
      b) The mov instruction should not taint any
         location outside the buffer pointed to by E1.
  • The theorems are given to the ITP theorem prover.
  • Output: a set of sufficient conditions that imply the
    validity of the theorems. They are the security
    specifications of the analyzed function.

21
Example: strcpy()
C source code:
    char * strcpy (char * dst, char * src) {
        char * res;
        res = dst;              /* L0 */
        while (*src != 0) {
            *dst = *src;        /* L1 */
            dst++;
            src++;
        }
        *dst = 0;               /* L2 */
        return res;
    }
Translated to Language L (^ denotes dereference):
    0: mov res <- ^ dst
       lbl(while6)
       branch (^ ^ src is 0) exwhile6
    1: mov ^ dst <- ^ ^ src
       mov dst <- (^ dst) + 1
       mov src <- (^ src) + 1
       branch true while6
       lbl(exwhile6)
    2: mov ^ dst <- 0
       mov ret <- ^ res
Theorem generation:
    a) Suppose S1 is the store before Line L1, then
       LocT(S1, dst) = false
    b) If S0 is the store before Line L0, and S2 is the
       store after Line L1, then
       I < Eval(S0, ^ dst) or Eval(S0, (^ dst) + dstsize) ≤ I
       => LocT(S2, I) = LocT(S0, I)
    c) Suppose S3 is the store before Line L2, then
       LocT(S3, dst) = false
The theorems are then passed to the theorem prover.

22
Specifications Suggested by Theorem Prover
  • Suppose that when function strcpy() is called, the
    size of the destination buffer (dst) is dstsize and
    the length of the user input string (src) is srclen.
  • Specifications extracted by the theorem
    proving approach:
      • srclen < dstsize (documented in the Linux man
        page)
      • The buffers src and dst do not overlap in such a
        way that the buffer dst covers the
        NULL-terminator of the src string (documented in
        the Linux man page)
      • The buffers dst and src do not cover the function
        frame of strcpy (not documented)
      • Initially, dst is not tainted (not documented)
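
Some of these conditions can be checked at the call site. The sketch below is a hypothetical wrapper (`checked_strcpy` is not a standard function) that enforces srclen < dstsize and rejects any overlap between the two buffers, which is stricter than the overlap condition stated above. It does not attempt to check the stack-frame or taintedness conditions, which are not observable from plain C. The address comparison through `uintptr_t` assumes a flat address space.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy src into dst only if the documented preconditions hold.
 * Returns 0 on success, -1 if a precondition is violated. */
int checked_strcpy(char *dst, size_t dstsize, const char *src)
{
    size_t srclen;
    uintptr_t d, s;

    if (dst == NULL || src == NULL)
        return -1;

    srclen = strlen(src);
    if (srclen >= dstsize)            /* specification: srclen < dstsize */
        return -1;

    /* Reject overlap between [dst, dst + dstsize) and the source
     * string including its terminator, [src, src + srclen + 1). */
    d = (uintptr_t)dst;
    s = (uintptr_t)src;
    if (d < s + srclen + 1 && s < d + dstsize)
        return -1;

    memcpy(dst, src, srclen + 1);     /* copies the terminator too */
    return 0;
}
```

A caller would pass the real size of the destination, for example `checked_strcpy(buf, sizeof buf, input)`, and treat a return value of -1 as a rejected request.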

23
Example Scenario
Can the extracted specifications be violated in
application code?
Specification: the destination buffer should not cover
the function frame of strcpy.
    char input[240];
    void foo()
    {
        int offset;
        char buf[200];
        scanf("%s", input);
        offset = 200 - strlen(input);
        strcpy(buf + offset, input);
    }
[Figure: stack layout from high to low addresses (the stack grows toward low addresses). Labels: the foo frame with buf, Return Addr., Frame Pointer, and the strcpy frame with src, dst and res; buf + offset is the index used as the copy destination.]

24
Other Examples
  • A simplified version of printf()
      • 55 lines of C code
      • Four security specifications are extracted,
        including one indicating format string
        vulnerability
  • Function free() of a heap management system
      • 36 lines of C code
      • Seven security specifications are extracted,
        including several specifications indicating heap
        corruption vulnerabilities.
  • Socket read functions of Apache HTTPD and NULL
    HTTPD
      • The Apache function is proved to be free of
        pointer taintedness.
      • Two (known) vulnerabilities are exposed in the
        theorem proving process.

25
Summary
  • FSM representation: decomposes each vulnerability
    into multiple simple predicates (with real
    vulnerability examples)
  • A common characteristic of many predicates: their
    violations result in pointer taintedness
  • Defined a memory model to reason about pointer
    taintedness
  • Developed a theorem proving approach to extract
    security specifications from library functions

26
Future Directions
  • Develop a VCGen (verification condition
    generator) to facilitate theorem proving. (in
    progress)
  • Apply the pointer taintedness analysis to a
    substantial number of commonly used library
    functions to extract their security
    specifications.
  • Compiler techniques for inserting guarding code
    to check unproved properties at runtime.
  • Explore the possibility of building the
    taintedness notion into virtual machines.
  • Architectural support for pointer taintedness
    detection: a module working with RSE (Reliability
    and Security Engine).

27
Backup Slides

28
Format String Vulnerability
    int vfprintf (FILE *s, const char *format, va_list ap)
    {
        char *p;
        /* ... */
        *(int *) va_arg (ap, void *) = count;
        /* ... */
    }
    int printf (const char *format, ...)
    {
        /* ... */
        count = vfprintf (stdout, format, arg);
        /* ... */
    }
    int i, j;
    int main()
    {
        char buf[100];
        *(unsigned int *)buf = (unsigned int)&i;
        *(buf + 4) = 0;
        strcat(buf, "%d%d%d12345%n");
        printf(buf);
    }
Contents of buf: \x78 \x99 \x04 \x08 %d %d %d 1 2 3 4 5 %n
(the first four bytes are the address of i)

29
Elementary Activity 1 of Sendmail Vulnerability
Elementary Activity 1: get user input. Get
strings str_x and str_i, and convert them to integers
x and i.
pFSM1: get str_x and str_i
  • (integer represented by str_x) > 2^31: rejected by
    the specification
  • (integer represented by str_x) ≤ 2^31: convert str_x
    and str_i to integers x and i
  • ?: transition with no check in the implementation

30
Elementary Activity 2 of Sendmail Vulnerability
Elementary Activity 2: assign debug level (after
converting str_x and str_i to integers x and i)
pFSM2: tTvect[x] = i
  • SPEC_REJECT: x < 0 or x > 100
  • SPEC_ACCEPT: 0 ≤ x ≤ 100
  • IMPL_REJECT: x > 100
  • IMPL_ACCEPT: x ≤ 100
Result: a function pointer (psetuid) is corrupted

31
Elementary Activity 3 of Sendmail Vulnerability
Elementary Activity 3: manipulation of function
pointer psetuid
Precondition: a function pointer (psetuid) is corrupted
pFSM3: load psetuid to the memory (starting sendmail
program)
  • psetuid is unchanged: execute the code referred to
    by psetuid
  • psetuid is changed (? transition): execute malicious
    code

32
Appropriateness of Dereference
  • A data value x is appropriate to be dereferenced
    if and only if one of the following conditions is
    true, assuming Y and Z are integer constants:
      • x is foo (foo is a program variable)
      • x is malloc(Y)
      • If there exist values a, b and c that are
        appropriate to dereference, (recursive
        definition) and x a b c Z
  • Theorems to prove for an indirect write mov E1
    <- E2:
      • E1 should be appropriate to dereference.
      • If E2 is not appropriate to dereference, then
        E1 should not be appropriate to dereference.

33
About Equational Logic
A logic defined by equations. Equations are used
to rewrite symbolic terms (by replacing the term
on the left of the equation with the term on the
right of the equation). The emphasis is on
executability.
Define the natural numbers (NAT)
Operators:
    0    : a constant of NAT
    s_   : NAT -> NAT        (successor operator)
    _+_  : NAT NAT -> NAT    (addition operator)
Equations:
    0 + N = N
    (s M) + N = M + (s N)
Example:
    (s s s 0) + (s s 0)
      = (s s 0) + (s s s 0)
      = (s 0) + (s s s s 0)
      = 0 + (s s s s s 0)
      = s s s s s 0
Intuitively, this represents 3 + 2 = 5.

34
Semantics of mov and recv
  • Axioms of the mov instruction:
      • Ftch((S ; mov E1 <- E2), X) = Eval(S, E2) if
        (Eval(S, E1) is X) .
      • Ftch((S ; mov E1 <- E2), X) = Ftch(S, X) if
        not (Eval(S, E1) is X) .
      • LocT((S ; mov E1 <- E2), X) = ExpT(S, E2) if
        (Eval(S, E1) is X) .
      • LocT((S ; mov E1 <- E2), X) = LocT(S, X) if
        not (Eval(S, E1) is X) .
  • Semantics of recv (similarly, scanf, recvfrom):
      • LocT(S ; call recv(sock, buf, len, flag), A) =
        true if Eval(S, buf) <= A and A < Eval(S,
        buf + len) .
      • LocT(S ; call recv(sock, buf, len, flag), A) =
        LocT(S, A) otherwise .
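
Read operationally, these axioms describe two store updates. The C sketch below is an illustration under assumptions, not the formal semantics: the store is a small array, the operands of mov are passed already evaluated (the destination address, the value, and the value's taintedness), and `store_mov` and `store_recv` are hypothetical names. Only the taintedness effect of recv is modeled, since that is all the axioms above specify.

```c
#include <stdbool.h>

#define STORE_SIZE 256   /* arbitrary; addresses must be in 0..STORE_SIZE-1 */

typedef struct {
    int  content[STORE_SIZE];
    bool tainted[STORE_SIZE];
} Store;

/* mov: the destination location receives both the value and the
 * taintedness of the source expression; every other location keeps
 * its previous content and taintedness. */
void store_mov(Store *s, int dst, int value, bool value_tainted)
{
    s->content[dst] = value;
    s->tainted[dst] = value_tainted;
}

/* recv: every location A with buf <= A < buf + len becomes tainted;
 * all other locations keep their previous taintedness. */
void store_recv(Store *s, int buf, int len)
{
    for (int k = 0; k < len; k++)
        s->tainted[buf + k] = true;
}
```

After `store_recv(&s, 10, 4)`, locations 10 through 13 are tainted and location 14 is not; a later `store_mov` whose source is location 10 carries that taint to its destination.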

35
Related Work
  • Security modeling
      • Sheyner and Wing: attack graphs
      • Ortalo and Deswarte: Markov models
  • Static code analysis
      • Buffer overflow detection: Wagner, many others
      • Format string detection: CQUAL, SPLINT
      • Assembly code verification: Proof-Carrying Code
      • Generic (annotation based): SPLINT, Eau Claire
  • Taintedness analysis
      • Perl runtime
      • CQUAL and SPLINT: taintedness of program
        variables.
      • A symbol gets tainted only if an explicit C
        statement passes a tainted value to it by
        assignment, argument passing or function return.
        No underlying memory model.
      • Not sufficient to detect real pointer taintedness
        vulnerabilities.

36
Position My Work
[Figure: diagram relating Application Code, Library Functions, Existing static analysis tools, Security Specs and My work.]
Example security specs:
  • src_len < dst_size (strcpy)
  • src and dst do not overlap (strcpy)
  • Do not free a stack buffer
  • Do not double free a buffer
  • First argument of printf cannot come from user

37
  • A Brief Description of FSM Approach of Modeling
    and Analyzing Security Vulnerabilities
  • Real Examples of Pointer Taintedness
  • Definition of Pointer Taintedness in Equational
    Logic
  • Extraction of Function Specifications by Theorem
    Proving
  • Summary and Future Directions

38
Extraction of Security Specs of Library Functions
using Pointer Taintedness
  • A formal approach to reason about potential
    vulnerabilities in library source code.
  • Reasoning based on a hypothetical memory model: a
    boolean property, taintedness, is associated with
    each memory location.
  • The semantics of pointer taintedness is defined in
    equational logic.
  • A theorem prover is employed to extract security
    specifications of library functions.
  • Security specifications extracted by the
    analysis:
      • expose different classes of known security
        vulnerabilities, such as format string, heap
        corruption and buffer overflow vulnerabilities
      • indicate function invocation scenarios that may
        expose new vulnerabilities.

39
Observations from Data Analysis (cont.)
  • Exploiting a vulnerability involves multiple
    vulnerable operations on several objects.
  • Exploits must pass through multiple elementary
    activities, each providing an opportunity for
    performing a security check.
  • For each elementary activity, the vulnerability
    data and corresponding code inspections allow us
    to define a predicate, which if violated, will
    result in a security vulnerability.