Dynamic Analysis Applications - PowerPoint PPT Presentation

1
Dynamic Analysis Applications

Xiangyu Zhang

2
Primitives

Tracing
Profiling
Checkpointing and replay
Slicing (dependence detection)
Indexing
Delta-debugging (input reduction)

3
Four Sample Applications

Malware unpacking
Profiling for concurrency
Taint analysis
Input structure reverse engineering

4
Malware Unpacking

Malware often features an encrypted code body and a decryption engine.
Given an executable with an embedded malicious code piece, the goal is to acquire the plain text of the malware code body.
Malware > goodware.
Symantec's daily report contains a few thousand malware samples.
70% of malware is packed.
Signature-based malware detection is still the
most effective technique (0.01% false positive
rate). It requires unpacking.

5
A Packed Malware Binary

A binary is packed if some portion of its code is
not present until runtime

Figure: address-space layouts of the original binary and the packed binary, each with its entry point marked. The payload program is mostly unchanged.

Regions labeled in the packed binary:
Entry Point
Anti-Debugger Code: timing checks of various granularities, control flow obfuscation
Unpacking Loop: packed code is initially compressed or encrypted
JUMP: control transfer to unpacked code

Unpacking loop listing:
.loop  lea eax, 0x4a0000
       lea ebx, 0x401000
       load ecx, ptr r1
       xor ecx, 0xffffff
       store ptrecx, r2
       ...
       jnz .x
       call ptredi
.x     add eax, 4
       add ebx, 4
       cmp eax, 0x4a1f88
       jnz .loop
       jmp 0x401000

Packed Binary Analysis with Dyninst
The slide is from Kevin Roundy
6
Unpacking by Tracing

The most prominent feature of packed malware is
the control flow transfer to a dynamically
generated region.

for (i...)
    B[i] = A[i] XOR key
goto B[0]

Collect the memory access trace
Upon execution of an instruction PC that writes value V to address X:
    Create one trace entry <PC, X, V>.
    Hashmap[X] = PC
Upon execution of a control flow transfer instruction to X:
    Test if Hashmap[X] is defined. If so, the program is executing a dynamically generated instruction.
    The decryption loop is identified by PC = Hashmap[X]
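
The bookkeeping above can be sketched in a few lines of Python. This is only an illustration over a hand-made event list, not a real tracer: the `Write` and `Jump` record types, the function name `find_decryption_pc`, and the toy addresses are all assumptions introduced here.

```python
from typing import NamedTuple

class Write(NamedTuple):
    pc: int      # address of the instruction doing the write
    addr: int    # address X being written
    value: int   # value V

class Jump(NamedTuple):
    pc: int      # address of the control flow transfer instruction
    target: int  # address X being transferred to

def find_decryption_pc(events):
    """Return (pc_of_decryption_instruction, trace), or (None, trace)."""
    last_writer = {}  # Hashmap[X] = PC
    trace = []        # entries <PC, X, V>
    for ev in events:
        if isinstance(ev, Write):
            trace.append((ev.pc, ev.addr, ev.value))
            last_writer[ev.addr] = ev.pc
        elif isinstance(ev, Jump) and ev.target in last_writer:
            # The target was written during this run: generated code.
            return last_writer[ev.target], trace
    return None, trace

# Toy run (hypothetical): instruction 0x2 decrypts two bytes, 0x4 jumps to them.
KEY = 0x5A
packed = [0x90 ^ KEY, 0xC3 ^ KEY]
events = [Write(0x2, 0x401000 + i, b ^ KEY) for i, b in enumerate(packed)]
events.append(Jump(0x4, 0x401000))
decrypt_pc, trace = find_decryption_pc(events)
```

A real implementation would receive these events from an instrumentation tool rather than from a list, and would track writes at byte granularity.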

7
Unpacking by Tracing (continued)

After the identification of the decryption
instruction PC, search through the trace to
collect the sequence of values that are written
by PC, which is often the plain text body of the
malware.
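
A minimal sketch of that search, assuming the trace is a list of (PC, X, V) tuples and that every write is one byte wide. The helper name `values_written_by` and the sample trace are made up for illustration.

```python
def values_written_by(trace, pc):
    """Collect, in trace order, the byte values written by instruction `pc`."""
    return bytes(value for (writer, _addr, value) in trace if writer == pc)

# Hypothetical trace: one unrelated write, then two writes by the
# decryption instruction at PC 0x02.
trace = [
    (0x10, 0x7FFE0, 0x01),
    (0x02, 0x401000, 0x90),
    (0x02, 0x401001, 0xC3),
]
body = values_written_by(trace, 0x02)
```

If the decryption instruction writes the same address more than once, keeping only the last value per address may be closer to the final code image; that choice is left open here.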

8
More on Malware

Unpacking (decryption) occurs page by page.
Unpack one page, execute that page, trap the execution, unpack another page to the same buffer space, and so on.
Does our tracing technique still work?
Multiple packers are used.
The plain text can only be reached after multiple levels of unpacking.
Anti-tracing techniques
The malware detects an obvious slow-down of its own execution.
Quoted from Symantec:
"We know dynamic analysis is the future of AV because of packing and obfuscation, but the problem is to be able to run it and afford running it."

9
Profiling Parallelism

A recent trend to parallelize a sequential
program is to spawn a method call as a separate
thread.

Figure: two timelines. With foo(), foo's body runs first and foo's continuation follows. With asynchronous foo(), foo's body and foo's continuation run side by side.
10

Devise a profiler that identifies method calls
that are amenable to such parallelization.
A naïve solution: collect dependence traces of
the form <PCuse, PCdef>. If PCdef is inside a
dynamic method call C and PCuse is in the
continuation of C, then the method is not
amenable to asynchronous invocation.
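
A rough sketch of the naïve check, assuming each dynamic call is recorded as an (enter, exit) timestamp interval and each dependence as a (definition time, use time) pair. The record types and the name `non_amenable_methods` are hypothetical, and "continuation" is approximated as "any time after the call returns".

```python
from typing import NamedTuple

class Call(NamedTuple):
    method: str
    enter: int   # timestamp at call entry
    exit: int    # timestamp at return

class Dep(NamedTuple):
    t_def: int   # timestamp of the definition
    t_use: int   # timestamp of the use

def non_amenable_methods(calls, deps):
    """Methods with a definition inside a call that is used after it returns."""
    blocked = set()
    for call in calls:
        for dep in deps:
            if call.enter <= dep.t_def <= call.exit < dep.t_use:
                blocked.add(call.method)
    return blocked

calls = [Call("foo", 10, 20), Call("bar", 30, 40)]
deps = [Dep(15, 25), Dep(32, 38)]   # the second stays inside bar
blocked = non_amenable_methods(calls, deps)
```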

The problems
It is unlikely that the value written in a method call is never used later (observed by all three proposals I received).
Do we care about a dependence whose control flow (time) distance between the definition and the use is so long that it is easily respected even though the call is spawned as a thread? (observed by one proposal)
Nested functions and repetitive functions.

11
Dependence Filtering
C: a method invocation
Tdur: the duration of the method invocation
Tdep: the distance between the def and the use involved in a dependence
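
One way to turn these quantities into a filter is sketched here. The rule Tdep > Tdur is an assumption, not something the slide states. The reasoning behind it: if the continuation starts when C is spawned, the use moves earlier by roughly Tdur, so the definition still comes first whenever Tdep exceeds Tdur. All names in the block are hypothetical.

```python
from typing import NamedTuple

class Call(NamedTuple):
    method: str
    enter: int   # timestamp at call entry
    exit: int    # timestamp at return

def is_conflict(call, t_def, t_use):
    """True if the dependence crosses C's boundary and is too short to ignore."""
    crosses = call.enter <= t_def <= call.exit < t_use
    t_dur = call.exit - call.enter   # Tdur
    t_dep = t_use - t_def            # Tdep
    return crosses and t_dep <= t_dur

foo = Call("foo", 10, 20)            # Tdur = 10
short = is_conflict(foo, 19, 22)     # Tdep = 3, kept as a conflict
far = is_conflict(foo, 12, 500)      # Tdep = 488, filtered out
```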
12
Nesting and Repetition Problem
 1. void A ( ) {
 2.   while (...)
 3.     s1
 4. }
 5. void B ( ) {
 6.   A ( )
 7.   s2
 8. }
 9.
10. void main ( ) {
11.   A ( )
12.   B ( )
    }

Execution trace:
E = 11 2 3 2 3 2 12 6 2 3 2 7

Grammar describing the execution:
Rmain → 11 RA 12 RB
RA → 2 R2
R2 → 3 2 R2 | ε
RB → 6 RA 7

Dependence 2 → 2: two cases
Dependence 2 → 7: two cases
13
The Solution

Maintain execution indices during the run.
Dependences are detected in the form of <IDXuse,
IDXdef>
Upon the detection of a dependence
Traverse along the index path of the definition
point backwards (from the leaf, i.e., the
definition pc, to the root) until a common
ancestor of both the definition and the use point
is reached.
Along the traversal, the profile of all
intermediate nodes corresponding to a method call
is updated.
Given the profile, transform the program
accordingly.
Using Java futures (by one proposal).
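
The backward traversal can be sketched as follows, assuming each execution index is a leaf in a tree with parent pointers. The `Node` class, the choice of profile (smallest Tdep seen per call node), and the labels are assumptions made for illustration.

```python
class Node:
    """One node of the index tree; `is_call` marks method call nodes."""
    def __init__(self, label, parent=None, is_call=False):
        self.label = label
        self.parent = parent
        self.is_call = is_call

def ancestors(node):
    """The node itself plus everything on its path to the root."""
    path = set()
    while node is not None:
        path.add(node)
        node = node.parent
    return path

def update_profile(idx_def, idx_use, t_dep, profile):
    """Walk up from the definition until a common ancestor with the use."""
    use_path = ancestors(idx_use)
    node = idx_def
    while node is not None and node not in use_path:
        if node.is_call:
            best = profile.get(node.label, t_dep)
            profile[node.label] = min(best, t_dep)
        node = node.parent

# Hypothetical run: main calls B, B calls A; a value defined inside A
# is used by a later statement of B.
main = Node("main", is_call=True)
b = Node("B", main, is_call=True)
a_in_b = Node("A in B", b, is_call=True)
d = Node("def", a_in_b)
u = Node("use", b)
profile = {}
update_profile(d, u, 3, profile)
```

Only the call to A is charged with the dependence. B contains both end points, so spawning B asynchronously would not separate them.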

14
Taint Analysis

Inputs from untrusted sources such as network
packets or even files are tainted, meaning they
are not trusted. Taint bits are propagated
through dependences during execution so that
variables are not trusted if the tainted input
affected their values.
A conditional or unconditional jump to a tainted
location is considered a security violation.
The analysis can also be adapted to detect information leaks.
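
A toy illustration of taint propagation, assuming a made-up three-operation instruction format ("input", "assign", "jump") rather than real machine code. The function name `run_with_taint` and the sample program are hypothetical.

```python
def run_with_taint(program, untrusted_sources):
    """Propagate one taint bit per variable; report jumps through tainted ones."""
    taint = {}
    violations = []
    for step, op in enumerate(program):
        kind = op[0]
        if kind == "input":
            _, dst, source = op
            taint[dst] = source in untrusted_sources
        elif kind == "assign":
            _, dst, srcs = op
            # The result is tainted if any operand is tainted.
            taint[dst] = any(taint.get(s, False) for s in srcs)
        elif kind == "jump":
            _, target_var = op
            if taint.get(target_var, False):
                violations.append(step)
    return taint, violations

program = [
    ("input", "b", "network"),    # tainted input
    ("assign", "n", ()),          # constant, untainted
    ("assign", "f", ("b", "n")),  # depends on b, so tainted
    ("jump", "n"),                # fine
    ("jump", "f"),                # jump target derived from input
]
taint, violations = run_with_taint(program, {"network"})
```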

15
Implementing Taint Analysis Using DP Primitives

Build dynamic dependence graph
See if the entry points of untrusted values are
in the dynamic slices of output values.

a buffer overflow exploit

void (*F)();
char A[2];
...
read(B, 256);
i = 2;
A[i] = B[i];
...
(*F)();
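
The slicing view of this exploit can be sketched with a hand-written dependence graph. The edge from the indirect call to the array write assumes that the out-of-bounds store landed on F. The graph, its labels, and the helper name `dynamic_slice` are illustrative only.

```python
def dynamic_slice(ddg, node):
    """Everything `node` transitively depends on, including itself."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(ddg.get(current, ()))
    return seen

# Dynamic dependence graph: each statement maps to what it depends on.
ddg = {
    "(*F)()": ["A[i] = B[i]"],   # assumed: the store overwrote F
    "A[i] = B[i]": ["read(B, 256)", "i = 2"],
    "i = 2": [],
    "read(B, 256)": [],
}
untrusted_entries = {"read(B, 256)"}
slice_of_call = dynamic_slice(ddg, "(*F)()")
tainted = bool(untrusted_entries & slice_of_call)
```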
16
Reverse Engineering Input Syntactic Structure

Focus on syntactic structure
Inputs follow a certain grammar
Derive the derivation tree (AST) for a given
input
Motivation
Test generation
Network protocol analysis
Delta debugging
Basic Idea
Trace the use points of input values
Build the index tree for the execution, annotate
the index tree at input use points with the
values used.
The annotated tree serves as the input syntactic
tree.
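
A small hand-instrumented sketch of the idea. A real tool would derive the index tree from the execution itself, whereas here the regions are marked manually. The `IndexTree` class, the "key=value;..." input format, and all function names are assumptions chosen for illustration.

```python
from contextlib import contextmanager

class IndexTree:
    """Tree of nested execution regions, annotated with the input values used."""
    def __init__(self):
        self.root = {"label": "main", "values": [], "children": []}
        self._stack = [self.root]

    @contextmanager
    def region(self, label):
        node = {"label": label, "values": [], "children": []}
        self._stack[-1]["children"].append(node)
        self._stack.append(node)
        try:
            yield node
        finally:
            self._stack.pop()

    def use(self, value):
        # Annotate the current region with an input value it consumed.
        self._stack[-1]["values"].append(value)

def parse_pairs(text, tree):
    """Toy program under analysis: reads 'key=value' pairs split by ';'."""
    for pair in text.split(";"):
        with tree.region("pair"):
            key, _, val = pair.partition("=")
            with tree.region("key"):
                for ch in key:
                    tree.use(ch)
            with tree.region("value"):
                for ch in val:
                    tree.use(ch)

def render(node):
    children = [render(child) for child in node["children"]]
    return (node["label"], "".join(node["values"]), children)

tree = IndexTree()
parse_pairs("a=1;bc=22", tree)
shape = render(tree.root)
```

The nesting of regions mirrors the grouping of the input into pairs, keys, and values, which is the sense in which the annotated tree stands in for a syntax tree.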