Dynamic Analysis Applications - PowerPoint PPT Presentation

Slides: 22
Provided by: hpcu59

Transcript and Presenter's Notes



1
Dynamic Analysis Applications
  • Xiangyu Zhang

2
Primitives
  • Tracing
  • Profiling
  • Checkpointing and replay
  • Slicing (dependence detection)
  • Indexing
  • Delta-debugging (input reduction)

3
Four Sample Applications
  • Malware unpacking
  • Profiling for concurrency
  • Taint analysis
  • Input structure reverse engineering

4
Malware Unpacking
  • Malware often features an encrypted code body
    and a decryption engine. Given an executable with
    an embedded malicious code piece, the goal is to
    acquire the plain text of the malware code body.
  • Malware > goodware.
  • Symantec's daily report contains a few thousand
    malware samples.
  • 70% of malware is packed.
  • Signature-based malware detection is still the
    most effective technique (0.01 false positive
    rate), but it requires unpacking.

5
A Packed Malware Binary
  • A binary is packed if some portion of its code is
    not present until runtime

[Figure: Original Binary vs. Packed Binary, each shown in its address space with its entry point. In the packed binary, the entry point leads to anti-debugger code and then to the unpacking loop; the packed code is initially compressed or encrypted, and a final JUMP transfers control to the unpacked code.]
  • Payload program is mostly unchanged
  • Anti-debugger code: timing checks of various
    granularities, control flow obfuscation
  • Control transfer to unpacked code

Unpacking loop:

            lea   eax, 0x4a0000
            lea   ebx, 0x401000
    .loop:  load  ecx, ptr[r1]
            xor   ecx, 0xffffff      ; decrypt
            store ptr[ecx], r2
            ...
            jnz   .x
            call  ptr[edi]
    .x:     add   eax, 4
            add   ebx, 4
            cmp   eax, 0x4a1f88
            jnz   .loop
            jmp   0x401000           ; transfer to unpacked code
Source: Kevin Roundy, "Packed Binary Analysis with Dyninst", slide 5 of 19.
6
Unpacking by Tracing
  • The most prominent feature of packed malware is
    the control flow transfer to a dynamically
    generated region.

    for (i ...)
        B[i] = A[i] XOR key;
    goto B[0];

  • Collect the memory access trace.
      • Upon execution of an instruction PC that writes
        value V to address X:
          • Create one trace entry <PC, X, V>.
          • Set Hashmap[X] = PC.
      • Upon execution of a control flow transfer
        instruction to X:
          • Test whether Hashmap[X] is defined. If so, the
            program is executing a dynamically generated
            instruction.
          • The decryption loop is identified by
            PC = Hashmap[X].
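The tracing scheme above can be sketched in Python over a simulated run. The event tuples, the function name `find_decryption_pc`, and all addresses are hypothetical; a real implementation would obtain the writes and control transfers from a binary instrumentation tool rather than from a list.

```python
# Hypothetical event format: ("write", pc, addr, value) or ("jump", pc, target).

def find_decryption_pc(events):
    """Return (decryption_pc, entry_addr, trace) when control first reaches a
    dynamically written address, or None if that never happens."""
    trace = []      # trace entries <PC, X, V>
    hashmap = {}    # Hashmap[X] = PC of the last instruction that wrote X
    for event in events:
        kind, pc = event[0], event[1]
        if kind == "write":
            addr, value = event[2], event[3]
            trace.append((pc, addr, value))
            hashmap[addr] = pc
        elif kind == "jump":
            target = event[2]
            if target in hashmap:
                # Executing a dynamically generated instruction:
                # the decryption loop is identified by PC = Hashmap[X].
                return hashmap[target], target, trace
    return None


# Simulated run: the store at 0x10 decrypts three bytes into 0x401000..,
# the back-edge at 0x18 jumps to the (never written) loop head, and the
# instruction at 0x20 finally jumps into the freshly written region.
KEY = 0x5A
packed = [0x12, 0x34, 0x56]
events = []
for i, byte in enumerate(packed):
    events.append(("write", 0x10, 0x401000 + i, byte ^ KEY))
    events.append(("jump", 0x18, 0x10))
events.append(("jump", 0x20, 0x401000))

pc, entry, trace = find_decryption_pc(events)
print(hex(pc), hex(entry), len(trace))  # 0x10 0x401000 3
```

Ordinary jumps, such as the loop back-edge, are ignored because their targets were never written during the run.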

7
Unpacking by Tracing (continued)
  • After identifying the decryption instruction PC,
    search the trace for the sequence of values
    written by PC; this sequence is often the plain
    text body of the malware.
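A minimal sketch of this recovery step, assuming the trace holds `(pc, addr, value)` tuples mirroring the <PC, X, V> entries; `recover_plaintext` and the sample trace are hypothetical.

```python
def recover_plaintext(trace, decryption_pc):
    """Collect, in trace order, the values written by the decryption instruction."""
    return [value for pc, _addr, value in trace if pc == decryption_pc]


# Hypothetical trace: a loop-counter update at 0x30 is interleaved with the
# decrypting store at 0x10, which writes the plain-text bytes.
KEY = 0x5A
plain = b"MZ\x90"
packed = bytes(b ^ KEY for b in plain)
trace = []
for i, byte in enumerate(packed):
    trace.append((0x30, 0x7000, i + 1))              # unrelated write
    trace.append((0x10, 0x401000 + i, byte ^ KEY))   # decrypted byte

recovered = bytes(recover_plaintext(trace, 0x10))
print(recovered)  # b'MZ\x90'
```

Filtering by PC is what separates the payload from unrelated writes performed by the same loop.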

8
More on Malware
  • Unpacking (decryption) occurs page by page.
      • Unpack one page, execute that page, trap the
        execution, unpack another page into the same
        buffer space, and so on.
      • Does our tracing technique still work?
  • Use of multiple packers.
      • The plain text can only be reached after
        multiple levels of unpacking.
  • Anti-tracing techniques.
      • The malware detects the obvious slowdown of its
        own execution.
  • Quoted from Symantec:
      • "We know dynamic analysis is the future of AV
        because of packing and obfuscation, but the
        problem is to be able to run it and afford
        running it."

9
Profiling Parallelism
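  • A recent trend in parallelizing a sequential
    program is to spawn a method call as a separate
    thread.

[Figure: with a plain call foo(), foo's body runs and then foo's continuation; with asynchronous foo(), foo's body runs in a separate thread concurrently with foo's continuation.]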
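To make asynchronous invocation concrete, here is a sketch that uses Python's `concurrent.futures` as a stand-in for spawning a thread. It is a Python analogue of the idea, not the code of any proposal; `foo`, `sequential`, and `asynchronous` are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor


def foo(data):
    # foo's body: a stand-in computation
    return sum(x * x for x in data)


def sequential(data, other):
    result = foo(data)                      # foo's body runs to completion
    continuation = [y + 1 for y in other]   # then foo's continuation
    return result, continuation


def asynchronous(data, other):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(foo, data)         # foo's body in another thread
        continuation = [y + 1 for y in other]   # continuation runs meanwhile
        result = future.result()                # block only when the result is needed
    return result, continuation


print(sequential([1, 2, 3], [10]))    # (14, [11])
print(asynchronous([1, 2, 3], [10]))  # (14, [11])
```

The rewrite is safe here only because the continuation reads nothing that foo's body writes, which is exactly the property a profiler has to establish.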
10
  • Devise a profiler that identifies method calls
    that are amenable to such parallelization.
  • A naïve solution: collect dependence traces of
    the form <PCuse, PCdef>. If PCdef is inside a
    dynamic method call C and PCuse is in the
    continuation of C, then the method is not
    amenable to asynchronous invocation.
  • The problems:
      • It is unlikely that a value written in a method
        call is never used later, so nearly every call
        would be rejected (observed by all three
        proposals I received).
      • Do we care about a dependence whose control flow
        (time) distance between the definition and the
        use is so long that it is easily respected even
        when the call is spawned as a thread? (observed
        by one proposal)
      • Nested and repeated function calls.
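A sketch of the naïve rule, under two assumptions: each dynamic call is known by the timestamps [start, end) of its body, and each dependence is a pair of timestamps (t_use, t_def) rather than PCs, since timestamps distinguish different dynamic instances of the same PC. `Call` and `naive_profile` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One dynamic method invocation occupying timestamps [start, end)."""
    name: str
    start: int
    end: int    # first timestamp of the call's continuation


def naive_profile(calls, dependences):
    """dependences: (t_use, t_def) pairs. A call is rejected if some
    definition inside it is used in its continuation."""
    amenable = {call: True for call in calls}
    for t_use, t_def in dependences:
        for call in calls:
            if call.start <= t_def < call.end <= t_use:
                amenable[call] = False
    return amenable


calls = [Call("foo#1", 0, 10), Call("bar#1", 10, 20)]
deps = [(5, 2),     # def and use both inside foo#1: harmless
        (25, 15)]   # def inside bar#1, use after bar#1 returns
result = naive_profile(calls, deps)
print({call.name: ok for call, ok in result.items()})
# {'foo#1': True, 'bar#1': False}
```

A single crossing dependence is enough to reject a call, which is why this rule rejects almost everything in practice.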

11
Dependence Filtering
  • C: a method invocation
  • Tdur: the duration of the method invocation
  • Tdep: the distance between the def and the use
    involved in a dependence
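The filtering condition on the original slide did not survive in this transcript. The sketch below assumes one plausible criterion, derived here rather than taken from the slide. If C is spawned with no overhead and both threads run at the same speed, the continuation starts Tdur earlier than before, so the use happens Tdep − Tdur time units after the def. The dependence can therefore be violated only when Tdep ≤ Tdur, and longer-distance dependences are filtered out. The function names are hypothetical.

```python
def is_conflicting(t_dur, t_dep):
    # Assumed criterion (not from the slide): with no spawn overhead and equal
    # thread speeds, the use happens t_dep - t_dur time units after the def,
    # so spawning C can only break the dependence when t_dep <= t_dur.
    return t_dep <= t_dur


def filter_dependences(call_start, call_end, dependences):
    """Keep the (t_use, t_def) pairs that cross call C, which occupies
    [call_start, call_end), and that spawning C might violate."""
    t_dur = call_end - call_start
    kept = []
    for t_use, t_def in dependences:
        crosses = call_start <= t_def < call_end <= t_use
        if crosses and is_conflicting(t_dur, t_use - t_def):
            kept.append((t_use, t_def))
    return kept


# C occupies [100, 150), so Tdur = 50.
deps = [(160, 140),   # Tdep = 20 <= 50: may be violated, kept
        (400, 120)]   # Tdep = 280 > 50: respected anyway, filtered out
print(filter_dependences(100, 150, deps))  # [(160, 140)]
```

A practical profiler would likely add slack for thread creation cost and uneven speeds; the idealized comparison only shows the shape of the test.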
12
Nesting and Repetition Problem
     1. void A ( ) {
     2.   while (...)
     3.     s1;
     4. }
     5. void B ( ) {
     6.   A ( );
     7.   s2;
     8. }
     9. void main ( )
    10. {
    11.   A ( );
    12.   B ( );
        }

E = 11 2 3 2 3 2 12 6 2 3 2 7

    R_main → 11 R_A 12 R_B
    R_A    → 2 R_2
    R_2    → 3 2 R_2 | ε
    R_B    → 6 R_A 7

2 → 2: two cases
2 → 7: two cases
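Replaying the trace E through the grammar makes the nesting explicit. The sketch below is a hand-written recursive-descent parser for this one example, with hypothetical names. It tags each executed statement with the call sites enclosing it. Statement 2 shows up in two contexts: A called from line 11, and A called from line 6 inside B. This suggests one way to read the "two cases". A 2 → 2 dependence may stay within one invocation of A (across loop iterations), or it may cross from one invocation of A to another.

```python
TRACE = [11, 2, 3, 2, 3, 2, 12, 6, 2, 3, 2, 7]   # E from the slide


def index_events(trace):
    """Parse E with the grammar above and return (statement, call path)
    pairs, where the call path lists the call sites enclosing the statement."""
    events = []
    pos = 0

    def take(stmt, path):
        nonlocal pos
        if pos >= len(trace) or trace[pos] != stmt:
            raise ValueError(f"expected {stmt} at position {pos}")
        events.append((stmt, path))
        pos += 1

    def r_a(path):                 # R_A -> 2 R_2
        take(2, path)
        r_2(path)

    def r_2(path):                 # R_2 -> 3 2 R_2 | epsilon (recursion as a loop)
        while pos < len(trace) and trace[pos] == 3:
            take(3, path)
            take(2, path)

    def r_b(path):                 # R_B -> 6 R_A 7
        take(6, path)
        r_a(path + (6,))           # A invoked from call site 6
        take(7, path)

    take(11, ())                   # R_main -> 11 R_A 12 R_B
    r_a((11,))
    take(12, ())
    r_b((12,))
    if pos != len(trace):
        raise ValueError("trace longer than the grammar derives")
    return events


contexts = sorted({path for stmt, path in index_events(TRACE) if stmt == 2})
print(contexts)  # [(11,), (12, 6)]
```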
13
The Solution
  • Maintain execution indices during the run.
  • Dependences are detected in the form of
    <IDXuse, IDXdef>.
  • Upon detecting a dependence:
      • Traverse the index path of the definition point
        backwards (from the leaf, i.e., the definition
        pc, toward the root) until reaching a common
        ancestor of the definition and the use point.
      • Along the traversal, update the profile of every
        intermediate node that corresponds to a method
        call.
  • Given the profile, transform the program
    accordingly.
      • E.g., using Java futures (suggested by one
        proposal).
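A sketch of the traversal step, simplified under stated assumptions. An index here is a tuple of method-call nodes from the root to the innermost call. Each dynamic call has a unique, hypothetical id such as "A@11" (A called at line 11 in the Nesting and Repetition example). The profile simply counts crossing dependences per call. Real execution indices also encode loop iterations and branches.

```python
from collections import Counter


def common_prefix_len(a, b):
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def record_dependence(profile, idx_use, idx_def):
    """Walk the def's index path from the leaf toward the root, stopping at the
    first ancestor shared with the use. Each call node passed contains the def
    but not the use, i.e. the use lies in that call's continuation."""
    shared = common_prefix_len(idx_use, idx_def)
    for depth in range(len(idx_def) - 1, shared - 1, -1):
        profile[idx_def[depth]] += 1


A_11 = ("main", "A@11")          # A called at line 11
B_12 = ("main", "B@12")          # B called at line 12
A_6 = ("main", "B@12", "A@6")    # A called at line 6, inside B

profile = Counter()
record_dependence(profile, A_6, A_11)    # def in A@11, use in A@6
record_dependence(profile, B_12, A_6)    # def in A@6, use at line 7 in B
record_dependence(profile, A_11, A_11)   # loop-carried, inside one A call
print(dict(profile))  # {'A@11': 1, 'A@6': 1}
```

Note that B@12 is never charged for the second dependence, since both its def and its use lie inside B; a PC-only profile cannot make that distinction.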

14
Taint Analysis
  • Inputs from untrusted sources, such as network
    packets or even files, are tainted, meaning they
    are not trusted. Taint bits are propagated
    through dependences during execution, so a
    variable is not trusted if tainted input
    affected its value.
  • A conditional or unconditional jump to a tainted
    location is considered a security violation.
  • The analysis can also be adapted to detect
    information leaks.
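A minimal sketch of taint propagation with the jump check, over a hypothetical mini-IR (the instruction tuples and `run_with_taint` are illustrative, not a real tool's API). Taint flows along data dependences only, and jumps are checked but not executed.

```python
def run_with_taint(program, inputs):
    """Execute a hypothetical mini-IR while propagating taint bits.

    Instructions:
      ("input", dst)         dst <- next untrusted input (tainted)
      ("const", dst, value)  dst <- value (untainted)
      ("add", dst, a, b)     dst <- a + b, tainted if a or b is tainted
      ("jump", reg)          indirect jump through reg (checked, not executed)
    Returns (pc, target) for every jump to a tainted location.
    """
    values, taint = {}, {}
    feed = iter(inputs)
    violations = []
    for pc, instr in enumerate(program):
        op = instr[0]
        if op == "input":
            values[instr[1]] = next(feed)
            taint[instr[1]] = True
        elif op == "const":
            values[instr[1]] = instr[2]
            taint[instr[1]] = False
        elif op == "add":
            _, dst, a, b = instr
            values[dst] = values[a] + values[b]
            taint[dst] = taint[a] or taint[b]
        elif op == "jump":
            if taint[instr[1]]:
                violations.append((pc, values[instr[1]]))
    return violations


program = [
    ("const", "base", 0x400000),
    ("input", "off"),                  # e.g. a field of a network packet
    ("add", "target", "base", "off"),  # taint flows from off into target
    ("jump", "target"),                # jump to a tainted location
    ("jump", "base"),                  # untainted target: allowed
]
print(run_with_taint(program, [0x41]))  # [(3, 4194369)]
```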

15
Implementing Taint Analysis Using DP Primitives
  • Build the dynamic dependence graph.
  • Check whether the entry points of untrusted values
    are in the dynamic slices of output values.
  • Example: a buffer overflow exploit

    void (*F)();
    char A[2];
    ...
    read(B, 256);
    i = 2;
    A[i] = B[i];
    ...
    (*F)();
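The slicing-based formulation can be sketched on the buffer overflow example. Assumptions, all hypothetical: A is laid out directly before F, so the out-of-bounds A[2] is F's storage; only the first three bytes read into B are modeled; and the statement `F = handler` stands in for F's initialization hidden in the first "...". The code links each use to the most recent definition of the same location (data dependences only), then checks whether the read of untrusted input is in the backward slice of the indirect call.

```python
def backward_slice(executed, criterion):
    """executed: one (label, defs, uses) tuple per executed statement instance,
    with defs/uses given as sets of memory locations. Returns the labels of all
    instances the criterion instance transitively depends on (including itself)."""
    last_def = {}   # location -> index of its most recent defining instance
    deps = []       # deps[k]: instances that instance k directly depends on
    for k, (_label, defs, uses) in enumerate(executed):
        deps.append({last_def[loc] for loc in uses if loc in last_def})
        for loc in defs:
            last_def[loc] = k
    seen, work = set(), [criterion]
    while work:
        k = work.pop()
        if k not in seen:
            seen.add(k)
            work.extend(deps[k])
    return {executed[k][0] for k in seen}


F = "A[2]"   # assumed layout: one element past the end of A is F's storage
executed = [
    ("F = handler", {F}, set()),                        # hypothetical initialization
    ("read(B, 256)", {"B[0]", "B[1]", "B[2]"}, set()),  # untrusted entry point
    ("i = 2", {"i"}, set()),
    ("A[i] = B[i]", {"A[2]"}, {"i", "B[2]"}),           # overflow clobbers F
    ("(*F)()", set(), {F}),                             # indirect call through F
]
labels = backward_slice(executed, criterion=len(executed) - 1)
print("read(B, 256)" in labels)  # True: the call target is tainted
```

The legitimate initialization drops out of the slice because the overflowing store is the most recent definition of F.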
16
Reverse Engineering Input Syntactic Structure
  • Focus on syntactic structure
  • Inputs follow a certain grammar
  • Derive the derivation tree (AST) for a given
    input
  • Motivation
  • Test generation
  • Network protocol analysis
  • Delta debugging
  • Basic Idea
  • Trace the use points of input values
  • Build the index tree for the execution and
    annotate it at input use points with the values
    used.
  • The annotated tree serves as the input syntactic
    tree.
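A sketch of the basic idea on a toy expression parser, with every name hypothetical. The index tree is approximated by the dynamic call tree of a recursive-descent parser. Only the consuming read of each character counts as an input use point, which keeps the annotations free of duplicates from lookahead. Annotating each node with the characters it uses yields a tree that mirrors the input's derivation tree.

```python
class IndexTree:
    """Hypothetical stand-in for an execution index tree: one node per
    dynamic call, holding input values and child calls in execution order."""

    def __init__(self):
        self.root = {"name": "root", "items": []}
        self.stack = [self.root]

    def enter(self, name):
        node = {"name": name, "items": []}
        self.stack[-1]["items"].append(node)
        self.stack.append(node)

    def exit(self):
        self.stack.pop()

    def use(self, value):
        # Annotate the current node at an input use point.
        self.stack[-1]["items"].append(value)


def parse_expr(text):
    """Toy parser: expr := term ('+' term)*, term := factor ('*' factor)*,
    factor := DIGIT. Returns the annotated node for the top-level expr call."""
    tree = IndexTree()
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def consume():
        nonlocal pos
        tree.use(text[pos])
        pos += 1

    def call(name, body):
        tree.enter(name)
        body()
        tree.exit()

    def expr():
        call("term", term)
        while peek() == "+":
            consume()
            call("term", term)

    def term():
        call("factor", factor)
        while peek() == "*":
            consume()
            call("factor", factor)

    def factor():
        consume()

    call("expr", expr)
    return tree.root["items"][0]


def render(node):
    inner = " ".join(item if isinstance(item, str) else render(item)
                     for item in node["items"])
    return f"{node['name']}({inner})"


print(render(parse_expr("1+2*3")))
# expr(term(factor(1)) + term(factor(2) * factor(3)))
```

The recovered tree groups 2*3 under one term, reflecting operator precedence purely from how the parser's execution is structured.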

17
Input Grammar and AST
18
The Implementation Related To Parsing
19
Execution Traces and the Index Tree
20
(No Transcript)
21
Open problems
  • AST ≠ grammar
  • Grammars are much more useful
  • Reject malicious input from the beginning
  • Facilitate protocol understanding
  • Single Input only
  • How to fuse
  • How to mine the grammar from multiple ASTs?
  • What about the semantic part?
  • Keyword, length?
  • They are indeed all constraints!