Title: IDA and obfuscated code
1. IDA and obfuscated code
Ilfak Guilfanov, Hex-Rays
2. Presentation Outline
- Is obfuscated code a problem for IDA Pro?
  - IDA Pro expects nice, proper code
- A lost battle?
  - At first sight, yes
- Solutions exist
  - They are numerous...
- Future development
- Your feedback
- An online copy of this presentation is available at http://www.hex-rays.com/idapro/ppt/caro_obfuscation.ppt
3. Sample obfuscated code
- IDA is a static analysis tool, and it makes many assumptions about the input code
- When these assumptions are violated, the analysis goes wrong
- An extremely simple case: call instructions are expected to return to the next instruction
[Listing screenshot; the faulty spot is labeled "problem"]
The solution will be presented later...
4. Obfuscation categories
- Redundancy
  - Blows up the code size; code cleaning is necessary
- Camouflage
  - Hide & seek: the seeker is to win
- Anti-debugger tricks
  - Tricks can be learned even by old dogs
- Since it is just obfuscation, a determined reverse engineer will eventually overcome it
5. Redundancy
- Instructions with no effect
- Useless jumps
- Complex computations with a constant result
- Code duplication
6. Instructions with no effect
7. Instructions with no effect - countermeasures
- Replace them with 'nop's
- Collapse regions of useless instructions into one line (select the useless instructions, then View, Hide)
Ideally, a plugin to clean up the code would be nice. The Hex-Rays decompiler ignores useless instructions because it simply removes all dead code, but it cannot handle obfuscated code well yet; expect improvements in this direction.
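As a minimal sketch of the "replace them by nops" countermeasure, the Python below works on a plain list of instruction strings rather than on an IDA database. The pattern sets `NO_EFFECT_SINGLE` and `NO_EFFECT_PAIRS` and the function name `nop_out` are assumptions made for illustration; a real cleaner would also have to check flags and register liveness before declaring an instruction useless.

```python
# Hypothetical peephole cleaner: replaces instructions with no net effect by 'nop'.
# The patterns below are illustrative assumptions, not an exhaustive list.
NO_EFFECT_SINGLE = {"xchg eax, eax", "mov eax, eax", "lea esi, [esi]"}
NO_EFFECT_PAIRS = {("push eax", "pop eax"), ("inc ecx", "dec ecx")}

def nop_out(instructions):
    """Return a copy of the listing with no-effect instructions replaced by 'nop'."""
    out = list(instructions)
    i = 0
    while i < len(out):
        if out[i] in NO_EFFECT_SINGLE:
            out[i] = "nop"
        elif i + 1 < len(out) and (out[i], out[i + 1]) in NO_EFFECT_PAIRS:
            out[i] = out[i + 1] = "nop"
            i += 1  # skip the second half of the pair
        i += 1
    return out

listing = ["push eax", "pop eax", "mov ebx, 1", "xchg eax, eax", "ret"]
cleaned = nop_out(listing)
print(cleaned)
```

Keeping the listing the same length (nops instead of deletions) mirrors the slide's suggestion: addresses stay valid and the noise can then be collapsed or hidden.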
8. Useless jumps
- Text view is pretty useless
9. Useless jumps
- Graph view is slightly better
A plugin to clean the graph and combine adjacent nodes would be really useful (it can be done without modifying the database).
10. Graph view and plugins
- Graphs generated by IDA can be modified by a plugin on the fly: just hook to the grcode_changed_graph event
- This allows for improving the graph. Some ideas:
  - Combine sequential nodes into one
  - Hide dead code paths
  - Remove dead edges
  - Add annotations to graph nodes/edges
  - Automatically recognize and collapse patterns (e.g. strlen)
  - Local optimization (within a node: constant folding, etc.)
- All this can be really useful for obfuscated code!
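The first idea in the list, combining sequential nodes, can be sketched outside IDA on a toy graph. The code below is not the IDA graph API: the data layout (`blocks` as a dict of instruction lists, `edges` as a set of pairs) and the name `merge_sequential_nodes` are assumptions for illustration. Two nodes are merged when the first has exactly one successor and the second has exactly one predecessor.

```python
def merge_sequential_nodes(blocks, edges):
    """blocks: {node: [instr, ...]}; edges: set of (src, dst). Returns merged copies."""
    blocks = {n: list(body) for n, body in blocks.items()}
    edges = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in sorted(edges):
            succs_a = [d for s, d in edges if s == a]
            preds_b = [s for s, d in edges if d == b]
            if a != b and succs_a == [b] and preds_b == [a]:
                # Drop a trailing jump that only linked a to b, then append b's body.
                if blocks[a] and blocks[a][-1].startswith("jmp"):
                    blocks[a].pop()
                blocks[a].extend(blocks.pop(b))
                # Re-home b's outgoing edges onto a and remove the a->b edge.
                edges = {(a if s == b else s, d) for s, d in edges if (s, d) != (a, b)}
                changed = True
                break
    return blocks, edges

blocks = {"A": ["mov eax, 1", "jmp B"], "B": ["add eax, 2", "jmp C"], "C": ["ret"]}
edges = {("A", "B"), ("B", "C")}
merged_blocks, merged_edges = merge_sequential_nodes(blocks, edges)
print(merged_blocks)
```

The function works on copies, which matches the remark that such a cleanup can be done without modifying the database.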
11. Constant result calculations
- Some constant calculations can be easily handled (Ctrl-R)
12. When there are too many offsets...
- The answer is obvious: write a script or a plugin :)
- Here's a very simple one-line script:
    OpOffEx(here, 1, REF_OFF32|REFINFO_NOBASE, -1, EBP, 0);
- To make your life even easier, you may assign a script to a hotkey: press Shift-F2 and enter
    AddHotkey("w", "make_ebp_offset");
    static make_ebp_offset()
    {
      // convert operand 1 at the cursor into an EBP-based offset
      OpOffEx(here, 1, REF_OFF32|REFINFO_NOBASE, -1, EBP, 0);
    }
- This trick and many others are explained on http://www.xs4all.nl/~itsme/projects/disassemblers/ida.html
13. What if there are thousands of such offsets?...
- Improve the script to check all instructions for the desired pattern. Here's how to organize a loop over all instructions:
    auto ea, ea2;
    ea2 = MaxEA();
    for ( ea=MinEA(); ea < ea2; ea=NextHead(ea, ea2) )
    {
      if ( !isCode(GetFlags(ea)) )  // skip data items
        continue;
      if ( GetMnem(ea) == "mov" && GetOpnd(ea, 0) == "ebp" )
        Message("%a: found mov ebp!\n", ea);
    }
14. What if these offsets appear and vanish dynamically?
- Well, then you have to create a plugin. It would
  - Recognize the desired pattern
  - Modify the database (create an offset, code, add a comment, etc.)
- Such plugins are fully automatic
- They hook to analysis events (frequently to custom_emu)
- This is the most powerful technique but, alas, it requires DLL programming in C++ and using the SDK
- Just three wishes for your plugins
  - Maybe a switch to turn your plugin off is a good idea
  - Try to be user-friendly (for example, check if there is a comment before calling set_cmt; otherwise you may overwrite a user-defined comment)
  - Do not exit to the OS in the case of errors
15. Constant calculations - some ideas
- Create a script or plugin to
  - Add calculation results as comments (what about a script that traces the application and adds register values as comments for each instruction?)
  - Modify the database and simplify instructions
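The "register values as comments" idea can be illustrated without a debugger by emulating a tiny instruction subset. This is only a sketch under stated assumptions: the function name `annotate`, the supported mnemonics (mov/add/sub/xor), and the textual instruction format are all made up for the example, and every register is assumed to be written before it is read.

```python
MASK = 0xFFFFFFFF  # 32-bit registers

def annotate(listing):
    """Emulate a toy subset (mov/add/sub/xor, register or hex immediate source)
    and return each line with the destination register value as a comment."""
    regs = {}
    ops = {
        "mov": lambda dst, src: src,
        "add": lambda dst, src: dst + src,
        "sub": lambda dst, src: dst - src,
        "xor": lambda dst, src: dst ^ src,
    }
    out = []
    for line in listing:
        mnem, rest = line.split(None, 1)
        dst, src = [p.strip() for p in rest.split(",")]
        src_val = regs[src] if src in regs else int(src, 16)
        regs[dst] = ops[mnem](regs.get(dst, 0), src_val) & MASK
        out.append(f"{line:<24}; {dst} = {regs[dst]:#010x}")
    return out

code = ["mov eax, 0x1234", "add eax, 0x10", "xor eax, 0x1244"]
rows = annotate(code)
for row in rows:
    print(row)
```

With the values visible next to each instruction, a calculation that always ends in the same constant becomes easy to spot and to replace by a single simplified instruction.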
16. Camouflage
- Opaque predicates
- Proprietary virtual machine
- Encryption/compression
- Message-driven systems
- No direct references: PIC (position-independent code)
- Hidden execution flow using SEH
- Rootkit techniques
- Hidden entry point (TLS callbacks, entry point in the resources section or in the header)
17. Opaque predicates
- By definition, an opaque predicate is a predicate (an expression that evaluates to either "true" or "false") whose outcome is known to the programmer a priori but which, for a variety of reasons, still needs to be evaluated at run time
- In fact, some expressions evaluate to any integer value
GetLastError returns 0x57 (Invalid Parameter)
18. Opaque predicates
- They may come in many varieties. Since we cannot determine the outcome statically, we have to find it out ourselves and
  - Inform IDA about the predicate outcome
  - Prune dead code paths and simplify the code
- Working on graph view or pseudocode is easier
- Automate this? How?
- Future versions of IDA/Hex-Rays will offer some solutions
- Interactivity and extensibility help
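Once the outcome of a predicate has been found out (for example, by watching it in the debugger), pruning the dead paths is a reachability problem. The sketch below is a toy model, not IDA functionality: the graph layout, the node names, and the function `prune` are assumptions made for illustration.

```python
def prune(cfg, entry, known_outcomes):
    """cfg: {node: {"true": succ, "false": succ} or {"next": succ} or {}}.
    known_outcomes: {node: True/False} for predicates whose result we found out.
    Returns the set of nodes still reachable once dead branches are dropped."""
    reachable, stack = set(), [entry]
    while stack:
        node = stack.pop()
        if node in reachable:
            continue
        reachable.add(node)
        succs = cfg[node]
        if node in known_outcomes:
            # Follow only the branch that is actually taken at run time.
            stack.append(succs["true" if known_outcomes[node] else "false"])
        else:
            stack.extend(succs.values())
    return reachable

cfg = {
    "entry": {"true": "real_code", "false": "junk"},
    "junk": {"next": "more_junk"},
    "more_junk": {},
    "real_code": {"next": "exit"},
    "exit": {},
}
live = prune(cfg, "entry", {"entry": True})
print(sorted(live))
```

Nodes missing from the returned set are the candidates for hiding in the graph view.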
19. Proprietary virtual machine
- Many implementations use this obfuscation method
- Requires reverse engineering the virtual machine
- Examples
  - Themida Code Virtualizer (http://www.oreans.com/)
  - Various malware
- In the general case, building a processor module for the VM is required
- Let me show you a simple case
20. Bagle malware case
- This mass mailer contains the following code sequence
21. Bagle - opcodes
- Opcode handlers are very simple; I renamed them
22. Bagle - opcode table
- After renaming all handlers the opcode table was
23. Bagle - create opcode enumeration
- The following script created an enumeration for all VM opcodes based on the handler names
24. Bagle - enumeration ready
- We can use this enumeration in the disassembly now
- Just declare an array of bytes and convert them to VM_CODES
- All this without quitting IDA (in fact, I was in the middle of a debugging session since there was another layer of protection before the VM)
25. Bagle - virtual machine readable
- Create an array of bytes, declare them as VM_CODES
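The enumeration trick can be mimicked in a few lines of Python. This is not the IDC script from the slides, and the handler names below are invented for the example (the real Bagle handler names are not reproduced here); only the enumeration name `VM_CODES` comes from the text. The sketch builds one enumeration member per opcode from the handler names and then renders a byte array as opcode names.

```python
from enum import IntEnum

# Hypothetical handler table: index = VM opcode, value = name given to the handler.
handler_names = ["vm_nop", "vm_push", "vm_pop", "vm_xor", "vm_jmp", "vm_exit"]

# Build the enumeration from the handler names, one member per opcode.
VM_CODES = IntEnum("VM_CODES", {name.upper(): code for code, name in enumerate(handler_names)})

def decode(program):
    """Map raw VM bytes to opcode names; unknown bytes are kept as hex."""
    names = []
    for byte in program:
        try:
            names.append(VM_CODES(byte).name)
        except ValueError:
            names.append(f"0x{byte:02X}")
    return names

decoded = decode(bytes([1, 1, 3, 2, 5, 0x7F]))
print(decoded)
```

Bytes that do not map to a known opcode stay visible as hex, which is a hint that they are operands rather than opcodes.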
26. Bagle - VM logic visible
- The logic of the VM program became visible, but there were immediate constants in the code that required manual intervention
27. Bagle - VM decoding automated
- The following script solves the problem
28. Bagle - comfortable analysis of VM
- After assigning a hotkey to the previous script, it was almost the same as having a processor module for the VM
- However, another level of deobfuscation is required (0x63FE34B2 + 0x9C01CB4D = 0xFFFFFFFF)
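The constants quoted above are related in a simple way: the second is the bitwise complement of the first, so adding, XOR-ing, or OR-ing them always gives 0xFFFFFFFF. The short check below is not from the slides; it only illustrates why such a pair is a constant-result calculation that a further deobfuscation step could fold away.

```python
A, B = 0x63FE34B2, 0x9C01CB4D
MASK = 0xFFFFFFFF  # 32-bit arithmetic

is_complement = B == (~A & MASK)   # B is the bitwise complement of A
results = {
    "add": (A + B) & MASK,
    "xor": A ^ B,
    "or": A | B,
}
print(is_complement, {op: hex(v) for op, v in results.items()})
```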
29. VM - summary
- We have to
  - Analyze VM opcodes
  - Give them meaningful, descriptive names
- In simple cases, a simple enumeration will do the job
- In complex cases, a processor module has to be developed
- It is not _that_ difficult after all :)
- Rolf Rolles created a processor module for a VM: http://www.openrce.org/articles/full_view/28
30. Executable packing
- Plethora of packing methods, good and bad
- Manual unpacking is always possible; automatic unpacking would be ideal
- There are sample scripts and plugins in IDA
  - uunp: proof-of-concept unpacker plugin, exists as an IDC script as well
  - unpack: another sample unpacker
- IDA stayed away from this arms race
- There are many other solutions available (unpackers, process dumpers, etc.)
31. Executable packing - approaches
- Static analysis
  - too time consuming
  - requires tedious manual work
- Dynamic analysis (debugger)
  - much faster
  - requires a special sandboxed environment
  - vulnerable to anti-debugger tricks
- Code emulation
  - a good idea
  - any widespread emulator will be attacked
  - emulation imperfections are a problem
- No ideal solution...
32. Encryption
- Methods vary from simple XOR encryption to serious encryption schemes like AES, Blowfish, etc.
- Since the key must be present to run the executable, the strength of the encryption method does not matter
- Ideally we just let the application decrypt itself and then take a memory snapshot
- If only part of the executable is decrypted at a time, then we need to automate the process of taking memory snapshots
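The point that the key has to travel with the executable can be shown with the simplest scheme mentioned, XOR. The snippet is a toy model: the `image` dictionary, the key bytes, and the sample prologue bytes are assumptions for illustration, not data from any real sample.

```python
from itertools import cycle

def xor_crypt(data: bytes, key: bytes) -> bytes:
    """XOR is its own inverse, so the same routine encrypts and decrypts."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

# Hypothetical packed image: the key sits next to the encrypted body, as it must
# for the program to be able to run at all.
key = b"\x5A\xA5"
plain = b"\x55\x8B\xEC\x83\xEC\x10"          # some x86 prologue bytes
image = {"key": key, "body": xor_crypt(plain, key)}

# Anyone holding the image can do what the program itself does at startup.
recovered = xor_crypt(image["body"], image["key"])
print(recovered.hex())
```

For stronger ciphers the same reasoning holds, but it is usually less work to let the program decrypt itself and dump memory than to reimplement the routine.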
33. Position independent code
- No fixed addresses means no xrefs
- Analysis is harder, but user-defined offsets can help
34. Anti-debugging tricks
- I'm sure you know better since you are the practitioners :)
- IDA related
  - Its default settings are not good for hostile code debugging
  - Exceptions are handled by the debugger: change it in the debugger settings
- Just two simple methods
35. Use tracing to find anti-debugging tricks
- Tracing is slow, but it may be used to find why/when/how the process misbehaves
- Sample trace log from naïve code
36. Simple method to neutralize found tricks
- Use a conditional breakpoint to neutralize tricks encountered while single-stepping
- The breakpoint condition for the call instruction is ip=ip+2
- Breakpoint conditions may call all defined IDC functions (including user-defined ones), so they can be used for logging and changing the application behavior
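The mechanism can be modeled with a toy single-stepper: a breakpoint condition is just a function that sees the program state, may change it, and tells the debugger whether to stop. Everything in the sketch is hypothetical (the `run` loop, the `skip_call` condition, the 2-byte call, the instruction names); it assumes the condition's side effect is to move the instruction pointer 2 bytes forward and that it always evaluates to false, so the process is never suspended.

```python
def run(program, breakpoints, max_steps=100):
    """Toy single-stepper. program: {ip: (size, name)}; breakpoints: {ip: condition}.
    A condition receives the mutable state, may change it, and returns True to stop."""
    state = {"ip": min(program), "log": []}
    for _ in range(max_steps):
        ip = state["ip"]
        if ip not in program:
            break
        cond = breakpoints.get(ip)
        if cond is not None:
            if cond(state):
                break                      # condition asked the debugger to suspend
            if state["ip"] != ip:
                continue                   # condition redirected execution
        size, name = program[ip]
        state["log"].append(name)          # "execute" the instruction
        state["ip"] = ip + size
    return state

def skip_call(state):
    state["ip"] += 2      # step over the 2-byte call without executing it
    return False          # never actually suspend the process

program = {0: (1, "push eax"), 1: (2, "call eax"), 3: (1, "pop eax"), 4: (1, "ret")}
final = run(program, {1: skip_call})
print(final["log"])
```

The same hook is a natural place for logging: a condition that records the state and returns false behaves like a tracepoint.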
37. Debugger - current state
- IDA debugger advantages
  - The annotated database is available during debugging
  - All facilities continue to work: FLIRT signatures, function prototypes and argument names, structures, enumerations, your scripts and plugins, etc.
  - Scriptable
  - Available on multiple platforms (remote debugging)
- Shortcomings
  - Slow operation
  - Multithreaded applications poorly handled
  - Only application-level debugging is available
- We continue to work on the shortcomings
- Future versions will be more fit for hostile code analysis
38. Debugger - ideas
- A debugger plugin to configure the 'stealth' mode
  - Exceptions are passed to the application
  - Calls to IsDebuggerPresent, NtSetInformationThread and similar functions are intercepted
- Emulating debugger module
- A 'stealth' debugger module
  - Do not use the standard debugger interface (CreateProcess/WaitForDebugEvent)
  - Inject a debugger DLL into the process and communicate with it (the must-have functionality is breakpoint handling and memory access)
- Higher level debugging
  - Skip hidden code areas, group nodes in the graph view
  - Source level debugging using the pseudocode view
39. Summary
- Obfuscation methods vary; there is no single recipe for all cases
- The key is to be able to represent the code nicely on the screen
- The problem is generic: what to do if IDA displays things not the way I want?
- The answer is: modify the output!
  - Use interactive commands, menus, etc.
  - Represent data in a meaningful way
  - Hide irrelevant information
  - Patch the database and simplify it
  - Create scripts, plugins, processor modules to avoid routine work
40. The obfuscating call instruction
- The function returns a few bytes further than it would normally
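In address terms, the trick means that the instruction following the call is not where execution resumes. A minimal sketch of the arithmetic a fix-up would need is given below; the function names, the 5-byte call size (a near `call rel32`), and the skip of 3 bytes are assumptions chosen for the example, since the text only says "a few bytes".

```python
def real_return_address(call_ea, call_size, skip):
    """Address where execution resumes when the callee bumps its return address by `skip`."""
    return call_ea + call_size + skip

def fix_xrefs(call_sites, call_size=5, skip=3):
    """Map each call site to the address that should receive the code xref.
    call_size=5 assumes a near 'call rel32'; skip=3 is a made-up example value."""
    return {ea: real_return_address(ea, call_size, skip) for ea in call_sites}

targets = fix_xrefs([0x401000, 0x401020])
print({hex(k): hex(v) for k, v in targets.items()})
```

The bytes between the end of the call and the computed target are never executed, so a disassembler that follows the normal fall-through will decode garbage there.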
41. Example solution to obfuscating call
- The idea: intercept emulation of calls to ex_obfuscating and create correct xrefs
- Just a few lines of code (unfortunately, a plugin)
- Can be made more complex if necessary
- The source code of the sample plugin can be found at http://www.hexblog.com/ida_pro/files/ex_deobfuscate.zip
- See the next slide for the essential part of the plugin
42. Plugin to handle weird call instructions
43. Deobfuscated code
- Note the arrow on the left side of the listing
- The graph could be simplified further by a plugin
44. The thank you slide
- Thank you for your attention! Questions?