Title: Detection of ASCII Malware
1Detection of ASCII Malware
- Parbati Kumar Manna
- Dr. Sanjay Ranka
- Dr. Shigang Chen
2Internet Worm and Malware
- Huge damage potential
- Infects hundreds of thousands of computers
- Costs millions of dollars in damage
- Melissa, ILOVEYOU, Code Red, Nimda, Slammer, SoBig, MyDoom
- Mostly use Buffer Overflow
- Propagation is automatic (mostly)
3Recent Trends
- Shift in hackers' mindset
- Malware becoming increasingly evasive and obfuscated
- Emergence of Zero-day worms
- Arrival of Script Kiddies
4Motivation for ASCII Attacks
- Prevalence of servers expecting text-only input
- Text-based protocols
- Presumption of text being benign
- Deployment of ASCII filters that let text pass through
5IDS Detecting ASCII Attack?
- Disassembly-based IDS
- All jump instructions are ASCII
- Higher proportion of branches
- Exponential disassembly cost
- High processing overhead for IDS
- Frequency-based IDS
- PAYL evaded by ASCII worm
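The remark about jump instructions can be checked directly. On x86, the short conditional jumps (Jcc rel8) occupy opcodes 0x70-0x7F, which are the characters 'p' through DEL. The Python sketch below maps a character to the mnemonic it would decode to at the start of an instruction and counts such bytes in a string. The helper names and the sample string are illustrative and not from the slides, and counting bytes is only a rough proxy for counting decoded branch instructions.

```python
# Short conditional jumps (Jcc rel8) on x86 use opcodes 0x70-0x7F.
JCC_MNEMONICS = [
    "jo", "jno", "jb", "jae", "je", "jne", "jbe", "ja",
    "js", "jns", "jp", "jnp", "jl", "jge", "jle", "jg",
]

def jcc_mnemonic(ch):
    """Return the Jcc mnemonic whose opcode equals ord(ch), or None."""
    code = ord(ch)
    if 0x70 <= code <= 0x7F:
        return JCC_MNEMONICS[code - 0x70]
    return None

def jump_byte_fraction(text):
    """Fraction of characters whose byte value is a Jcc opcode (rough proxy)."""
    if not text:
        return 0.0
    hits = sum(1 for ch in text if jcc_mnemonic(ch) is not None)
    return hits / len(text)

if __name__ == "__main__":
    for ch in "ptu~":
        print(repr(ch), hex(ord(ch)), jcc_mnemonic(ch))
    # Hypothetical sample input, chosen only for illustration.
    print(jump_byte_fraction("trust your input"))
```

Ordinary lowercase letters from 'p' to 'z' all land in this range, which is one way to see why text yields a higher proportion of branches for a disassembly-based IDS to follow.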
6Buffer Overflow
7Constraints of ASCII Malware
- Opcode Unavailability
- Shellcode requires binary opcodes
- Here only xor, and, sub, cmp etc.
- Must generate opcodes dynamically
- Difficulty in Encryption
- No backward jump
- Can't use same decrypter routine for each encrypted block
- No one-to-one correspondence between ASCII and binary
[Figure: correspondence between ASCII and binary; the mapping may vary]
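One way to see the "no backward jump" constraint, assuming the jump is a short jump whose 8-bit displacement byte must itself be ASCII: the displacement is a signed byte, and every value in 0x00-0x7F is non-negative, so the target can only lie ahead. The sketch below (function and variable names are illustrative) checks this.

```python
def rel8_displacement(byte_value):
    """Interpret one byte as the signed 8-bit displacement of a short jump."""
    if not 0 <= byte_value <= 0xFF:
        raise ValueError("expected a single byte")
    return byte_value - 0x100 if byte_value >= 0x80 else byte_value

# Every ASCII byte (0x00-0x7F) gives a non-negative, i.e. forward, displacement.
ascii_displacements = [rel8_displacement(b) for b in range(0x80)]
# A backward jump needs a byte with the high bit set, which is not ASCII.
binary_displacements = [rel8_displacement(b) for b in range(0x80, 0x100)]

if __name__ == "__main__":
    print(min(ascii_displacements), max(ascii_displacements))
    print(min(binary_displacements), max(binary_displacements))
```

Without a backward jump there is no loop, which is why a decrypter routine cannot simply be reused for each encrypted block.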
8Creation of ASCII Malware
9Buffer Overflow using ASCII
Overflowing a buffer using an ASCII string
10Detection of ASCII Malware
- Opcode Unavailability
- Dynamic generation of opcodes needs more ASCII instructions for each binary instruction
- Difficulty in Encryption
- No backward jump means decrypter block for each encrypted block must be hardcoded
- Long sequence of contiguous valid instructions likely → high MEL
What is this MEL?
11Maximum Executable Length
- Indicates maximum length of an execution path
- Need to disassemble (and execute) from all possible entry points
- All branching must be considered
- Abstract payload execution
- Used for binary worms with sled
- Effectiveness has since dwindled
12Benign Text has Low MEL
- Contains characters that correspond to invalid instructions
- Privileged Instruction (I/O)
- Arbitrary Segment Selector
- More Memory-accessing instructions may use uninitialized registers
- Long sequence of contiguous valid instructions unlikely → low MEL
13Proposed Solution
- Find out the maximum length of valid instruction sequence
- If it is long enough, the stream contains malware
- Question
- How long is long?
14Probabilistic Analysis
- Toss a coin n times
- What is the probability that the max distance between two consecutive heads is τ?
Head (H) = Invalid Instruction (I)
Tail (T) = Valid Instruction (V)
T H T T H T T T T T H T T T
V I V V I V V V V V I V V V
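The coin-toss analogy can be made concrete with a few lines of Python that find the longest run of valid instructions in the example sequence from this slide. This is a minimal sketch: the function names are illustrative, the run is measured as a count of consecutive V symbols, and the rule "flag when the run exceeds a threshold tau" is an assumed reading of the proposed solution.

```python
def max_valid_run(validity):
    """Length of the longest run of 'V' (valid) symbols in the sequence."""
    longest = current = 0
    for symbol in validity:
        if symbol == "V":
            current += 1
            longest = max(longest, current)
        else:  # an invalid instruction ('I') ends the current run
            current = 0
    return longest

def is_flagged(validity, tau):
    """Flag the stream when the longest valid run exceeds the threshold tau."""
    return max_valid_run(validity) > tau

# The valid/invalid sequence shown on the slide.
example = "V I V V I V V V V V I V V V".split()

if __name__ == "__main__":
    print(max_valid_run(example))
```

For the slide's sequence the longest run is the five consecutive V symbols between the second and third invalid instructions.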
15Probabilistic Analysis
$n$ = number of coin tosses
$p$ = probability of a head
$X_i$ = R.V.s for inter-head distances
$X_{\max}$ = max inter-head distance
C.D.F. of $X_{\max}$: $\Pr[X_{\max} \le x] = [1 - p(1-p)^x]^n$
F.P. rate: $\alpha = 1 - \Pr[X_{\max} \le \tau] = 1 - [1 - p(1-p)^{\tau}]^n$
16Probabilistic Analysis
For a fixed N = k (exactly k invalid instructions)
17Probabilistic Analysis
For all possible values of N
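The equations for these two slides did not survive. A derivation consistent with the C.D.F. on the earlier Probabilistic Analysis slide is sketched below. It assumes the inter-head distances $X_i$ are i.i.d. with $\Pr[X_i > x] = (1-p)^x$ and that the number of heads $N$ is binomial with parameters $n$ and $p$. It is a reconstruction under those assumptions and not necessarily the authors' exact steps.

```latex
\begin{align*}
\Pr[X_{\max} \le x \mid N = k]
  &= \prod_{i=1}^{k} \Pr[X_i \le x]
   = \bigl(1 - (1-p)^x\bigr)^k \\
\Pr[X_{\max} \le x]
  &= \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} \bigl(1 - (1-p)^x\bigr)^k \\
  &= \bigl[(1-p) + p\bigl(1 - (1-p)^x\bigr)\bigr]^n
   = \bigl[1 - p(1-p)^x\bigr]^n
\end{align*}
```

The last step is the binomial theorem applied to the sum over $k$.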
18Threshold Calculation
Known: n, p, α (false positive rate)
Unknown: τ (max inter-head distance), the threshold
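Given the false-positive formula from the Probabilistic Analysis slides, read here as α = 1 - [1 - p(1-p)^τ]^n, the threshold can be obtained by solving for τ. The Python sketch below does that and returns the smallest integer τ that keeps the false positive rate at or below α. The function names and the sample values of n, p and α are illustrative assumptions, not figures from the slides.

```python
import math

def false_positive_rate(n, p, tau):
    """alpha = 1 - [1 - p(1-p)^tau]^n."""
    return 1.0 - (1.0 - p * (1.0 - p) ** tau) ** n

def threshold(n, p, alpha):
    """Smallest integer tau whose false positive rate is at most alpha."""
    if not (n > 0 and 0.0 < p < 1.0 and 0.0 < alpha < 1.0):
        raise ValueError("need n > 0, 0 < p < 1, 0 < alpha < 1")
    # Solve [1 - p(1-p)^tau]^n >= 1 - alpha for tau.
    c = -math.expm1(math.log1p(-alpha) / n)  # c = 1 - (1 - alpha)^(1/n)
    tau = math.log(c / p) / math.log1p(-p)   # log(1 - p) is negative
    tau = max(0, math.ceil(tau))
    # Guard against floating-point rounding at the boundary.
    while false_positive_rate(n, p, tau) > alpha:
        tau += 1
    return tau

if __name__ == "__main__":
    # Hypothetical inputs: 1000 instructions, 25% invalid, 1% false positives.
    print(threshold(1000, 0.25, 0.01))
```

Calling `threshold` with a larger n or a smaller p returns a larger τ for the same α.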
19Independence Assumption
- Validity of an instruction is an independent event
- All the X_i's are independent (while Σ X_i = n)
χ² test contingency table
               Observed                    Expected
               I2 is valid  I2 is invalid  I2 is valid  I2 is invalid
I1 is valid    8960         2797           8922         2835
I1 is invalid  2797         938            2835         900
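The expected counts and the χ² statistic can be recomputed from the observed counts alone. The sketch below uses the standard Pearson formula for a 2x2 table in plain Python. The 5% significance level and its critical value of 3.841 for one degree of freedom are assumptions for illustration, since the slide does not state the level used.

```python
def chi_square_2x2(observed):
    """Pearson chi-square statistic and expected counts for a 2x2 table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    return statistic, expected

# Observed counts from the slide: rows are I1 valid/invalid, columns I2 valid/invalid.
observed = [[8960, 2797], [2797, 938]]
statistic, expected = chi_square_2x2(observed)
CRITICAL_5_PERCENT_DF1 = 3.841  # assumed significance level, 1 degree of freedom

if __name__ == "__main__":
    print(round(statistic, 2), statistic < CRITICAL_5_PERCENT_DF1)
```

With these counts the statistic comes to about 2.71, which is below 3.841, so at the assumed 5% level the test does not reject independence of consecutive instructions.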
20Threshold Calculation
With increasing n, we must choose a larger threshold τ to keep the same false positive rate α
21Threshold Calculation
With decreasing p, we must choose a larger threshold τ to keep the same false positive rate α
22Determine n
E[I] = E[prefix chain length] + E[core instruction length]
Obtained from character frequency of input data
23Determine p
Invalid Instructions
1. Privileged instructions
2. Wrong Segment Prefix Selector
3. Un-initialized memory access
Only 1. and 2. can be determined on a standalone basis
24Experimental Setup
25Implementation
26Experimental Setup
- Benign data setup
- ASCII stream captured from live CISE network using Ethereal
- Malicious data setup
- Existing framework used to generate ASCII worm by converting binary worms
- Promising experimental results for max valid instruction length
- Benign: all max values below threshold τ
- Malicious: values significantly higher than τ
27Experimental Results (DAWN)
28Experimental Results (APE-L)
29Contrasting with APE
- Full content examination
- Threshold calculation
- Sled Vs. malware
- Exploiting text-specific properties
30Multilevel Encryption
[Figure: Encryption: binary → ASCII → ASCII. Decryption: ASCII → ASCII → binary. Label: only visible decrypter]
31Multilevel Encryption
[Figure: mapping between binary and three text ranges: 0x20-0x3F, 0x40-0x5F, 0x60-0x7E]
32Questions
33Thank you