Finding Diversity in Remote Code Injection Exploits

Finding Diversity in Remote Code Injection Exploits. University of California, San Diego: Justin Ma, Stefan Savage, Geoffrey M. Voelker; and Microsoft Research.

1
Finding Diversity in Remote Code Injection
Exploits
University of California, San Diego: Justin Ma,
Stefan Savage, Geoffrey M. Voelker
and
Microsoft Research: John Dunagan, Helen J. Wang

Presented by
Kenneth Poon Fai Yiu
2.4.2007

2
Outline

Introduction / Objectives
Benefits of Malware Family Tree
A Remote Code Injection Attack
Shellcode
Methodology for Measuring Diversity
Analysis of Exploit Diversity
NIDS vs Polymorphism
Factors Driving the Evolution
Observations

3
Introduction

Internet users face increasing threats of online
crime due to the large amount of malware
circulating on the Internet
Previous studies focused on methods for
defending against such attacks
Little research has been done on the malware
ecosystem, such as
the relationships between different pieces of
malware
the factors that drive the structural and
functional evolution of malware

4
Objectives

To develop a measurement methodology for
identifying and measuring the diversity among
remote code injection exploits
Use the measured data to
understand the diversity of today's malware, and
construct a shellcode phylogeny (i.e. a malware
family tree) for selected vulnerabilities

5
Benefits of Malware Family Tree

Simplify the categorization and analysis of
malware
Provide insight into the factors influencing
malware development and evolution
Help in estimating the market-share and vigor of
different cyber-criminal organizations

6
Glossary

Vulnerability
A system bug or design flaw allowing an attacker
to misuse an application (e.g. executing commands
on the system)
Malware (Malicious Software)
Software designed to infiltrate or damage a
computer system, e.g. computer viruses, worms,
spyware and adware
Exploit
Software that attacks a vulnerability of a system
in order to gain control of it
A remote exploit is an exploit that works over
a network
A kind of malware; "exploit" is used
interchangeably with "malware" throughout the
presentation
Code injection
A technique for adding code to a computer program
to modify its functionality
Shellcode
A piece of machine code used as the payload of an
exploit
May contain mechanisms to avoid detection by
intrusion detection systems
Phylogeny
A biological term: the study of evolutionary
relationships among organisms
Here, the classification of exploits according to
their relationships in evolutionary history

(Source: Wikipedia)
7
A Remote Code Injection Attack
First, there exists a piece of software (e.g. MS
Windows XP) with a vulnerability (e.g. a
stack-based buffer overflow) and corresponding
malware that targets this vulnerability
Second, a computer connected to the Internet
runs this software without the relevant patch
applied
Third, the malware attacks the computer by
injecting exploit code (shellcode, data and
random filler characters) into the vulnerable
software
8
A Remote Code Injection Attack
Exploit Packets
Fourth, the injected code overruns the buffer
and overwrites adjacent memory locations, which
may hold other buffers and variables. If the
buffer is stack-based, the return address of the
calling function can also be overwritten (e.g.
with the address of the shellcode)
Fifth, the exploit gains control of the computer
and executes the shellcode
Sixth, the shellcode may (1) download additional
software to the computer, (2) join a centralized
botnet or (3) reconfigure the operating system
to evade detection
9
Shellcode

Small, simple, hand-coded machine programs
Initial payload of an exploit that first executes
on a newly compromised machine
Polymorphism (variation in the style of
construction)
May be encrypted and only decrypted just before
execution
XOR encoding is a commonly used encoding scheme
May contain anti-debugging code (including
self-modifying code) to complicate disassembly
and analysis of the shellcode
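
The XOR encoding mentioned above can be sketched in a few lines. This is a minimal illustration, not code from the paper: the single-byte key, the function name xor_bytes and the placeholder payload bytes are all assumptions made for the example. It shows why the bytes seen on the wire differ from the bytes that finally execute, and why applying the same operation twice restores the original.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a one-byte key (0-255)."""
    return bytes(b ^ key for b in data)

# Placeholder bytes standing in for a payload; not a working exploit.
payload = b"\x90\x90\x31\xc0\xc3"
key = 0x5A  # hypothetical key chosen for the example

encoded = xor_bytes(payload, key)   # what would travel over the network
decoded = xor_bytes(encoded, key)   # what a decoder stub would recover

print(encoded.hex())
print(decoded == payload)
```

Because XOR is its own inverse, the decoder only needs the key and a short loop, which is one reason such schemes suit small hand-coded payloads.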

10
Methodology for Measuring Diversity

Exploit collection
(To collect exploit samples)
↓
Extracting shellcodes
(To extract shellcodes from the collected exploit
samples)
↓
Exploit emulation
(To run the extracted shellcodes and retrieve the
executed instruction bytes)
↓
Clustering
(To group the instruction bytes into families)

11
Methodology for Measuring Diversity: Exploit
Collection

Examine network traces of traffic using a
fully-patched Windows XP computer connected to a
residential DSL network
Capture exploit attempts arriving from the DSL
network against 4 well-known vulnerabilities for
2 days starting from 6/9/2006, 5:00 pm
The 4 vulnerabilities are
SQL Name Resolution (Slammer)
LSASS (Sasser)
MS RPC ISystemActivator (Blaster)
MS RPC RemoteActivation (Blaster)

12
Methodology for Measuring Diversity: Extracting
Shellcodes

Extract shellcodes directly from the collected
network trace using Shield
Shield
A tool originally designed for filtering exploits
for known vulnerabilities
Modified here to collect the data that overruns
the buffer boundary

13
Methodology for Measuring Diversity: Exploit
Emulation

Most shellcodes are encrypted; decoding is needed
to reveal the actual executable code
The solution is restricted binary emulation, i.e.
allowing the exploit decoding routines to execute
in order to reveal the actual instruction codes
Implement the emulator on a Linux platform
Load an encoded shellcode, declare it as a
statically allocated buffer, treat the buffer as
a function and allow it to run
Overcome the issue with non-executable prefixes
by iteratively retrying failed emulations at
subsequent offsets
Mark the executed instruction bytes for later
analysis
Emulation stops when the control flow makes an
absolute jump to a location outside the buffer
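
The retry-at-subsequent-offsets idea can be sketched with a toy emulator. Everything here is an assumption made for illustration: the two-opcode instruction set, the names emulate and emulate_with_retry, and the step limit are hypothetical and are not the authors' emulator, which runs real machine code. The toy also uses a relative jump where the slide describes an absolute one. The sketch only shows the control logic: try each start offset in turn, record which bytes executed, and stop once control leaves the buffer.

```python
class EmulationError(Exception):
    """Raised when the toy emulator hits a byte it cannot execute."""

def emulate(buf, start, max_steps=10_000):
    """Run the toy instruction set from `start`; return executed offsets."""
    executed = set()
    pc = start
    for _ in range(max_steps):
        if not 0 <= pc < len(buf):
            return executed              # control flow left the buffer: stop
        op = buf[pc]
        if op == 0x90:                   # toy NOP, 1 byte
            executed.add(pc)
            pc += 1
        elif op == 0xEB and pc + 1 < len(buf):   # toy relative jump, 2 bytes
            executed.update((pc, pc + 1))
            rel = buf[pc + 1]
            if rel > 127:
                rel -= 256               # treat the operand as signed 8-bit
            pc += 2 + rel
        else:
            raise EmulationError(f"cannot execute byte {op:#04x} at offset {pc}")
    raise EmulationError("step limit reached")

def emulate_with_retry(buf):
    """Retry failed emulations at each subsequent offset."""
    for offset in range(len(buf)):
        try:
            return offset, emulate(buf, offset)
        except EmulationError:
            continue
    return None

# Two non-executable prefix bytes followed by toy code (made-up sample).
sample = b"\x01\x02" + b"\x90\x90\xeb\x10"
result = emulate_with_retry(sample)
print(result)
```

In this sample the first two offsets fail, and emulation succeeds from offset 2, marking the bytes at offsets 2 to 5 as executed.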

14
Methodology for Measuring Diversity: Clustering
(1)

A data mining technique for grouping objects with
similar characteristics
Perform clustering on the shellcode instruction
bytes using exedit distance, a metric for
measuring the similarity between 2 sets of
shellcode instruction bytes produced by binary
emulation
Construct a dendrogram to visualize the
clustering results
Evaluate the resulting clusters manually to
confirm the constructed family tree is a sensible
representation of the phylogeny of the exploit
families

15
Methodology for Measuring Diversity: Clustering
(2)

Exedit Distance
Relative edit distance over the shellcode
instruction bytes, where edit distance is the
number of edit operations (insertion, deletion,
substitution) needed to transform one string into
another
For each sample,
Mark the executed instruction bytes
Concatenate the marked bytes in the order they
appear in the payload (i.e., memory order) to
construct a string representation
Compress each consecutive run of the NOP (No
operation) instructions into one single NOP
instruction
Compute the relative edit distance over all
exploits using these strings
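
The steps above can be sketched as follows. Two points are assumptions, since the slides do not spell them out: the edit distance is normalised by the length of the longer string and reported as a percentage, and a NOP is taken to be the single byte 0x90. The function names are hypothetical. The sketch also compresses any run of 0x90 bytes, even where that byte is an operand rather than an instruction, which a real implementation would need to distinguish.

```python
NOP = 0x90  # assumption: only the one-byte x86 NOP is treated as a NOP

def compress_nops(code: bytes) -> bytes:
    """Collapse each consecutive run of NOP bytes into a single NOP."""
    out = bytearray()
    for b in code:
        if b == NOP and out and out[-1] == NOP:
            continue
        out.append(b)
    return bytes(out)

def edit_distance(a: bytes, b: bytes) -> int:
    """Levenshtein distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def exedit_distance(a: bytes, b: bytes) -> float:
    """Relative edit distance in percent (normalisation is an assumption)."""
    a, b = compress_nops(a), compress_nops(b)
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else 100.0 * edit_distance(a, b) / longest

print(exedit_distance(b"\x90\x90\x31\xc0", b"\x90\x31\xc1"))
```

Here the two made-up strings differ in one byte out of three after NOP compression, so the relative distance is one third, or about 33 percent.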

16
Analysis of Exploit Diversity: SQL Name
Resolution (1)

Malware: Slammer worm
First noticed on 25.1.2003; infected 75,000
computers in 10 minutes
Exploited two buffer overflow bugs in Microsoft's
SQL Server and Desktop Engine database products
Spread by generating random IP addresses and
sending itself to those addresses
Dramatically slowed down general Internet traffic
A patch was available six months before the
worm's first launch

(Source: Wikipedia)
17
Analysis of Exploit Diversity: SQL Name
Resolution (2)

767 exploit samples were collected
2 apparent variations of Slammer were detected:
766 exploits with exactly the same payload and 1
outlier
The outlier was identical to all the other
payloads except for the last 91 bytes; evidence
shows that the payload was likely corrupted on
the network before being captured in the trace
After discarding the outlier sample, there was
only 1 Slammer exploit in the DSL trace, so there
was no exploit diversity

18
Analysis of Exploit Diversity: LSASS (1)

Malware: Sasser worm
First noticed on 30.4.2004; disrupted operations
for airlines, banks, and government offices
globally
Exploited a buffer overflow vulnerability in
LSASS (Local Security Authority Subsystem
Service) of MS Windows 2000 and XP
Spread by scanning different ranges of IP
addresses and connecting to victims' computers,
primarily through TCP port 139 or 445
A patch was available in 4.2004, prior to the
release of the worm
Written by an 18-year-old CS student in Germany,
who was arrested and received a 21-month
suspended sentence

(Source: Wikipedia)
19
Analysis of Exploit Diversity: LSASS (2)
Histogram of shellcode instances

1769 exploit samples were collected
56 distinct payloads were identified

20
Analysis of Exploit Diversity: LSASS (3)
Dendrogram

Each x-axis position represents a unique
shellcode
The y-axis shows relative edit distance
A horizontal line segment at y-axis value y
indicates that two sub-clusters had cluster
distance y when they were merged into one
cluster

21
Analysis of Exploit Diversity: LSASS (4)
Dendrogram

Most cluster merges occurred at a small exedit
distance of 10%; use 10% as the threshold for
defining families among the exploits
5 families of shellcodes can be identified
Manual examination of the shellcodes concluded
that the identified families were indeed 5
separate code bases
LSASS-2, 3 and 4 were similar enough to
conclude that they evolved from the same code
base
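
Cutting the dendrogram at a threshold can be sketched as below. The slides do not state which linkage rule the clustering used, so this sketch assumes single linkage, where cutting at a threshold is equivalent to grouping samples connected by a chain of pairwise distances at or below that threshold. The function name families and the distance matrix are made up for the example.

```python
def families(dist, threshold):
    """Group sample indices linked by distances <= threshold.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    Single linkage is an assumption, not taken from the slides.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the set representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)   # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Made-up relative distances (percent) between four samples.
dist = [
    [0, 2, 60, 62],
    [2, 0, 61, 63],
    [60, 61, 0, 4],
    [62, 63, 4, 0],
]
print(families(dist, 10))
print(families(dist, 1))
```

With these invented numbers a threshold of 10 yields two families, while a threshold of 1 leaves every sample in its own family, which illustrates how the number of families depends on where the cut is made.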

22
Analysis of Exploit Diversity: LSASS (5)
Dendrogram
Evolution diagram

Shellcodes within each family exhibit a small
amount of variation, which corresponds to the
phone-home/connect-back IP addresses encoded in
the payload so that victims connect to a
specified host to download additional code
or files
Connect-back refers to connecting to the victim's
immediate parent in the infection chain
Phone-home refers to connecting to a central
location

23
Analysis of Exploit Diversity: ISystemActivator
(1)

Malware: Blaster worm
First noticed on 11.8.2003; infected hundreds of
thousands of computers within the first 24 hours,
and several million more in the following few
months
Exploited a buffer overflow in the RPC service of
Windows 2000 and XP
Launched a DDoS attack against Microsoft's
windowsupdate.com
The worm contains a hidden string, which reads
"billy gates why do you make this possible? Stop
making money and fix your software!!"
A patch was available 1 month before the
release of the worm
Written by an 18-year-old US resident, who was
arrested and sentenced to an 18-month prison term

(Source: Wikipedia)
24
Analysis of Exploit Diversity: ISystemActivator
(2)
Histogram of shellcode instances

1561 exploit samples were collected
90 distinct payloads were identified
10 variations were responsible for most of the
observed exploits, while 80 distinct shellcodes
appeared only once

25
Analysis of Exploit Diversity: ISystemActivator
(3)
Dendrogram

Most cluster merges happened below a distance of
10%; use this value as the threshold to
define families among the exploits
6 families of shellcodes can be identified
The low initial threshold distance of 10% and the
large gap before cluster merges at a distance of
85% indicate that exploits within a family are
similar, but vary substantially between families

26
Analysis of Exploit Diversity: ISystemActivator
(4)
Dendrogram

Manual examination of the shellcodes confirmed
that the clusters reflected 6 different code
bases
Slight differences among exploits within each
family were due to variations in data constants
The relatively low 10% exedit distance between
ISys-2 and ISys-3 implied a close relationship

27
Analysis of Exploit Diversity: ISystemActivator
(5)
Dendrogram
Evolution diagram

The only difference was that ISys-3 contained a
connect call, while ISys-2 contained bind,
listen, and accept calls; it is believed that
these two families were derived from the same
code base, except that ISys-3 required the
newly-infected host to connect back to the
infecting host, while ISys-2 required the
newly-infected host to bind to a socket and wait
for a connection attempt from the infecting host

28
Analysis of Exploit Diversity: RemoteActivation
(1)
Histogram of shellcode instances

Malware: Blaster worm
RemoteActivation was the original MS RPC
vulnerability that Blaster and its variants
exploited before also targeting ISystemActivator
338 distinct exploit payloads were identified;
each exploit attempt used a unique payload

29
Analysis of Exploit Diversity: RemoteActivation
(2)
Dendrogram

Exedit distance among the shellcodes was very
small; most cluster merges occur below a distance
of 1%
Using this value as the threshold results in 2
distinct families; the 1.3% inter-family exedit
distance indicates that the families are closely
related

30
Analysis of Exploit Diversity: RemoteActivation
(3)
Dendrogram

Manual examination of the shellcodes reveals that
the last third of the payload contained randomly
generated characters, which accounted for the
variation within each family
There were two very similar but functionally
different types of RemoteActivation exploits in
the trace: 10% belonged to Remact-0, the bind
version, while the other 90% belonged to
Remact-1, the connect-back version
All payloads shared the same prefix, which
resembles part of the Metasploit Framework, but
this cannot be proven (Metasploit is a toolkit
for generating exploits, and includes options for
generating encoded shellcodes and random filler
characters)

31
NIDS vs Polymorphism

To what extent does exploit polymorphism limit
the effectiveness of Network Intrusion Detection
Systems (NIDS)?
The authors tried to generate the signatures
required to exhaustively cover all exploits
observed for each vulnerability in the
residential DSL trace
For each individual vulnerability except LSASS,
one signature sufficed to cover the set of
exploits; the size of each signature is 100 bytes
The signatures were tested against a 5-GB trace
of network traffic, and none of them yielded
false positives
The results indicate that polymorphism was not
effective for evading detection
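
The two checks described above, coverage of the observed exploits and false positives on ordinary traffic, can be sketched as follows. The slides do not say what form the signatures take, so this sketch assumes a signature is a fixed byte string that must appear somewhere in a payload. The function names and all payloads are made up for the example.

```python
def coverage(signature, exploits):
    """Fraction of exploit payloads that contain the signature bytes."""
    return sum(signature in p for p in exploits) / len(exploits)

def false_positives(signature, benign):
    """Number of benign payloads the signature would wrongly flag."""
    return sum(signature in p for p in benign)

# Made-up payloads: a shared prefix followed by varying filler characters.
exploits = [b"\xeb\x10STUB" + filler for filler in (b"AAAA", b"qz7!", b"0000")]
benign = [b"GET /index.html HTTP/1.1", b"\x00\x01plain data"]

signature = b"\xeb\x10STUB"  # hypothetical signature: the shared prefix

print(coverage(signature, exploits))
print(false_positives(signature, benign))
```

When the variation between samples sits in filler or data constants while a common byte sequence stays fixed, a single substring signature can match every sample, which is consistent with the result reported on this slide.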

32
Factors Driving the Evolution

Having reviewed the relationships between
different pieces of malware, what are the
factors that drive the structural and functional
evolution of malware?
Two hypotheses are
Malware authors wish to use polymorphism to
prevent the malware from being caught by NIDS
signatures (perhaps they do not realize that
their polymorphism is ineffective for
evasion), or
Today's polymorphism is unrelated to evading NIDS
signatures; the variation in shellcodes is due
to functional variation (e.g., the bind and
connect-back varieties)

33
Observations (1)

About 4,500 exploit samples were collected on
a DSL connection in 2 days; this indicates
that once a computer is connected to the
Internet, it is exposed to a huge number of
malware attacks (an attack every 40 seconds)
For all the Microsoft vulnerabilities studied in
the paper, Microsoft had in fact released the
relevant patches before the exploit attacks were
first launched
Users should be able to protect their machines
from such attacks if patches for the
vulnerabilities are applied promptly
The public announcement of patch releases by
Microsoft advertises the existence of a
vulnerability to malware authors, who can
reverse-engineer the patch to
discover the vulnerability and write the malware
34
Observations (2)

Identification of exploit families based on a
cluster-merge threshold seems arbitrary;
choosing a different threshold value will result
in a different number of families with different
compositions
Though the exploit families can be verified by
manual examination of the shellcodes, this
methodology does not scale if the samples
number in the millions
Simple relationships are established for some
shellcode instances, but the relationships of the
other shellcode instances remain unknown, so a
complete family tree (phylogeny) cannot be built
Unlike the relationships among organisms, the
correctness of the constructed shellcode
phylogeny is difficult to prove
It is recommended to repeat the research using
other data mining techniques and distance metrics
to see their effects on the resulting exploit
families