Title: Finding Diversity in Remote Code Injection Exploits
1 Finding Diversity in Remote Code Injection Exploits
Justin Ma, Stefan Savage, Geoffrey M. Voelker (University of California, San Diego)
John Dunagan, Helen J. Wang (Microsoft Research)
- Presented by Kenneth Poon Fai Yiu
- 2.4.2007
2 Outline
- Introduction / Objectives
- Benefits of Malware Family Tree
- A Remote Code Injection Attack
- Shellcode
- Methodology for Measuring Diversity
- Analysis of Exploit Diversity
- NIDS vs Polymorphism
- Factors Driving the Evolution
- Observations
3 Introduction
- Internet users face increasing threats from online crime because of the large amount of malware active on the Internet
- Previous studies focused on methods for defending against such attacks
- Little research has been done on the malware ecosystem, such as
  - the relationship between different pieces of malware
  - the factors that drive the structural and functional evolution of malware
4 Objectives
- To develop a measurement methodology for identifying and measuring the diversity among remote code injection exploits
- Use the measured data to
  - understand the diversity of today's malware, and
  - construct a shellcode phylogeny (i.e. a malware family tree) for selected vulnerabilities
5 Benefits of Malware Family Tree
- Simplify the categorization and analysis of malware
- Provide insight into the factors influencing malware development and evolution
- Help in estimating the market share and vigor of different cyber-criminal organizations
6 Glossary
- Vulnerability
  - A system bug or design flaw that allows an attacker to misuse an application (e.g. executing commands on the system)
- Malware (Malicious Software)
  - Software designed to infiltrate or damage a computer system, e.g. computer viruses, worms, spyware and adware
- Exploit
  - Software that attacks a vulnerability of a system in order to gain control of it
  - A remote exploit is an exploit that works over a network
  - A kind of malware; used interchangeably with "malware" throughout the presentation
- Code injection
  - A technique for adding code to a computer program to modify its functionality
- Shellcode
  - A piece of machine code used as the payload of an exploit
  - May contain mechanisms to avoid detection by anti-intrusion systems
- Phylogeny
  - A biological term: the study of evolutionary relationships among organisms
  - Here, the classification of exploits according to their relationships in evolutionary history
(Source: Wikipedia)
7 A Remote Code Injection Attack
First, there exists software (e.g. MS Windows XP) with a vulnerability (e.g. a stack-based buffer overflow) and corresponding malware that targets this vulnerability
Second, a computer connected to the Internet runs this software without the patch applied
Third, the malware attacks the computer by sending exploit code (shellcode, data and random filler characters) to the vulnerable software
8 A Remote Code Injection Attack
Exploit Packets
Fourth, the injected code overwrites data beyond the buffer's boundary and changes the contents of adjacent memory locations, which may be used by other buffers and variables. If the buffer is stack-based, the saved return address into the calling function can also be changed (e.g. to the address of the shellcode)
Fifth, the exploit gains control of the computer and executes the shellcode
Sixth, the shellcode may (1) download additional software to the computer, (2) join a centralized botnet or (3) reconfigure the operating system to evade detection
9 Shellcode
- Small, simple, hand-coded machine-code programs
- Initial payload of an exploit, the first code to execute on a newly compromised machine
- Polymorphism (variation in the style of construction)
- May be encrypted and decrypted only just before execution
- XOR encoding is a commonly used encoding scheme
- May contain anti-debugging code (including self-modifying code) to complicate disassembly and analysis of the shellcode
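As a rough illustration of the XOR encoding mentioned on this slide, the sketch below applies a single-byte XOR key to a byte string. The single-byte key, the function names and the stand-in bytes are assumptions made for the example; real shellcode encoders vary and the slides do not specify one.

```python
def xor_encode(payload: bytes, key: int) -> bytes:
    """XOR every byte of payload with a single-byte key (assumed scheme)."""
    if not 0 <= key <= 0xFF:
        raise ValueError("key must fit in one byte")
    return bytes(b ^ key for b in payload)


# XOR is its own inverse, so decoding reuses the same routine.
xor_decode = xor_encode

plain = b"\x90\x90\xcc"            # stand-in bytes, not real shellcode
encoded = xor_encode(plain, 0x5A)  # what would travel on the wire
decoded = xor_decode(encoded, 0x5A)
```

Because the encoded bytes differ from the plain bytes, a byte-level comparison of two payloads encoded with different keys would show them as unrelated, which is why decoding has to happen before any similarity measurement.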
10 Methodology for Measuring Diversity
- Exploit collection
  - (To collect exploit samples)
- Extracting shellcodes
  - (To extract shellcodes from the collected exploit samples)
- Exploit emulation
  - (To run the extracted shellcodes to retrieve the instruction code bytes)
- Clustering
  - (To group the instruction code bytes into families)
11 Methodology for Measuring Diversity: Exploit Collection
- Examine network traces of traffic, using a fully patched Windows XP computer connected to a residential DSL network
- Capture exploit attempts arriving from the DSL network against 4 well-known vulnerabilities for 2 days starting from 6/9/2006, 5:00 pm
- The 4 vulnerabilities are
  - SQL Name Resolution (Slammer)
  - LSASS (Sasser)
  - MS RPC ISystemActivator (Blaster)
  - MS RPC RemoteActivation (Blaster)
12 Methodology for Measuring Diversity: Extracting Shellcodes
- Extract shellcodes directly from the collected network trace using Shield
- Shield
  - A tool originally designed for filtering exploits for known vulnerabilities
  - Modified here to collect the data that extends beyond the buffer boundary
13 Methodology for Measuring Diversity: Exploit Emulation
- Most shellcodes are encrypted; decoding is needed to reveal the actual executable code
- The solution is restricted binary emulation, i.e. allowing the exploit decoding routines to execute in order to reveal the actual instruction codes
- Implement the emulator on a Linux platform
- Load an encoded shellcode, declare it as a statically allocated buffer, treat the buffer as a function and allow it to run
- Overcome the issue of non-executable prefixes by iteratively retrying failed emulations at subsequent offsets
- Mark the executed instruction bytes for later analysis
- Emulation stops when the control flow makes an absolute jump to a location outside the buffer
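The retry-at-subsequent-offsets idea can be sketched as a simple loop. Everything here is hypothetical: the `Emulator` interface and `toy_emulate` are stand-ins invented for the example and do not model the real emulator, which executes machine code.

```python
from typing import Callable, Optional, Set, Tuple

# Hypothetical emulator interface: given a payload and a start offset, return
# the set of executed byte positions, or None if emulation failed there.
Emulator = Callable[[bytes, int], Optional[Set[int]]]


def emulate_with_retry(payload: bytes,
                       emulate: Emulator) -> Optional[Tuple[int, Set[int]]]:
    """Try each start offset in turn; return the first one that emulates."""
    for offset in range(len(payload)):
        executed = emulate(payload, offset)
        if executed is not None:
            return offset, executed
    return None


def toy_emulate(payload: bytes, offset: int) -> Optional[Set[int]]:
    """Stand-in for a real emulator: 'runs' only if the start byte is 0x90 or 0xEB."""
    if payload[offset] not in (0x90, 0xEB):
        return None
    # Pretend every byte from the start offset to the end was executed.
    return set(range(offset, len(payload)))


# A non-executable prefix ("GET ") followed by stand-in code bytes.
result = emulate_with_retry(b"GET \x90\x90\xeb\x10", toy_emulate)
```

The returned set of executed positions plays the role of the "marked" instruction bytes that the later clustering step consumes.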
14 Methodology for Measuring Diversity: Clustering (1)
- A data mining technique for grouping objects with similar characteristics
- Perform clustering on the shellcode instruction bytes using exedit distance, a metric for measuring the similarity between 2 sets of shellcode instruction bytes generated by binary emulation
- Construct a dendrogram to visualize the clustering results
- Evaluate the resulting clusters manually to confirm that the constructed family tree is a sensible representation of the phylogeny of the exploit families
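To make the grouping step concrete, here is a minimal sketch that splits samples into families at a distance threshold. It assumes single-linkage clustering, where cutting the dendrogram at a threshold is equivalent to joining every pair of samples whose distance is at or below it; the slides do not say which linkage was used. `toy_distance` is a hypothetical stand-in for the exedit distance.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence


def families(samples: Sequence[bytes],
             dist: Callable[[bytes, bytes], float],
             threshold: float) -> List[List[int]]:
    """Group sample indices into families (single linkage, assumed)."""
    parent = list(range(len(samples)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(samples)), 2):
        if dist(samples[i], samples[j]) <= threshold:
            parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(samples)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def toy_distance(a: bytes, b: bytes) -> float:
    """Hypothetical metric: fraction of positions that differ."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    differing = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
    return differing / longest


samples = [b"AAAAAAAAAA", b"AAAAAAAAAB", b"ZZZZZZZZZZ", b"ZZZZZZZZZY"]
two_families = families(samples, toy_distance, threshold=0.10)
one_family = families(samples, toy_distance, threshold=1.0)
```

Running the same samples with two thresholds shows how the number of families depends on where the dendrogram is cut.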
15 Methodology for Measuring Diversity: Clustering (2)
- Exedit Distance
  - Relative edit distance over the shellcode instruction bytes, where edit distance is the minimum number of edit operations (insertion, deletion, substitution) needed to transform one string into another
- For each sample,
  - Mark the executed instruction bytes
  - Concatenate the marked bytes in the order they appear in the payload (i.e., memory order) to construct a string representation
  - Compress each consecutive run of NOP (no operation) instructions into a single NOP instruction
- Compute the relative edit distance over all exploits using these strings
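The steps above can be sketched as follows. Two points are assumptions rather than details from the slides: the edit distance is normalized by the length of the longer string, and every 0x90 byte (the x86 single-byte NOP opcode) is treated as a NOP instruction without disassembling the code.

```python
NOP = 0x90  # x86 single-byte NOP opcode (assumed target architecture)


def compress_nops(code: bytes) -> bytes:
    """Collapse each consecutive run of NOP bytes into a single NOP."""
    out = bytearray()
    for b in code:
        if b == NOP and out and out[-1] == NOP:
            continue  # skip repeated NOPs within a run
        out.append(b)
    return bytes(out)


def edit_distance(a: bytes, b: bytes) -> int:
    """Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def relative_edit_distance(a: bytes, b: bytes) -> float:
    """Edit distance after NOP compression, divided by the longer length."""
    a, b = compress_nops(a), compress_nops(b)
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else edit_distance(a, b) / longest


# Two stand-in byte strings that differ only in the length of their NOP run.
d = relative_edit_distance(b"\x90\x90\x90\x31\xc0", b"\x90\x31\xc0")
```

Compressing NOP runs first means that two payloads differing only in the length of their NOP padding compare as identical.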
16 Analysis of Exploit Diversity: SQL Name Resolution (1)
- Malware: Slammer worm
- First noticed on 25.1.2003; infected 75,000 computers in 10 minutes
- Exploited two buffer overflow bugs in Microsoft's SQL Server and Desktop Engine database products
- Spread by generating random IP addresses and sending itself to those addresses
- Dramatically slowed down general Internet traffic
- A patch was available six months before the worm's first launch
(Source: Wikipedia)
17 Analysis of Exploit Diversity: SQL Name Resolution (2)
- 767 exploit samples were collected
- 2 apparent variations of Slammer were detected
  - 766 exploits with exactly the same payload and 1 outlier
- The outlier was identical to all the other payloads except for the last 91 bytes; the evidence suggests that the payload was corrupted on the network before being captured in the trace
- After discarding the outlier sample, there was only 1 Slammer exploit in the DSL trace, so no exploit diversity
18 Analysis of Exploit Diversity: LSASS (1)
- Malware: Sasser worm
- First noticed on 30.4.2004; disrupted operations for airlines, banks, and government offices globally
- Exploited a buffer overflow vulnerability in LSASS (Local Security Authority Subsystem Service) of MS Windows 2000 and XP
- Spread by scanning different ranges of IP addresses and connecting to victims' computers, primarily through TCP port 139 or 445
- A patch was available in 4.2004, prior to the release of the worm
- Written by an 18-year-old CS student in Germany, who was arrested and received a 21-month suspended sentence
(Source: Wikipedia)
19 Analysis of Exploit Diversity: LSASS (2)
Histogram of shellcode instances
- 1769 exploit samples were collected
- 56 distinct payloads were identified
20 Analysis of Exploit Diversity: LSASS (3)
Dendrogram
- Each x-axis position represents a unique shellcode
- The y-axis shows relative edit distance
- A horizontal line segment at y-axis value y indicates that two sub-clusters had cluster distance y when they were merged into one cluster
21 Analysis of Exploit Diversity: LSASS (4)
Dendrogram
- Most cluster merges occurred at a small exedit distance of 10; use 10 as the threshold for defining families among the exploits
- 5 families of shellcodes can be identified
- Manual examination of the shellcodes concluded that the identified families were indeed 5 separate code bases
- LSASS-2, 3 and 4 had sufficient similarity to conclude that they evolved from the same code base
22 Analysis of Exploit Diversity: LSASS (5)
Dendrogram
Evolution diagram
- Shellcodes within each family exhibit a small amount of variation, which corresponds to the phone-home/connect-back IP addresses encoded in the payload; the victim connects to the specified host to download additional code or files
  - Connect-back refers to connecting to the victim's immediate parent in the infection chain
  - Phone-home refers to connecting to a central location
23 Analysis of Exploit Diversity: ISystemActivator (1)
- Malware: Blaster worm
- First noticed on 11.8.2003; infected hundreds of thousands of computers within the first 24 hours, and several million more in the following few months
- Exploited a buffer overflow in the RPC service of Windows 2000 and XP
- Launched a DDoS attack against Microsoft's windowsupdate.com
- The worm contains a hidden string, which reads "billy gates why do you make this possible? Stop making money and fix your software!!"
- A patch was available 1 month before the release of the worm
- Written by an 18-year-old US resident, who was arrested and sentenced to an 18-month prison term
(Source: Wikipedia)
24 Analysis of Exploit Diversity: ISystemActivator (2)
Histogram of shellcode instances
- 1561 exploit samples were collected
- 90 distinct payloads were identified
- 10 variations were responsible for most of the observed exploits, while 80 distinct shellcodes appeared only once
25 Analysis of Exploit Diversity: ISystemActivator (3)
Dendrogram
- Most cluster merges happened below a distance of 10; use this distance value as the threshold to define families among the exploits
- 6 families of shellcodes can be identified
- The low initial threshold distance of 10 and the large gap between cluster merges at a distance of 85 indicate that exploits within a family are similar, but vary substantially between families
26 Analysis of Exploit Diversity: ISystemActivator (4)
Dendrogram
- Manual examination of the shellcodes confirmed that the clusters reflected 6 different code bases
- Slight differences among exploits within each family were due to variations in data constants
- The relatively low exedit distance of 10 between ISys-2 and ISys-3 implied a close relationship
27 Analysis of Exploit Diversity: ISystemActivator (5)
Dendrogram
Evolution diagram
- The only difference was that ISys-3 contained a connect call, whereas ISys-2 contained bind, listen, and accept calls; the authors believe that these two families were derived from the same code base, except that
  - ISys-3 required the newly infected host to connect back to the infecting host, while
  - ISys-2 required the newly infected host to bind to a socket and wait for a connection attempt from the infecting host
28 Analysis of Exploit Diversity: RemoteActivation (1)
Histogram of shellcode instances
- Malware: Blaster worm
- RemoteActivation was the original MS RPC vulnerability that Blaster and its variants exploited before also targeting ISystemActivator
- 338 distinct exploit payloads were identified; each exploit attempt used a unique payload
29 Analysis of Exploit Diversity: RemoteActivation (2)
Dendrogram
- Exedit distance among the shellcodes was very small; most cluster merges occurred below a distance of 1
- Using this value as the threshold results in 2 distinct families; the 1.3 inter-family exedit distance indicates that the families are closely related
30 Analysis of Exploit Diversity: RemoteActivation (3)
Dendrogram
- Manual examination of the shellcodes reveals that the last third of the payload contained randomly generated characters, which accounted for the variation within each family
- There were two very similar but functionally different types of RemoteActivation exploits in the trace: 10% belonged to Remact-0, the bind version, while the other 90% belonged to Remact-1, the connect-back version
- All payloads shared the same prefix, which resembles part of the Metasploit Framework, but this cannot be proven (Metasploit is a toolkit for generating exploits, and includes options for generating encoded shellcodes and random filler characters)
31 NIDS vs Polymorphism
- To what extent will exploit polymorphism limit the effectiveness of Network Intrusion Detection Systems (NIDS)?
- Tried to generate the signatures required to exhaustively cover all exploits observed for each vulnerability in the residential DSL trace
- For each individual vulnerability except LSASS, one signature sufficed to cover the set of exploits; the size of each signature is 100 bytes
- Tested the signatures against a 5-GB trace of network traffic, and none of the signatures yielded false positives
- The results indicate that polymorphism was not effective for evading detection
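One simple way to picture a covering signature is a byte string that appears in every observed exploit and in no benign traffic. The sketch below uses a brute-force longest-common-substring search as an assumed, illustrative signature generator; the slides do not describe how the signatures were actually produced, and all names and sample bytes are hypothetical.

```python
from typing import Sequence


def common_signature(samples: Sequence[bytes]) -> bytes:
    """Longest byte string that occurs in every sample (brute force)."""
    if not samples:
        return b""
    shortest = min(samples, key=len)
    # Try the longest candidates first so the first hit is the longest one.
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in samples):
                return candidate
    return b""


def false_positives(signature: bytes, benign: Sequence[bytes]) -> int:
    """Count benign packets that contain the signature."""
    if not signature:
        raise ValueError("an empty signature matches everything")
    return sum(signature in packet for packet in benign)


# Stand-in exploit payloads that share one invariant region.
exploits = [b"\x01\x02HEADER-XYZ\xaa", b"\x07HEADER-XYZ\xbb\xcc", b"HEADER-XYZ\x00"]
signature = common_signature(exploits)
fp_count = false_positives(signature, [b"GET /index.html", b"HEADER only"])
```

If the polymorphic parts of a payload leave a long invariant region like this, a single signature covers every variant, which matches the slide's conclusion that the observed polymorphism did not evade detection.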
32 Factors Driving the Evolution
- Having reviewed the relationships between different pieces of malware, what are the factors that drive the structural and functional evolution of malware?
- Two hypotheses are
  - The malware authors wish to use polymorphism to prevent the malware from being caught by NIDS signatures (perhaps they do not realize that their polymorphism is ineffective at evasion), or
  - Today's polymorphism is unrelated to evading NIDS signatures; the variation in shellcodes is due to functional variation (e.g., the bind and connect-back varieties)
33 Observations (1)
- About 4,500 exploit samples were collected on a DSL connection in 2 days; this indicates that once a computer is connected to the Internet, it is exposed to a large number of malware attacks (about one attack every 40 seconds)
- For all the Microsoft vulnerabilities studied in the paper, Microsoft had in fact released the relevant patches before the exploit attacks were first launched
  - Users should be able to protect their machines from such attacks if patches for the vulnerabilities are applied promptly
  - The public announcement of patch releases by Microsoft advertises the existence of the vulnerability to malware authors, who can reverse-engineer the patch to discover the vulnerability and write the malware
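The attack-rate figure can be checked against the per-vulnerability counts reported on the earlier slides. This small calculation assumes the 338 RemoteActivation payloads correspond to 338 exploit attempts, as the slide on that vulnerability states.

```python
# Exploit samples per vulnerability, as reported on the analysis slides.
samples = {
    "SQL Name Resolution": 767,
    "LSASS": 1769,
    "ISystemActivator": 1561,
    "RemoteActivation": 338,
}

total = sum(samples.values())        # total exploit attempts in the trace
trace_seconds = 2 * 24 * 60 * 60     # the trace covered 2 days
interval = trace_seconds / total     # average seconds between attempts

print(f"{total} samples, one every {interval:.0f} seconds on average")
```

The total comes to 4,435 samples, or roughly one attempt every 39 seconds, which is consistent with the rounded figures of about 4,500 samples and one attack every 40 seconds.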
34 Observations (2)
- Identification of exploit families based on a cluster-merge threshold seems arbitrary; choosing a different threshold value will result in a different number of families with different compositions
- Though the exploit families can be verified by manual examination of the shellcodes, this methodology does not scale if the samples number in the millions
- Simple relationships are built for some shellcode instances, but the relationships among the other shellcode instances remain unknown, so a complete family tree (phylogeny) cannot be built
- Unlike the relationships among organisms, the correctness of the constructed shellcode phylogeny is difficult to prove
- Recommend repeating the research using other data mining techniques and distance metrics to see their effects on the resulting exploit families