Transcript and Presenter's Notes

Title: Finding Diversity in Remote Code Injection Exploits

Finding Diversity in Remote Code Injection Exploits
Justin Ma, Stefan Savage, Geoffrey M. Voelker
(University of California, San Diego) and
John Dunagan, Helen J. Wang (Microsoft Research)
  • Presented by
  • Kenneth Poon Fai Yiu
  • 2.4.2007

  • Introduction / Objectives
  • Benefits of Malware Family Tree
  • A Remote Code Injection Attack
  • Shellcode
  • Methodology for Measuring Diversity
  • Analysis of Exploit Diversity
  • NIDS vs Polymorphism
  • Factors Driving the Evolution
  • Observations

  • Internet users face increasing threats of online
    crime due to the large amount of malware
    circulating on the Internet
  • Previous studies focused on methods for
    defending against such attacks
  • Little research has been done on the malware
    ecosystem, such as
  • the relationship between different pieces of
    malware
  • the factors that drive the structural and
    functional evolution of malware

  • To develop a measurement methodology for
    identifying and measuring the diversity among
    remote code injection exploits
  • Use the measured data to
  • understand the diversity of today's malware, and
  • construct a shellcode phylogeny (i.e. a malware
    family tree) for selected vulnerabilities

Benefits of Malware Family Tree
  • Simplify the categorization and analysis of
    malware
  • Provide insight into the factors influencing
    malware development and evolution
  • Help in estimating the market-share and vigor of
    different cyber-criminal organizations

  • Vulnerability
  • A system bug or design flaw allowing an attacker
    to misuse an application (e.g. executing commands
    on the system)
  • Malware (Malicious Software)
  • Software designed to infiltrate or damage a
    computer system, e.g. computer viruses, worms,
    spyware and adware
  • Exploit
  • Software that attacks a vulnerability of a system
    in order to gain control of it
  • A remote exploit is an exploit that works over
    a network
  • A kind of malware; the term is used
    interchangeably with malware throughout the
    presentation
  • Code injection
  • A technique for adding code to a computer program
    to modify its functionality
  • Shellcode
  • A piece of machine code used as the payload of an
    exploit
  • May contain mechanisms to avoid detection by
    intrusion detection systems
  • Phylogeny
  • A biological term - the study of evolutionary
    relationship among organisms
  • In this presentation: the classification of
    exploits according to their evolutionary
    relationships

A Remote Code Injection Attack
First, there exists a piece of software (e.g.
Microsoft Windows XP) with a vulnerability (e.g. a
stack-based buffer overflow) and a corresponding
malware targeting that vulnerability
Second, a computer connected to the Internet runs
this software without the relevant patch applied
Third, the malware attacks the computer by sending
exploit code (shellcode, data and random filler
characters) to the vulnerable software
[Figure: Exploit Packets]
Fourth, the injected bytes overwrite data beyond
the boundaries of the buffer, changing the contents
of adjacent memory locations that may be used by
other buffers and variables. If the buffer is
stack-based, the return address of the calling
function can also be overwritten (e.g. with the
address of the shellcode)
Fifth, the exploit gains control of the computer
and executes the shellcode
Sixth, the shellcode may (1) download additional
software to the computer, (2) join a centralized
botnet or (3) reconfigure the operating system
to evade detection

Shellcode
  • Small, simple, hand-coded machine programs
  • Initial payload of an exploit that first executes
    on a newly compromised machine
  • May exhibit polymorphism (variation in the style
    of the code)
  • May be encrypted and only decrypted just before
    execution
  • XOR encoding is a commonly used encoding scheme
  • May contain anti-debugging code (including
    self-modifying code) to complicate disassembly
    and analysis of the shellcode
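The XOR scheme mentioned above can be illustrated with a minimal sketch. It assumes a single-byte key; real encoders may use longer keys, and in an actual exploit the decoding loop is itself machine code prepended to the payload. The function name and the key value are hypothetical, and the bytes are illustrative only, not real shellcode.

```python
def xor_encode(payload: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (hypothetical scheme)."""
    if not 0 <= key <= 0xFF:
        raise ValueError("key must fit in one byte")
    return bytes(b ^ key for b in payload)


# XOR is its own inverse, so the same routine also decodes.
xor_decode = xor_encode

plain = b"\x90\x90\xcc"            # illustrative bytes only
encoded = xor_encode(plain, 0x99)  # bytes look different on the wire
assert xor_decode(encoded, 0x99) == plain
```

This self-inverse property is also why an emulator only has to let the decoding loop run to recover the real instructions.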

Methodology for Measuring Diversity
  • Exploit collection
  • (To collect exploit samples)
  • Extracting shellcodes
  • (To extract shellcodes from the collected exploit
    samples)
  • Exploit emulation
  • (To run the extracted shellcodes to retrieve the
    executed instruction code bytes)
  • Clustering
  • (To group the instruction code bytes into
    families)

Methodology for Measuring Diversity: Exploit Collection
  • Examine network traces of traffic using a
    fully-patched Windows XP computer connected to a
    residential DSL network
  • Capture exploit attempts from the DSL network
    targeting 4 well-known vulnerabilities over 2
    days, starting 6/9/2006 at 5:00 pm
  • The 4 vulnerabilities are
  • SQL Name Resolution (Slammer)
  • LSASS (Sasser)
  • MS RPC ISystemActivator (Blaster)
  • MS RPC RemoteActivation (Blaster)

Methodology for Measuring Diversity: Extracting Shellcodes
  • Extract shellcodes directly from the collected
    network trace using Shield
  • Shield
  • A tool originally designed for filtering exploits
    for known vulnerabilities
  • Modified here to collect the data that lies
    beyond the vulnerable buffer's boundary
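Shield's internals are not described in these slides, so the following is only a sketch of the idea of "collecting data beyond the buffer boundary". It assumes a protocol parser has already located the vulnerable field in the reassembled message and knows the size of the destination buffer; `FieldSpec` and `overflow_bytes` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FieldSpec:
    """Hypothetical description of one vulnerable protocol field."""
    offset: int       # where the field starts in the reassembled message
    length: int       # how many bytes the sender actually put in the field
    buffer_size: int  # size of the destination buffer in the vulnerable program


def overflow_bytes(message: bytes, spec: FieldSpec) -> bytes:
    """Return the part of the field that would spill past the buffer boundary."""
    field = message[spec.offset : spec.offset + spec.length]
    return field[spec.buffer_size :]  # empty if the field fits in the buffer


msg = b"HDR!" + b"A" * 8 + b"\xde\xad\xbe\xef"
print(overflow_bytes(msg, FieldSpec(offset=4, length=12, buffer_size=8)).hex())
```

For a benign message whose field fits the buffer, the function returns an empty byte string, which is what lets a filter of this kind distinguish exploit attempts from normal traffic.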

Methodology for Measuring Diversity: Exploit Emulation
  • Most shellcodes are encrypted, so decoding is
    needed to reveal the actual executable code
  • The solution is restricted binary emulation, i.e.
    allowing the exploit decoding routines to execute
    in order to reveal the actual instruction codes
  • Implement the emulator on a Linux platform
  • Load an encoded shellcode, declare it as a
    statically allocated buffer, treat the buffer as
    a function and allow it to run
  • Overcome the issue with non-executable prefixes
    by iteratively retrying failed emulations at
    subsequent offsets
  • Mark the executed instruction bytes for later
    analysis
  • Emulation stops when the control flow makes an
    absolute jump to a location outside the buffer
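The emulator described above runs real x86 bytes as a function on Linux. The sketch below does not execute machine code; it interprets a tiny hypothetical opcode set (0x90 and 0xEB borrow their meanings from the x86 one-byte NOP and short JMP, while the meanings given to 0x40 and 0xFF are made up for illustration). It only shows the control logic from the list: retrying at successive offsets when a prefix is not executable, marking executed bytes, and stopping when control jumps out of the buffer.

```python
class EmulationError(Exception):
    """Raised when the toy interpreter meets something it cannot execute."""


# Hypothetical toy opcodes (NOT a faithful x86 model).
NOP, INC, JMP_REL8, JMP_ABS_OUT = 0x90, 0x40, 0xEB, 0xFF
MAX_STEPS = 10_000  # guard against infinite loops


def emulate_from(buf: bytes, start: int) -> set[int]:
    """Run from `start`; return the offsets of every executed byte."""
    executed: set[int] = set()
    pc = start
    for _ in range(MAX_STEPS):
        if not 0 <= pc < len(buf):
            raise EmulationError(f"pc {pc} left the buffer unexpectedly")
        op = buf[pc]
        if op in (NOP, INC):
            executed.add(pc)
            pc += 1
        elif op == JMP_REL8:
            if pc + 1 >= len(buf):
                raise EmulationError("truncated jump")
            executed.update((pc, pc + 1))
            rel = int.from_bytes(buf[pc + 1 : pc + 2], "little", signed=True)
            pc = pc + 2 + rel
        elif op == JMP_ABS_OUT:
            executed.add(pc)
            return executed  # absolute jump outside the buffer: stop here
        else:
            raise EmulationError(f"invalid opcode {op:#04x} at offset {pc}")
    raise EmulationError("step limit reached")


def emulate_with_retry(buf: bytes) -> tuple[int, set[int]]:
    """Retry at each subsequent offset until one emulation succeeds."""
    for start in range(len(buf)):
        try:
            return start, emulate_from(buf, start)
        except EmulationError:
            continue
    raise EmulationError("no offset produced a successful emulation")


# Two non-executable prefix bytes, then code that jumps over a data byte.
start, executed = emulate_with_retry(b"\x00\x00\x90\x40\xeb\x01\x00\xff")
```

Here the first two starting offsets fail, the third succeeds, and the skipped data byte at offset 6 is correctly left unmarked.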

Methodology for Measuring Diversity: Clustering (1)
  • A data-mining technique for grouping objects with
    similar characteristics
  • Perform clustering on the shellcode instruction
    bytes using exedit distance, a metric for
    measuring the similarity between 2 sets of
    shellcode instruction bytes generated by binary
    emulation
  • Construct a dendrogram to visualize the
    clustering results
  • Evaluate the resulting clusters manually to
    confirm that the constructed family tree is a
    sensible representation of the phylogeny of the
    exploits
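The slides do not say which linkage rule the clustering used, so the sketch below assumes single linkage (cluster distance = smallest pairwise distance). It is a naive agglomerative clustering over a precomputed distance matrix that records each merge and its height, which is exactly the information a dendrogram plots. In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="single"` would be used instead.

```python
def single_linkage(dist: list[list[float]]) -> list[tuple[frozenset, frozenset, float]]:
    """Naive agglomerative clustering; returns merges as (A, B, height)."""
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: closest pair of members across the two clusters.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


# Toy distances: samples 0/1 are close, 2/3 are close, the pairs are far apart.
D = [[0, 2, 9, 10],
     [2, 0, 8, 9],
     [9, 8, 0, 3],
     [10, 9, 3, 0]]
merges = single_linkage(D)
```

The merge heights (2, 3, then 8) are the y-values at which the horizontal segments of the dendrogram would be drawn.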

Methodology for Measuring Diversity: Clustering (2)
  • Exedit Distance
  • Relative edit distance over the executed shellcode
    instruction bytes; edit distance is the minimum
    number of edit operations (insertion, deletion,
    substitution) needed to transform one string into
    another
  • For each sample,
  • Mark the executed instruction bytes
  • Concatenate the marked bytes in the order they
    appear in the payload (i.e., memory order) to
    construct a string representation
  • Compress each run of consecutive NOP
    (no-operation) instructions into a single NOP
  • Compute the pairwise relative edit distance
    between all exploits using these strings
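The string construction and distance computation above can be sketched as follows. Two points are assumptions: NOP runs are collapsed at the byte level (so an operand byte that happens to equal 0x90 next to a NOP would also be collapsed), and "relative" is taken to mean edit distance divided by the length of the longer string, scaled to 0 to 100. The slides do not give the exact normalization, so `relative_edit_distance` is hypothetical.

```python
NOP = 0x90  # x86 one-byte NOP


def executed_string(payload: bytes, executed: set[int]) -> bytes:
    """Concatenate executed bytes in memory order, collapsing NOP runs."""
    out = bytearray()
    for i in sorted(executed):
        b = payload[i]
        if b == NOP and out and out[-1] == NOP:
            continue  # byte-level approximation: keep only the first NOP of a run
        out.append(b)
    return bytes(out)


def edit_distance(a: bytes, b: bytes) -> int:
    """Classic Levenshtein distance (insertion, deletion, substitution)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution (0 if equal)
        prev = cur
    return prev[-1]


def relative_edit_distance(a: bytes, b: bytes) -> float:
    """ASSUMPTION: normalize by the longer string and scale to 0-100."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else 100.0 * edit_distance(a, b) / longest


payload = b"\x90\x90\x90\x31\xc0\x90\x90\xcc"
s = executed_string(payload, set(range(len(payload))))
```

For this payload the three leading NOPs and the later pair each collapse to a single NOP, giving a 5-byte string.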

Analysis of Exploit Diversity: SQL Name Resolution (1)
  • Malware: Slammer worm
  • First noticed on 25.1.2003; infected 75,000
    computers within 10 minutes
  • Exploited two buffer overflow bugs in Microsoft's
    SQL Server and Desktop Engine database products
  • Spread by generating random IP addresses and
    sending itself to those addresses
  • Dramatically slowed down general Internet traffic
  • A patch was available six months before the
    worm's first launch

Analysis of Exploit Diversity: SQL Name Resolution (2)
  • 767 exploit samples were collected
  • 2 apparent variations of Slammer were detected
  • 766 exploits with the exact same payload and 1
    outlier
  • The outlier was identical to all the other
    payloads except for the last 91 bytes; evidence
    indicates that the payload was likely corrupted
    on the network before being captured in the trace
  • After discarding the outlier, only 1 Slammer
    exploit remained in the DSL trace, so there was
    no exploit diversity
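A check like the one described for the Slammer outlier can be sketched by grouping identical payloads and measuring, for each minority payload, how many trailing bytes differ from the majority payload. The helper names are hypothetical and the payload contents below are synthetic stand-ins, not real Slammer bytes.

```python
from collections import Counter


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def outlier_report(payloads: list[bytes]) -> list[tuple[int, int]]:
    """For each non-majority payload: (count, bytes after the shared prefix)."""
    ranked = Counter(payloads).most_common()
    reference = ranked[0][0]  # the most frequent payload
    return [(count, len(p) - common_prefix_len(reference, p))
            for p, count in ranked[1:]]


majority = b"A" * 100               # synthetic stand-in payload
outlier = b"A" * 9 + b"B" * 91      # same prefix, corrupted tail
report = outlier_report([majority] * 766 + [outlier])
```

A difference confined to a contiguous tail, rather than scattered through the payload, is consistent with corruption in transit rather than a deliberately modified variant.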

Analysis of Exploit Diversity: LSASS (1)
  • Malware: Sasser worm
  • First noticed on 30.4.2004; disrupted operations
    for airlines, banks, and government offices
  • Exploited a buffer overflow vulnerability in LSASS
    (Local Security Authority Subsystem Service) of
    Microsoft Windows 2000 and XP
  • Spread by scanning different ranges of IP
    addresses and connecting to victims' computers,
    primarily through TCP port 139 or 445
  • A patch was available in 4.2004, prior to the
    release of the worm
  • Written by an 18-year-old CS student in Germany,
    who was arrested and received a 21-month
    suspended sentence

Analysis of Exploit Diversity: LSASS (2)
[Figure: Histogram of shellcode instances]
  • 1769 exploit samples were collected
  • 56 distinct payloads were identified

Analysis of Exploit Diversity: LSASS (3)
  • Each x-axis position represents a unique
    shellcode
  • The y-axis shows relative edit distance
  • A horizontal line segment at y-axis value y
    indicates that two sub-clusters had cluster
    distance y when they were merged into one

Analysis of Exploit Diversity: LSASS (4)
  • Most cluster merges occurred at small exedit
    distances of up to 10, so 10 was used as the
    threshold for defining families among the
    exploits
  • 5 families of shellcodes can be identified
  • Manual examination of the shellcodes concluded
    that the identified families were indeed 5
    separate code bases
  • LSASS-2, 3 and 4 were similar enough to conclude
    that they evolved from the same code base
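Turning a threshold such as 10 into families can be sketched with a union-find structure. This assumes single-linkage clustering: cutting a single-linkage dendrogram at a threshold yields exactly the connected components of the graph that links every pair of samples whose distance is at most that threshold. The function name and the toy distance matrix are hypothetical.

```python
def families(dist: list[list[float]], threshold: float) -> list[set[int]]:
    """Connected components of the graph with an edge wherever dist <= threshold."""
    parent = list(range(len(dist)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for i in range(len(dist)):
        for j in range(i + 1, len(dist)):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)  # union the two components

    groups: dict[int, set[int]] = {}
    for i in range(len(dist)):
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


D = [[0, 2, 9, 10],
     [2, 0, 8, 9],
     [9, 8, 0, 3],
     [10, 9, 3, 0]]
fams = families(D, 5)  # two families: {0, 1} and {2, 3}
```

Raising the threshold to 8 in this toy example merges everything into one family, which illustrates how sensitive the family count is to the chosen cut-off.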

Analysis of Exploit Diversity: LSASS (5)
[Figure: Evolution diagram]
  • Shellcodes within each family exhibit a small
    amount of variation, which corresponds to the
    phone-home/connect-back IP addresses encoded in
    the payload; these tell the victim which host to
    connect to for downloading additional code or
    files
  • Connect-back refers to the newly infected victim
    connecting to its immediate parent in the
    infection chain
  • Phone-home refers to connecting to a central
    server
Analysis of Exploit Diversity: ISystemActivator (1)
  • Malware: Blaster worm
  • First noticed on 11.8.2003; infected hundreds of
    thousands of computers within the first 24 hours,
    and several million more afterwards
  • Exploited a buffer overflow in the RPC service of
    Windows 2000 and XP
  • Programmed to launch a DDoS attack against
    Microsoft's Windows Update site
  • The worm contains a hidden string, which reads
    "billy gates why do you make this possible? Stop
    making money and fix your software!!"
  • A patch was available 1 month before the release
    of the worm
  • A variant was written by an 18-year-old US
    resident, who was arrested and sentenced to an
    18-month prison term

Analysis of Exploit Diversity: ISystemActivator (2)
[Figure: Histogram of shellcode instances]
  • 1561 exploit samples were collected
  • 90 distinct payloads were identified
  • 10 variants accounted for most of the observed
    exploits, while 80 distinct shellcodes appeared
    only once
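The histogram statistics quoted above (distinct payloads, payloads seen only once, share of samples covered by the most common variants) can be computed with a sketch like the following. The function name, the `top_k` parameter, and the sample data are hypothetical.

```python
from collections import Counter


def summarize(payloads: list[bytes], top_k: int) -> dict:
    """Histogram-style summary of how often each distinct payload occurs."""
    hist = Counter(payloads).most_common()  # [(payload, count), ...] by frequency
    singletons = sum(1 for _, n in hist if n == 1)
    top_share = sum(n for _, n in hist[:top_k]) / len(payloads)
    return {
        "samples": len(payloads),
        "distinct": len(hist),
        "singletons": singletons,
        "top_k_share": top_share,
    }


stats = summarize([b"a"] * 5 + [b"b"] * 3 + [b"c", b"d"], top_k=2)
```

Applied to the ISystemActivator trace, the slides' figures would correspond to 1561 samples, 90 distinct payloads, and 80 singletons.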

Analysis of Exploit Diversity: ISystemActivator (3)
  • Most cluster merges happened below a distance of
    10, so this value was used as the threshold to
    define families among the exploits
  • 6 families of shellcodes can be identified
  • The low threshold distance of 10 and the large gap
    before the next cluster merges at a distance of
    85 indicate that exploits within a family are
    similar, but vary substantially between families

Analysis of Exploit Diversity: ISystemActivator (4)
  • Manual examination of the shellcodes confirmed
    that the clusters reflected 6 different code
    bases
  • Slight differences among exploits within each
    family were due to variations in data constants
  • The relatively low exedit distance of 10 between
    ISys-2 and ISys-3 implied a close relationship

Analysis of Exploit Diversity: ISystemActivator (5)
[Figure: Evolution diagram]
  • The only difference was that ISys-3 contained a
    connect call, whereas ISys-2 contained bind,
    listen, and accept calls; the authors believe
    the two families were derived from the same code
    base, except that
  • ISys-3 required the newly-infected host to
    connect back to the infecting host, while
  • ISys-2 required the newly-infected host to bind
    to a socket and wait for a connection attempt
    from the infecting host
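The bind versus connect-back difference is only a question of which host calls bind/listen/accept and which calls connect. The harmless localhost sketch below uses Python's standard `socket` module to make that explicit; both ends run in one process, the message is a placeholder, and the `ROLES` table and `establish` function are hypothetical names used only for illustration.

```python
import socket
import threading

# variant -> (side calling bind/listen/accept, side calling connect)
ROLES = {
    "bind": ("newly-infected host", "infecting host"),          # ISys-2 style
    "connect-back": ("infecting host", "newly-infected host"),  # ISys-3 style
}


def _accept_once(srv: socket.socket, received: list) -> None:
    """Accept one connection and read until the peer closes it."""
    conn, _ = srv.accept()
    with conn:
        chunks = []
        while data := conn.recv(1024):
            chunks.append(data)
        received.append(b"".join(chunks))


def establish(variant: str, message: bytes = b"hello") -> tuple[str, str, bytes]:
    listener, connector = ROLES[variant]
    received: list = []
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
        srv.listen(1)
        port = srv.getsockname()[1]
        t = threading.Thread(target=_accept_once, args=(srv, received))
        t.start()
        # The "connector" role: a single connect call.
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
        t.join()
    return listener, connector, received[0]
```

The socket code is identical in both variants; only the assignment of roles changes, which matches the observation that ISys-2 and ISys-3 differ in little more than these calls.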

Analysis of Exploit Diversity: RemoteActivation (1)
[Figure: Histogram of shellcode instances]
  • Malware: Blaster worm
  • RemoteActivation was the original MS RPC
    vulnerability that Blaster and its variants
    exploited before also targeting ISystemActivator
  • 338 distinct exploit payloads were identified;
    every exploit attempt used a unique payload

Analysis of Exploit Diversity: RemoteActivation (2)
  • Exedit distance among the shellcodes was very
    small: most cluster merges occurred below a
    distance of 1
  • Using this value as the threshold yields 2
    distinct families; the inter-family exedit
    distance of 1.3 indicates that the families are
    closely related

Analysis of Exploit Diversity: RemoteActivation (3)
  • Manual examination of the shellcodes revealed
    that the last third of the payload contained
    randomly generated characters, which accounted
    for the variation within each family
  • The trace contained two very similar but
    functionally different types of RemoteActivation
    exploits: 10% belonged to Remact-0, the bind
    version, while the other 90% belonged to
    Remact-1, the connect-back version
  • All payloads shared the same prefix, which
    resembles part of the Metasploit Framework, but
    this could not be proven (Metasploit is a toolkit
    for generating exploits, and includes options for
    generating encoded shellcodes and random filler
    characters)
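Why 338 unique payloads collapse into just 2 families can be sketched by grouping payloads while ignoring their trailing filler. This assumes the random filler occupies exactly the last third of each payload; in practice the filler region would be located by inspection, and `group_ignoring_filler` is a hypothetical helper. The sample payloads are synthetic.

```python
from collections import defaultdict


def group_ignoring_filler(payloads: list[bytes]) -> dict[bytes, list[bytes]]:
    """Group payloads by their content before the (assumed) filler region."""
    groups: dict[bytes, list[bytes]] = defaultdict(list)
    for p in payloads:
        filler_start = len(p) - len(p) // 3  # ASSUMPTION: filler is the last third
        groups[p[:filler_start]].append(p)
    return dict(groups)


p1 = b"A" * 20 + bytes(range(10))       # family A, random-looking tail
p2 = b"A" * 20 + bytes(range(10, 20))   # family A, different tail
p3 = b"B" * 20 + bytes(range(10))       # family B
groups = group_ignoring_filler([p1, p2, p3])
```

All three payloads are byte-for-byte distinct, yet only two groups remain once the filler is discarded, mirroring the pattern reported for Remact-0 and Remact-1.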

NIDS vs Polymorphism
  • To what extent will exploit polymorphism limit
    the effectiveness of Network Intrusion Detection
    Systems (NIDS)?
  • The authors tried to generate the signatures
    required to exhaustively cover all exploits
    observed for each vulnerability in the
    residential DSL trace
  • For each individual vulnerability except LSASS,
    one signature sufficed to cover the set of
    exploits; each signature was 100 bytes long
  • Tested the signatures against a 5-GB trace of
    network traffic; none of the signatures yielded
    false positives
  • The results indicate that the observed
    polymorphism was not effective at evading
    detection
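The slides do not describe how the signatures were generated, so the following is only a simplified sketch using exact substring matching (real NIDS signature languages are richer). It naively searches for a fixed-length byte string present in every exploit, then counts how many exploits it covers and how many benign payloads it wrongly matches. With the slides' figure, `length` would be 100; the toy data below uses 10. All names are hypothetical.

```python
from typing import Optional


def common_signature(payloads: list[bytes], length: int) -> Optional[bytes]:
    """Return some `length`-byte substring found in every payload, or None."""
    shortest = min(payloads, key=len)
    for start in range(len(shortest) - length + 1):
        candidate = shortest[start : start + length]
        if all(candidate in p for p in payloads):
            return candidate
    return None


def evaluate(signature: bytes, exploits: list[bytes], benign: list[bytes]) -> tuple[int, int]:
    """(number of exploits matched, number of false positives on benign data)."""
    covered = sum(signature in p for p in exploits)
    false_pos = sum(signature in p for p in benign)
    return covered, false_pos


exploits = [b"junk1SIGNATUREBYTEStail", b"zzSIGNATUREBYTES"]
sig = common_signature(exploits, 10)
result = evaluate(sig, exploits, [b"GET / HTTP/1.1\r\n\r\n"])
```

If polymorphism had been effective, no shared substring of useful length would exist and `common_signature` would return `None`.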

Factors Driving the Evolution
  • Having examined the relationships between
    different pieces of malware, what are the factors
    that drive the structural and functional
    evolution of malware?
  • Two hypotheses:
  • Malware authors use polymorphism to prevent their
    malware from being caught by NIDS signatures
    (perhaps without realizing that their
    polymorphism is ineffective at evading
    detection), or
  • Today's polymorphism is unrelated to evading NIDS
    signatures; the variation in shellcodes is due
    to functional variation (e.g., the bind and
    connect-back varieties)

Observations (1)
  • About 4,500 exploit samples were collected on a
    single DSL connection in 2 days; this indicates
    that once a computer is connected to the
    Internet, it is exposed to a huge number of
    malware attacks (roughly one attack every 40
    seconds)
  • For all the Microsoft vulnerabilities studied in
    the paper, Microsoft had in fact released the
    relevant patches before the exploit attacks were
    first launched
  • Users should be able to protect their machines
    from such attacks if patches for the
    vulnerabilities are applied promptly
  • The public announcement of patch releases by
    Microsoft advertises the existence of the
    vulnerability to malware authors, who can
    reverse-engineer the patch to discover the
    vulnerability and write the malware
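The "one attack every 40 seconds" figure can be checked from the per-vulnerability sample counts reported earlier (767 for SQL Name Resolution, 1,769 for LSASS, 1,561 for ISystemActivator, 338 for RemoteActivation), assuming the capture lasted exactly 48 hours and each sample is one exploit attempt:

```latex
\frac{2 \times 24 \times 3600\ \text{s}}{767 + 1769 + 1561 + 338\ \text{attempts}}
  = \frac{172{,}800\ \text{s}}{4{,}435\ \text{attempts}}
  \approx 39\ \text{s per attempt}
```

The total of 4,435 is consistent with the rounded "about 4,500" above.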

Observations (2)
  • Identifying exploit families with a cluster-merge
    threshold seems arbitrary: choosing a different
    threshold value will result in a different number
    of families with different members
  • Though the exploit families can be verified by
    manual examination of the shellcodes, such a
    methodology may not be appropriate if the samples
    number in the millions rather than thousands
  • Relationships were established only for some
    shellcode instances; the relationships of the
    other shellcode instances remain unknown, so a
    complete family tree (phylogeny) cannot be built
  • Unlike the relationships among organisms, the
    correctness of the constructed shellcode
    phylogeny is difficult to prove
  • Recommendation: repeat the research using other
    data-mining techniques and distance metrics to
    see their effects on the resulting exploit
    families

  • End
