Modeling the Spread of Worms - PowerPoint PPT Presentation

About This Presentation

Title:

Modeling the Spread of Worms

Description:

Metadata allows us to forward packets when we want. E.g. letters at a post office headed for main post office. address labels allow us to forward them in batches ... – PowerPoint PPT presentation

Number of Views:50

Avg rating:3.0/5.0

Slides: 18

Provided by: wadet

Learn more at: https://www.winlab.rutgers.edu

Category:

more less

Transcript and Presenter's Notes

Title: Modeling the Spread of Worms

1
Modeling the Spread of Worms

Wade Trappe

2
Overview

Quick discussion of how the Internet is
organized.
Random Constant Spread (RCS) Model and Code-Red I
The Differential Equation
Solving it!
Observations
Improvements in worm design
Scanning Strategies

3
Internet Overview, pg.1

The Internet started as a research project
connecting 4 computers in 1969, and has grown to
connect over 100 million machines.
The Internet is
A loose collection of networks organized into a
hierarchy through interconnection technologies.
At the local level machines are connected to each
other (local area network), and to a router.
A router is a special-purpose device that
transfers data to and from the next layer of the
hierarchy.
Loose collection of networks organized into a
multilevel hierarchy
10-100 machines connected to a hub or a router
service providers also provide direct dialup
access
or over a wireless link
10s of routers on a department backbone
10s of department backbones connected to campus
backbone
10s of campus backbones connected to regional
service providers
100s of regional service providers connected by
national backbone
10s of national backbones connected by
international trunks

4
Internet Overview, Conceptual Picture
5
Internet Overview, pg. 2

Question So, I want to send an email, how does
it happen?
Answer We use Addresses, and Route between
Addresses using the Internet Protocol (IP).
Your data is sent via packets, and the Internet
employs a store-and-forward strategy when
delivering them between nodes.
Packets consist of Meta-data (header) and the
data (payload)
Metadata allows us to forward packets when we
want
E.g. letters at a post office headed for main
post office
address labels allow us to forward them in
batches

6
Internet Overview, pg. 3

Internet addresses are called IP addresses
Refer to a host interface (device connecting the
computer to the network) need one IP address per
interface
Addresses are structured as a two-part hierarchy
network number
host number
Question How many bits to assign to host number
and how many to network number?
If many networks, each with a few hosts, then
more bits to network number
And vice versa
In the end, IP addresses consist of three sets of
partitions of bits
class A 8 bits network, 24 bits host
class B 16 bits each
class C 24 bits network, 8 bits host
Routing uses these addresses to deliver from a
source to a destination.

7
Internet Overview, pg. 4

An example of a message route

traceroute henna.iitd.ernet.in
traceroute to henna.iitd.ernet.in
(202.141.64.30), 30 hops max, 40 byte packets
1 UPSON2-NP.CIT.CORNELL.EDU (128.84.154.1) 1
ms 1 ms 1 ms
2 HOL1-MSS.CIT.CORNELL.EDU (132.236.230.189) 2
ms 3 ms 2 ms
3 CORE1-MSS.CIT.CORNELL.EDU (128.253.222.1) 2
ms 2 ms 2 ms
4 CORNELLNET1.CIT.CORNELL.EDU (132.236.100.10)
4 ms 3 ms 4 ms
5 ny-ith-1-H1/0-T3.nysernet.net (169.130.61.9)
5 ms 5 ms 4 ms
6 ny-ith-2-F0/0.nysernet.net (169.130.60.2) 4
ms 4 ms 3 ms
7 ny-pen-1-H3/0-T3.nysernet.net (169.130.1.121)
21 ms 19 ms 16 ms
8 sl-pen-21-F6/0/0.sprintlink.net
(144.228.60.21) 16 ms 40 ms 36 ms
9 core4-hssi5-0.WestOrange.mci.net
(206.157.77.105) 20 ms 20 ms 24 ms
10 core2.WestOrange.mci.net (204.70.4.185) 21
ms 34 ms 26 ms
11 border7-fddi-0.WestOrange.mci.net
(204.70.64.51) 21 ms 21 ms 21 ms
12 vsnl-poone-512k.WestOrange.mci.net
(204.70.71.90) 623 ms 639 ms 621 ms
13 202.54.13.170 (202.54.13.170) 628 ms 629 ms
628 ms
14 144.16.60.2 (144.16.60.2) 1375 ms 1349 ms
1343 ms
15 henna.iitd.ernet.in (202.141.64.30) 1380 ms
1405 ms 1368 ms

8
Now Back to Worms

Someone who controls many nodes on the Internet
can cause serious damage to the Internet.
It is reasonable to gain control of millions of
Internet hosts through worms.
Worms differ from viruses in that worms do not
require human intervention to propagate. Viruses
require user action (aka. Clicking that email
attachment).
Pandurang gave the overview of Worms, along with
its history in the previous lecture.
We will start with Code Red

9
Code Red

The Code Red Worm was initially released in July
2001.
The worm spread by compromising Microsoft web
servers using a vulnerability that had been
discovered just a few weeks earlier.
Once a host was infected, Code Red would spread
itself by launching 99 threads, that each
generated a random IP address and tried to infect
that address using the same vulnerability.
Initial version of Code Red, CRv1, had a bug in
the random number generator.
Second version of Code Red, CRv2, the bug was
fixed. CRv2 contained a piece of code to perform
a distributed denial of service attack on
www.whitehouse.gov.

10
Random Constant Spread, pg. 1

Code Red spread very rapidly at first, until
almost all vulnerable machines were compromised,
then it seemed to slow down its spread.
The Random Constant Spread (RCS) is one model to
describe this phenomenon.
Let N total of vulnerable servers which can be
corrupted/infected (assume its constant with
time)
Let K initial compromise rate
i.e. the number of vulnerable machines that an
infected host can find and compromise at the
start (when few other hosts have been
compromised).
K is some universal constant for a particular
worm.
Assume that a compromised machine picks other
machines at random, and that once a machine is
infected it cannot be compromised again.
Let T be point when half the machines are
infected.
Variables
a the proportion of vulnerable machines that
have been infected (e.g. a1 means all N have
been infected). The variable a will change with
time t.
t time in hours

11
Random Constant Spread, pg. 2

RCS is based upon the idea of logistic growth
The actual growth rate at a time t depends on the
population
Suppose a(t) is the proportion of the N machines
infected at time t, then there are a total of
Na(t) machines that have been infected.
If we go from time t to time (tdt), then a(t)
will become a(tdt)a(t) da.
da represents the change in the proportion a, and
is an infinitesimal quantity (i.e. everything is
in the limit).
So Nda represents the total number of additional
machines that will be infected in dt more time.
Thats one way to calculate the number of
additional machines that can be infected in dt
time, we need one more way.

12
Random Constant Spread, pg. 3

Key Idea Suppose I have 100 machines and I can
infect K of those machines in one hour. Now,
instead, suppose I have 80 machines, then how
many can I infect in one hour?
Answer 0.8 K
Now, suppose Na machines have been infected, then
that leaves (1-a)N machines left.
Question When I had N infectible-machines I
could infect K machines. So, now I have (1-a)N
infectible machines, how many can one machine
infect?
Answer (1-a)K
Next Issue I can infect (1-a)K machines in 1
hour, but what about in dt time? Answer
(1-a)Kdt.
Final Issue At time t I have a(t)N machines that
can do the infecting, so how many will be
infected in time dt?
Simple, but not completely accurate answer
(Na)K(1-a)dt

13
Random Constant Spread, pg. 4

Lets put the two sides together
Nda (Na)K(1-a) dt
So, how do we solve this? Answer Its an easy
first order diffeq.
One way

14
Random Constant Spread, pg. 5

Observations
For small t (before the first infection) there is
no growth, but once the infection happens, growth
happens exponentially.
However, once significantly past T, growth slows
again because we are running out of machines to
infect.
See plot for an example.
These observations were confirmed in the real
worm data.
Several hours before Code Red was due to
terminate itself, it had slowed down due to the
fact it had found the majority of infectible
machines.

15
Random Constant Spread, pg. 6

What was wrong with the RCS model?
Basically, problem lies to the simplification of
the probability involved.
The assumption that if aN machines are infected
then (aN)(1-a)K machines will be infected in next
hour is wrong.
Randomly choosing an address might mean that you
actually try to reinfect an already infected
machine.
Or, by randomly choosing an address, two infected
machines might try to infect the same machine.
Overall, the value of RCS is not its rigor, but
the fact it reveals underlying principles and
dynamics.

16
Better Worm Strategies

Localized Scanning
It takes more time to infect a node further away
than one nearby.
Localized scanning seeks to balance the amount of
attempts a worm takes in infecting a nearby
machine versus choosing a random machine on the
Internet.
Strategy employed in Code Red II.
Hit List Scanning
We saw that worms take a while to get started,
but once started they grow exponentially. How do
we speed up the start?
Idea
Give the initial worm a list of high-potential
targets.
Once it infects a machine on the hit-list, it
splits the hit-list in half and gives half to
child worm to use.
Child worms continue replicating and splitting
hit list.
Advantages hit-list shrinks quickly, initial
spread is very quick.

17
Better Worm Strategies, pg. 2

Permutation Scanning
One limitation of random scanning is that
different nodes may try to infect the same
machine, or infect an already infected machine.
Idea
Each worm gets a starting point of permutation
space to work with.
Permutation Space is mapped to IP Address Space
via a 32 bit cipher (with fixed key).
The worm goes along attempting to infect each
machine in its region of permutation space. If it
ever encounters a machine that has been infected,
it knows that its permutation space will start
overlapping another worms permutation space, so
it chooses a new, random place to start from in
permutation space.
Result Worms end up trying to work on separate
sections of permutation space.
Improvements Enforced partitions of permutation
space.