Use of Measurements in Anomaly Detection

CS 8803: Network Measurements Seminar. Instructor: Constantinos Dovrolis. Fall 2003. Presenter: Bugra Gedik.

1
Use of Measurements in Anomaly Detection
  • CS 8803: Network Measurements Seminar
  • Instructor: Constantinos Dovrolis
  • Fall 2003
  • Presenter: Bugra Gedik

2
Outline
  • We'll be discussing 3 papers
  • Topic detail: Inferring DoS Activity
      • Paper: D. Moore, G. M. Voelker, and S. Savage.
        Inferring Internet denial-of-service activity. In
        Proceedings of the USENIX Annual Technical
        Conference (USENIX 2001).
  • Topic detail: Code-Red Worm
      • Paper: D. Moore, C. Shannon, and J. Brown.
        Code-Red: A Case Study on the Spread and Victims
        of an Internet Worm. In Proceedings of the ACM
        Internet Measurement Workshop (IMW 2002).
  • Topic detail: DoS Attacks and Flash Crowds
      • Paper: J. Jung, B. Krishnamurthy, and M.
        Rabinovich. Flash Crowds and Denial of Service
        Attacks: Characterization and Implications for
        CDNs and Web Sites. In Proceedings of the
        International World Wide Web Conference (WWW
        2002).

3
  • Inferring Internet Denial-of-Service Activity
  • David Moore
  • Geoffrey M. Voelker
  • Stefan Savage
  • In Proceedings of the USENIX Annual Technical
    Conference (USENIX 2001).

4
Problem Statement & Solution Overview
  • Problem
      • How prevalent are denial-of-service attacks in
        the Internet today?
      • This paper considers only flooding attacks
  • Technique
      • Use backscatter analysis to estimate the
        worldwide prevalence of DoS attacks

5
Backscatter Analysis
6
Some Limiting Assumptions
  • Address uniformity: attackers spoof source
    addresses uniformly at random.
  • Reliable delivery: attack traffic is delivered
    reliably to the victim, and backscatter is
    delivered reliably to the monitor.
  • Backscatter hypothesis: unsolicited packets
    observed by the monitor represent backscatter.

7
Address uniformity
  • May not hold because
      • Some ISPs employ ingress filtering; as a result,
        the attacker may be forced to restrict its
        spoofed address space
      • Reflector attacks, a different kind of flooding
        attack, are not captured by backscatter analysis,
        e.g. Smurf or Fraggle attacks
  • The main motivation for the assumption
      • Many direct DoS attack tools use random address
        spoofing, e.g. Shaft, TFN, TFN2k, trinoo,
        Stacheldraht, mstream, Trinity
      • Uniformity can be checked with goodness-of-fit
        tests such as the Anderson-Darling (A²) test

8
Reliable delivery
  • May not hold because
      • During the attack, packets may be dropped due to
        congestion
      • An IDS may filter the packets
      • Some types of attacks may not produce
        backscatter
  • However, many attacks do generate backscatter
      • Most types of flooding attacks elicit a
        response

9
Backscatter hypothesis
  • May not hold because
      • Any host on the Internet can send unsolicited
        packets to the monitored network
  • Motivation for the assumption
      • Packets that are consistently targeted at a
        specific address in the monitored network can be
        filtered out easily
      • Although a concerted effort by a third party could
        bias the results, this is quite unlikely

10
Extrapolating Backscatter Analysis Results
  • Let $n$ be the number of monitored IP addresses
  • And consider an attack with $m$ packets
  • Each response reaches the monitored range with
    probability $n / 2^{32}$, so the expected number of
    backscatter packets observed from the attack is
    $E(X) = \frac{nm}{2^{32}}$
  • Similarly, if the observed rate of an attack is
    $R'$, then a lower bound on the actual rate $R$ is
    $R \ge R' \cdot \frac{2^{32}}{n}$
    (the victim may not answer every attack packet)
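The two extrapolation formulas can be checked numerically. This is a minimal Python sketch, assuming a monitor that covers a full /8 (n = 2^24); the function names are illustrative and not taken from the paper.

```python
ADDRESS_SPACE = 2 ** 32  # size of the IPv4 address space


def expected_backscatter(n_monitored: int, attack_packets: int) -> float:
    """E(X) = n * m / 2^32: under uniform spoofing, each response lands in
    the monitored range with probability n / 2^32."""
    return n_monitored * attack_packets / ADDRESS_SPACE


def min_attack_rate(observed_rate: float, n_monitored: int) -> float:
    """Lower bound R >= R' * 2^32 / n on the actual attack rate."""
    return observed_rate * ADDRESS_SPACE / n_monitored


SLASH8 = 2 ** 24  # a /8 covers 1/256 of the address space

if __name__ == "__main__":
    # a 1,000,000-packet attack leaves 1e6 / 256 backscatter packets at a /8
    print(expected_backscatter(SLASH8, 1_000_000))  # -> 3906.25
    # 2 backscatter packets/s at the /8 imply at least 512 packets/s overall
    print(min_attack_rate(2.0, SLASH8))  # -> 512.0
```

For a /8 both formulas reduce to a factor of 256, which is why the slides can speak of the monitor seeing "1/256 of the Internet".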

11
Attack Classification
  • Two types of classification are performed
  • Flow-based classification
      • Classifies individual attacks, answering the
        questions: how many, how long, what kind
  • Event-based classification
      • Analyzes the severity of attacks on short time
        scales

12
Flow-based classification
  • A flow is defined as a series of consecutive
    packets sharing the same target (the victim's
    address) and the same IP protocol
  • If no more packets are observed from a flow for 5
    minutes, the flow is assumed to have ended
  • Flows with 100 or fewer packets, or lasting less
    than 60 seconds, are discarded
  • Flows whose backscatter reaches only a single IP
    address in the monitored range are discarded
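The flow rules above can be sketched in Python. This is a minimal illustration, not the authors' tool; it assumes each backscatter packet has already been reduced to a hypothetical `(time_s, victim_ip, protocol, monitored_dst)` tuple.

```python
from collections import defaultdict
from dataclasses import dataclass, field

FLOW_TIMEOUT = 300  # 5 minutes without packets ends a flow
MIN_PACKETS = 100   # flows with at most 100 packets are discarded
MIN_DURATION = 60   # flows lasting less than 60 s are discarded


@dataclass
class Flow:
    victim: str
    protocol: str
    start: float
    end: float
    packets: int = 0
    monitored_dsts: set = field(default_factory=set)


def classify_flows(packets):
    """packets: iterable of (time_s, victim_ip, protocol, monitored_dst)."""
    by_key = defaultdict(list)
    for t, victim, proto, dst in packets:
        by_key[(victim, proto)].append((t, dst))  # same target + same protocol

    flows = []
    for (victim, proto), pkts in by_key.items():
        pkts.sort()
        flow = None
        for t, dst in pkts:
            if flow is None or t - flow.end > FLOW_TIMEOUT:
                if flow is not None:
                    flows.append(flow)  # the gap closed the previous flow
                flow = Flow(victim, proto, start=t, end=t)
            flow.end = t
            flow.packets += 1
            flow.monitored_dsts.add(dst)
        flows.append(flow)

    return [
        f for f in flows
        if f.packets > MIN_PACKETS
        and f.end - f.start >= MIN_DURATION
        and len(f.monitored_dsts) > 1  # drop single-destination flows
    ]
```

A victim that stays silent for more than five minutes and then reappears therefore yields two separate flows, each filtered on its own.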

13
Examining the Flows
  • Determine the type of attack by examining
      • TCP flag settings
      • ICMP packets
  • Look at the distributions of
      • IP addresses, using the A² uniformity test at a
        0.05 significance level to validate the assumption
      • port numbers
  • Classify the victim by examining
      • DNS information of the victim
      • AS-level information of the victim from BGP tables
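The uniformity check can be sketched with the Anderson-Darling A² statistic against a fully specified Uniform(0, 1) null. This is a minimal sketch, assuming the backscatter destinations are given as offsets inside a monitored /8 (`space_size = 2**24`). The value 2.492 is the standard asymptotic 5% critical value for a fully specified distribution; the paper's exact procedure may differ.

```python
import math

A2_CRITICAL_5PCT = 2.492  # asymptotic 5% critical value, fully specified null


def anderson_darling_uniform(samples):
    """A^2 statistic for H0: samples are i.i.d. Uniform(0, 1)."""
    u = sorted(samples)
    n = len(u)
    if n == 0:
        raise ValueError("need at least one sample")
    eps = 1e-12  # keep log() arguments away from 0
    total = 0.0
    for i in range(1, n + 1):
        low = min(max(u[i - 1], eps), 1 - eps)   # F(Y_i)
        high = min(max(u[n - i], eps), 1 - eps)  # F(Y_{n+1-i})
        total += (2 * i - 1) * (math.log(low) + math.log(1 - high))
    return -n - total / n


def looks_uniform(offsets, space_size=2 ** 24):
    """Map address offsets to (0, 1) and test uniformity at alpha = 0.05."""
    u = [(o + 0.5) / space_size for o in offsets]
    return anderson_darling_uniform(u) < A2_CRITICAL_5PCT
```

Addresses spread evenly over the /8 pass the test, while addresses bunched into a small corner of it (as ingress filtering might produce) are rejected.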

14
Event-based Classification
  • An attack event is defined as a victim emitting
    at least 10 backscatter packets during a one-minute
    period
  • Attacks are not classified by type; the only
    criterion is the victim's IP address
  • For each minute, the victims under attack and the
    intensity of each attack are determined and recorded
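A minimal sketch of the per-minute event rule, assuming fixed one-minute bins aligned to the start of the trace and a hypothetical `(time_s, victim_ip)` input format. Intensity here is simply the packet count in that minute.

```python
from collections import Counter

EVENT_THRESHOLD = 10  # backscatter packets per victim within one minute


def attack_events(packets):
    """packets: iterable of (time_s, victim_ip) backscatter observations.

    Returns {minute_index: {victim_ip: packet_count}} for every
    (minute, victim) pair that reaches the event threshold; the count
    serves as that minute's attack intensity."""
    per_minute = Counter((int(t // 60), victim) for t, victim in packets)
    events = {}
    for (minute, victim), count in per_minute.items():
        if count >= EVENT_THRESHOLD:
            events.setdefault(minute, {})[victim] = count
    return events
```

Unlike the flow-based sketch, nothing here depends on protocol or attack type: only the victim address and the minute matter.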

15
Experimental Setup
  • A /8 network covers 1/256 of the total IPv4
    address space
  • February 1st to February 25th: Ethernet traffic
    is captured through a shared hub with the ingress
    router

16
Summary of Observed Attacks
  • 5000 distinct victim IP addresses in more than
    2000 distinct DNS domains

17
Attack/Response Protocols
  • 50% of the attacks generate TCP (RST ACK)
    backscatter, suggesting TCP floods directed at
    closed ports
  • 15% of the attacks generate ICMP host
    unreachable messages containing a TCP header with
    the victim's IP address, again suggesting a TCP flood
  • 12% of the attacks generate ICMP (TTL Exceeded)
    messages. Strange! These were caused by very
    high-rate attacks, and they account for around 50%
    of all backscatter packets observed
  • 8% of the attacks generate TCP (SYN ACK)
    backscatter, suggesting SYN floods

18
Attack Rate
  • Uniform random attacks are those whose source
    IP addresses pass the A² test
  • 500 SYN packets per second are enough to
    overwhelm a server (40% of attacks reach this rate)
  • 14,000 SYN packets per second are enough to
    overwhelm a server with specialized firewalls
    (2.5% of attacks reach this rate)
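Percentages like these come from comparing each attack's extrapolated rate with the two thresholds. This is a minimal sketch, assuming per-attack rates observed at a /8 monitor and the lower-bound extrapolation factor of 256 for a /8; the input values in the test are made up.

```python
SCALE_SLASH8 = 2 ** 32 // 2 ** 24  # 256: extrapolation factor for a /8 monitor

SERVER_LIMIT_PPS = 500       # SYN/s said to overwhelm a server
FIREWALL_LIMIT_PPS = 14_000  # SYN/s said to overwhelm a firewall-protected server


def estimated_rates(observed_pps):
    """Scale per-attack rates seen at the /8 up to lower-bound global rates."""
    return [r * SCALE_SLASH8 for r in observed_pps]


def fraction_at_least(rates_pps, threshold_pps):
    """Fraction of attacks whose estimated rate meets or exceeds the threshold."""
    rates = list(rates_pps)
    if not rates:
        return 0.0
    return sum(r >= threshold_pps for r in rates) / len(rates)
```

Because the extrapolation is a lower bound, fractions computed this way also underestimate how many attacks cross each threshold.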

19
Attack Duration
  • 50% of the attacks last less than 10 minutes
  • 80% of the attacks last less than 30 minutes
  • 90% of the attacks last less than 60 minutes

20
Victim Classification
  • A significant fraction of attacks target home
    machines, either dial-up or broadband
  • Among home users, cable-modem users have
    experienced some intense attacks, with rates of
    up to 1,000 packets per second
  • A significant number of attacks target IRC servers

21
Victim Classification
  • No single AS, or small set of ASes, is a major
    target
  • 65% of the victims were attacked once and 18%
    twice

22
Validation
  • 98% of the packets attributed to backscatter do
    not themselves provoke a response, so they cannot
    be packets used to probe the monitored network
  • 98% of the victim IP addresses also appear in
    other traces extracted from different datasets
    collected during the same period

23
  • Code-Red: A Case Study on the Spread and Victims
    of an Internet Worm
  • David Moore
  • Colleen Shannon
  • Jeffery Brown
  • In Proceedings of the ACM Internet Measurement
    Workshop (IMW 2002)

24
Analysis of the Code-Red Worm
  • Worms: self-replicating programs
  • Code-Red worm classification
      • Code-RedI-v1: memory-resident, static seed,
        infect/spread/attack
      • Code-RedI-v2: memory-resident, random seed,
        infect/spread/attack
      • Code-RedII: disk-resident, intelligent,
        infect/backdoor/spread
  • Data sets
      • Packet header trace of hosts sending unsolicited
        TCP SYN packets to a /8 (class A) network and two
        /16 networks, July 4 to August 21
  • July 12, 2001: Code-RedI-v1 released
  • July 19, 2001: Code-RedI-v2 released
  • August 4, 2001: Code-RedII released
  • Hosts that have sent at least two unsolicited TCP
    SYN packets (on port 80) to the /8 network are
    suspected of being infected
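The infection criterion reduces to counting port-80 SYNs per source. A minimal sketch, assuming the trace has been reduced to hypothetical `(src_ip, dst_port)` pairs for unsolicited TCP SYNs reaching the /8:

```python
from collections import Counter


def suspected_infected(syn_packets, min_probes=2, port=80):
    """syn_packets: iterable of (src_ip, dst_port) for unsolicited TCP SYNs
    seen at the monitored /8. Returns sources with >= min_probes SYNs to port."""
    counts = Counter(src for src, dst_port in syn_packets if dst_port == port)
    return {src for src, n in counts.items() if n >= min_probes}
```

Requiring two probes rather than one filters out some stray single packets, at the cost of missing infected hosts that happened to probe the /8 only once.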

25
Code-RedI Worms
Figure: Code-RedI life cycle, with an infection phase from the beginning of the month to the end of the 19th and an attack phase from the beginning of the 20th to the end of the month
26
Unsolicited SYN Probes, Code-RedI-v1
  • The trace includes a large number of probes to 23
    IP addresses within the monitored /8 network
  • By reverse-engineering the worm code and using the
    same static seed, the first 1 million IP addresses
    in its probe sequence are generated
  • Those 23 addresses indeed appear in the
    generated sequence
  • 3 source addresses in the trace do not appear
    among the generated IP addresses; they must be the
    initial hosts, infected manually
      • Atlanta, USA
      • Cambridge, USA
      • GuangDong, China
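The static-seed argument can be illustrated with a stand-in generator. This is a hypothetical sketch: Python's `random.Random` replaces Code-Red's real PRNG, and `MONITORED_FIRST_OCTET = 44` is an arbitrary /8 chosen for illustration. It only mimics the key property that every host running a static-seed worm probes the same sequence.

```python
import random

MONITORED_FIRST_OCTET = 44  # hypothetical /8, used only for illustration


def probe_sequence(seed, count):
    """Stand-in target generator (NOT Code-Red's real PRNG): a static seed
    yields the same sequence on every infected host."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(count)]


def analyze_static_seed(seed, count, probed_dsts, probing_srcs):
    """Returns (monitored addresses explained by the sequence,
    sources that never appear in the sequence)."""
    sequence = probe_sequence(seed, count)
    in_sequence = set(sequence)
    monitored_targets = {a for a in sequence if a >> 24 == MONITORED_FIRST_OCTET}
    explained = set(probed_dsts) & monitored_targets
    # A host infected by the worm must itself have been a target, so a
    # source outside the sequence was presumably infected by other means.
    seed_hosts = {s for s in probing_srcs if s not in in_sequence}
    return explained, seed_hosts
```

Like the original analysis, this only covers the first `count` addresses, so a host infected by a probe later in the sequence would be misclassified as a seed host.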

27
Host Infection Rate, Code-RedI-v2
  • More than 359,000 unique IP addresses were
    infected with Code-RedI-v2 within a day, between
    midnight of July 19 and midnight of July 20

28
Deactivation rate for Code-Redv1
  • A clear time-of-day effect is visible in the
    figure
  • Many machines are shut down during the night
  • This indicates that many home and office users
    were affected by the worm
  • The worm is programmed to switch to its attack
    phase on July 20, so the deactivation rate rises
    sharply at midnight

29
Host Classification
  • Reverse DNS lookups are used to characterize the
    hosts
  • It is clear that a surprisingly large number of
    hosts are dial-up and broadband users
  • Diurnal variations are observed, which suggests
    that a majority of the infected hosts are not
    production web servers

30
Investigating time of day effect
  • Find the location of each host using the IxMapping
    (http://www.ipmapper.com) service
  • Convert UTC time to local time for each host and
    plot active hosts as a function of local time
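The conversion step can be sketched with the standard `datetime` module. This is a minimal illustration, assuming each host's fixed UTC offset (in hours) has already been obtained from a geolocation lookup; no IxMapping API is used, and daylight-saving changes are ignored.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def active_hosts_by_local_hour(observations, utc_offsets_hours):
    """observations: iterable of (host, aware UTC datetime).
    utc_offsets_hours: {host: fixed UTC offset in hours}.
    Returns Counter {local hour: number of distinct active hosts}."""
    seen = set()
    for host, ts in observations:
        offset = utc_offsets_hours.get(host)
        if offset is None:
            continue  # no location for this host: leave it out
        local = ts.astimezone(timezone(timedelta(hours=offset)))
        seen.add((local.hour, host))  # count each host once per local hour
    return Counter(hour for hour, _ in seen)


if __name__ == "__main__":
    t = datetime(2001, 7, 19, 15, 0, tzinfo=timezone.utc)
    obs = [("a", t), ("b", t), ("c", t + timedelta(minutes=30))]
    print(active_hosts_by_local_hour(obs, {"a": -5, "b": 8, "c": -5}))
```

Hosts observed at the same UTC instant can land in very different local hours, which is what lets a diurnal pattern emerge once all hosts are put on local time.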

31
The Effect of DHCP
  • Between August 2 and August 16, 2 million
    infected addresses were observed
  • However, only 143,000 hosts were active in the
    most active 10-minute period
  • This gap can be attributed to DHCP
      • DHCP inflates the infected host count, since one
        host can appear under several addresses over time
      • NAT usage, however, may deflate the count
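The two numbers being compared can be computed side by side. This is a minimal sketch, assuming hypothetical `(time_s, ip)` observations and fixed, non-overlapping 10-minute windows (a sliding window would give a slightly higher peak).

```python
from collections import defaultdict


def unique_vs_peak(observations, window_s=600):
    """observations: iterable of (time_s, ip).

    Returns (unique IPs over the whole trace, largest number of unique IPs
    in any fixed, non-overlapping window of window_s seconds)."""
    all_ips = set()
    per_window = defaultdict(set)
    for t, ip in observations:
        all_ips.add(ip)
        per_window[int(t // window_s)].add(ip)
    peak = max((len(ips) for ips in per_window.values()), default=0)
    return len(all_ips), peak
```

With DHCP churn, three hosts that each receive a new address every hour for five hours produce 15 unique addresses but a peak of only 3, the same kind of gap as 2 million versus 143,000.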

32
  • Flash Crowds and Denial of Service Attacks:
    Characterization and Implications for CDNs and
    Web Sites
  • J. Jung
  • B. Krishnamurthy
  • M. Rabinovich
  • In Proceedings of the International World Wide
    Web Conference (WWW 2002)

33
Definitions & Problem Statement
  • Definitions
      • Flash Event (FE): an FE is a large surge in
        traffic to a particular Web site, causing a dramatic
        increase in server load and putting severe strain
        on the network links.
      • Denial of Service Attack (DoS): a DoS attack is an
        explicit attempt by attackers to prevent
        legitimate users of a service from using that
        service.
  • Problem
      • How to differentiate DoS attacks from Flash
        Events?
      • How to improve CDN performance for handling FEs?

34
Some Example DoS Attacks
  • TCP SYN attack: spoofed SYN packets
  • UDP attacks: connect chargen to echo
  • Ping of Death: oversized ICMP packets cause a crash
  • Smurf attack: ping various hosts with the victim's
    address as the source
  • Fraggle and Snork attacks: echo and WinNT RPC
  • Flooding attack: flood the network with useless
    packets
  • DDoS attacks!!!

35
Example Flash Events
  • Popular events, like
      • Elections
      • Olympics
  • Catastrophic events, like
      • Sept. 11
  • Popular Webcasts
  • Play-along Web sites (for TV shows)

36
Dimensions of the Comparison
  • The comparison between DoS and FE is done along
    the following dimensions
      • Traffic patterns
      • Client characteristics
      • File reference characteristics

37
Flash Events
  • Datasets studied
      • Play-along: a play-along Web site for a popular TV
        show
      • Chile: the Chilean Web site that hosted continuously
        updated results of the 1999 election

38
Traffic Volume
  • Request rate grows dramatically during the FE
  • But the duration of the FE is relatively short

39
Traffic Volume
  • Request rates increase rapidly during the
    initial period of the FE
  • But the increase is far from instantaneous,
    leaving enough room for adaptation

40
Characterizing Clients
  • Number of clients in a FE is commensurate with
    the request rate

41
Characterizing Clients
  • There is no clear increase in per-client request
    rates

42
Old and New clusters
  • Old clusters: clusters that were seen before
    the FE
  • New clusters: clusters that were seen during
    the FE but not before
  • The percentage of old clusters during the FE is
    42.7% for Play-along and 82.9% for Chile
  • A significant proportion of the clusters seen
    during the FE are old clusters
  • The request distribution over clusters is highly
    skewed
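The old-cluster percentage can be sketched as a set intersection. In the paper, clusters are derived from routing (BGP) prefixes; this minimal sketch simply substitutes fixed /24 prefixes as an assumption, using the standard `ipaddress` module.

```python
import ipaddress


def cluster_of(ip, prefix_len=24):
    """Approximate a client cluster by its enclosing /prefix_len network."""
    return ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)


def old_cluster_fraction(before_ips, during_ips, prefix_len=24):
    """Share of clusters active during the event that were also seen before it."""
    before = {cluster_of(ip, prefix_len) for ip in before_ips}
    during = {cluster_of(ip, prefix_len) for ip in during_ips}
    if not during:
        return 0.0
    return len(during & before) / len(during)
```

A high fraction (as for Chile) means the surge mostly came from networks that already used the site; a low fraction is the pattern later seen in the Code-Red traces.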

43
File Reference Characteristics
  • Over 60% of documents are accessed only during
    flash events
  • Less than 10% of documents account for more than
    90% of the requests
  • The file reference distribution is highly Zipf-like
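The "less than 10% of documents receive more than 90% of requests" statistic can be computed directly from a request log. A minimal sketch, assuming the log has been reduced to a list of requested document paths:

```python
from collections import Counter


def min_doc_fraction_for_share(requests, share=0.9):
    """Smallest fraction of distinct documents that together receive
    at least `share` of all requests (most popular documents first)."""
    counts = sorted(Counter(requests).values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    covered = 0
    for k, c in enumerate(counts, start=1):
        covered += c
        if covered >= share * total:
            return k / len(counts)
    return 1.0
```

Taking documents in decreasing order of popularity gives the smallest such set, so the returned fraction is exactly the quantity quoted on the slide.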

44
DoS Attacks
  • Datasets studied
      • esg and ol: log files that recorded more than 1
        million requests within 60 days; a password-cracking
        attack was performed during this period.
      • bit.nl, creighton, fullnote, rellim, sptcccxus:
        a collection of 5 traces that recorded requests
        to Web servers from machines infected by the
        Code-Red worm.

45
Traffic Volume & Client Characteristics
(Code-Red)
  • The surge occurred because new clusters joined
    the attack
  • For traces that contain both infected and
    non-infected client requests, less than 14.3% of
    the clusters during the attack were old clusters
    (even fewer for password cracking)

46
Client Characteristics (Code-Red)
  • Per-client request rates do not change during
    the attack
  • Requests are spread more evenly across clusters
    than during FEs

47
Comparison of FE and DoS
48
Implications to CDNs
  • How can we handle FEs more effectively using
    CDNs?
  • We have seen that most requests during an FE are
    to documents that were not accessed before the FE
  • This causes many cache misses, which
    overloads the origin server
  • One solution is to use cooperative caches, but
    this introduces high delays
  • The authors propose an alternative approach that
    does not incur high delay yet decreases the load on
    the origin server

49
Illustration of the Problem
Figure: a client, the CDN DNS server, three CDN servers, and the origin server
50
Adaptive CDN