Fast Port Scan Detection Using Sequential Hypotheses Testing

1
Fast Port Scan Detection Using Sequential
Hypotheses Testing
  • Authors: Jaeyeon Jung, Vern Paxson, Arthur W.
    Berger, and Hari Balakrishnan
  • IEEE Symposium on Security and Privacy, 2004.

Presenter: Tai Do, CAP 6938, Jan. 18, 2007
2
Introduction
  • Problem: Random port scans of IP addresses are a
    popular method for attackers to find vulnerable
    machines in the reconnaissance phase.
  • Threshold Random Walk (TRW): an online detection
    algorithm.
  • Motivation: Early detection allows some form of
    protective response to mitigate or fully prevent
    damage.
  • Three quantities of interest for a detection
    problem:
    • Detection accuracy
      • False alarm rate (false positives)
      • Misdetection rate (false negatives)
    • Detection delay time

3
Challenges
  • No crisp definition of the activity
    • An attempted HTTP connection to the site's main
      Web server is OK.
    • A sweep through the entire address space looking
      for HTTP servers is NOT OK.
    • But what about connections to a few addresses,
      some of which succeed and some of which fail?
  • The granularity of identity
    • Are probes from adjacent remote addresses part of
      a single reconnaissance activity?
    • Probes from nearby addresses may together form
      a clear coverage pattern.
    • The locality of the addresses to which the probes
      are directed might be tight or scattered.
  • Temporal vs. spatial considerations
    • For how long do we track activity? Do we factor
      in the rate at which connections are made?
  • Intent
    • Not all scans are necessarily hostile (e.g., search
      engine crawlers, P2P applications).

4
Assumptions
  • Focus only on TCP scanners
  • Identity: single remote IP addresses. No
    distributed scans. No vertical scans of a single
    host.
  • Does not assume a particular scanning rate from a
    remote host.

5
Outline
  • Existing Works
  • Data Analysis
  • Online Detection Algorithm Threshold Random Walk
  • Performance Evaluation
  • Concluding Remarks

6
Existing Works
  • Counting models: Network Security Monitor, Snort,
    and Bro.
  • Probabilistic models: [LeckieK02] and SPICE
    [StanifordHM00].

7
Counting Models
  • Network Security Monitor and Snort flag a source
    that generates N events within a time interval of
    T seconds.
  • Bro treats connections differently depending on
    their service. For services in a configurable
    list, it counts only failed attempts; for all
    other services, it counts every attempt. It
    raises a flag when the number of distinct
    destination addresses reaches a configurable
    threshold.
  • Disadvantage: threshold selection.
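
The "N events within T seconds" rule can be sketched as a sliding-window counter. This is a minimal illustration, not the actual logic of Network Security Monitor, Snort, or Bro: the class name `CountingDetector`, its parameters, and the choice to count distinct destinations per source are all assumptions made for the example.

```python
from collections import defaultdict, deque


class CountingDetector:
    """Hypothetical counting model: flag a source that contacts at least
    n_events distinct destinations within window_seconds."""

    def __init__(self, n_events, window_seconds):
        self.n_events = n_events
        self.window_seconds = window_seconds
        # source -> deque of (timestamp, destination), oldest first
        self.events = defaultdict(deque)

    def observe(self, timestamp, source, destination):
        """Record one connection attempt; return True if the source is flagged."""
        window = self.events[source]
        window.append((timestamp, destination))
        # Drop attempts that have fallen out of the sliding window.
        while window and timestamp - window[0][0] > self.window_seconds:
            window.popleft()
        distinct = {dst for _, dst in window}
        return len(distinct) >= self.n_events


detector = CountingDetector(n_events=3, window_seconds=10)
fast = [detector.observe(t, "fast", f"host{t}") for t in (0, 1, 2)]
slow = [detector.observe(t, "slow", f"host{t}") for t in (0, 20, 40)]
print(fast, slow)
```

The slow source is never flagged because its probes never share a window, which is one way the threshold-selection problem shows up: any fixed N and T can be evaded by scanning just below the rate they imply.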

8
Probabilistic Models
  • [LeckieK02]
    • An access probability distribution for each local
      IP address, computed across all remote source IP
      addresses that access that destination.
    • Also considers the number of distinct local IP
      addresses that a given remote source has accessed
      so far.
    • Scanners are modeled as accessing each
      destination address with equal probability.
  • Flaws
    • Many false positives.
    • No confidence levels to assess whether the
      difference is large enough.
    • Unclear how to assign an a priori probability to
      destination addresses that have never been
      accessed.

9
Probabilistic Models
  • SPICE [StanifordHM00]
    • Detects stealthy scans (very low rates, and spread
      across multiple source addresses).
    • Assigns anomaly scores to packets based on
      conditional probabilities derived from the source
      and destination addresses and ports.
    • Collects packets over long intervals (days or
      weeks) and then clusters them using simulated
      annealing to find correlations that are then
      reported as anomalous events.
  • Disadvantages
    • Significantly more run-time processing
    • More complex
    • Off-line method

10
Outline
  • Existing Works
  • Data Analysis
  • Online Detection Algorithm Threshold Random Walk
  • Performance Evaluation
  • Concluding Remarks

11
Initial Data Sets
  • HTTP worms: Code Red or Nimda.
  • other_bad: hosts that send packets to 135/tcp,
    139/tcp, 445/tcp, or 1433/tcp, corresponding to
    Windows RPC, NetBIOS, SMB, and SQL-Snake attacks.
  • Two research labs: LBL and ICSI.
  • Bro NIDS is used.
  • 8 data sets (6 + 2).
  • 24-hour period.

[Figure labels: known_bad, scanner, HTTP worms, other_bad]
12
A Better Ground Truth
  • Ground truth: the available data sets are a good
    start, but not strong enough.
  • There may be undetected scanners among the
    remainder entries.
  • How to determine likely, but undetected, scanners?
  • Ideal situation: use a method that is wholly
    separate from the subsequently developed
    detection algorithm. The paper fails to find such
    a method.
  • Instead, the same properties are used to 1)
    distinguish likely scanners from non-scanners
    among the remainder hosts, and 2) drive the
    detection algorithm.
  • Soundness of the method: show that the likely
    scanners do indeed have characteristics in common
    with known malicious hosts.

13
Key Observation
  • inactive_pct: the percentage of the local hosts
    that a given remote host has accessed for which
    the connection attempt failed (rejected or
    unanswered).

14
Key Observation
  • inactive_pct: the percentage of the local hosts
    that a given remote host has accessed for which
    the connection attempt failed (rejected or
    unanswered).

15
Separating Possible Scanners
  • inactive_pct: the percentage of the local hosts
    that a given remote host has accessed for which
    the connection attempt failed.
  • inactive_pct < 80%: benign remote host.
  • inactive_pct > 80%: possible scanner (suspect).
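
As a rough sketch of how this split could be computed from a connection log: the record layout `(remote, local, succeeded)`, the function names, the use of the first attempt to each distinct local host, and the treatment of exactly 80% as "suspect" are all assumptions made for illustration, since the slides do not specify them.

```python
from collections import defaultdict


def inactive_pct(connections):
    """Return {remote: percentage of distinct local hosts whose first
    connection attempt from that remote failed}.

    connections: iterable of (remote, local, succeeded) tuples
    (a hypothetical log format).
    """
    first_outcome = defaultdict(dict)
    for remote, local, succeeded in connections:
        # Assumption: only the first attempt to each local host counts.
        first_outcome[remote].setdefault(local, succeeded)
    return {
        remote: 100.0 * sum(1 for ok in outcomes.values() if not ok) / len(outcomes)
        for remote, outcomes in first_outcome.items()
    }


def classify(pct, cutoff=80.0):
    """Below the cutoff is benign; at or above it is a suspect (assumed tie rule)."""
    return "benign" if pct < cutoff else "suspect"


log = [
    ("a", "h1", True), ("a", "h2", True), ("a", "h3", False),
    ("b", "h1", False), ("b", "h2", False), ("b", "h3", False),
]
for remote, pct in inactive_pct(log).items():
    print(remote, round(pct, 1), classify(pct))
```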

16
Final Data Sets
  • Additional supporting evidence: suspect hosts
    exhibit distributions quite similar to those of
    known_bad hosts.

17
Outline
  • Existing Works
  • Data Analysis
  • Online Detection Algorithm Threshold Random Walk
  • Performance Evaluation
  • Concluding Remarks

18
Hypothesis testing formulation
  • A remote host R attempts to connect to a local
    host at time i.
  • Let Yi = 0 if the connection attempt is a
    success, and Yi = 1 if it fails.
  • As outcomes Y1, Y2, ... are observed, we wish to
    determine whether R is a scanner or not.
  • Two competing hypotheses:
    • H0: R is benign
    • H1: R is a scanner

The distribution of the Bernoulli random variable
Yi depends on which hypothesis is true.
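
The equation from this slide is not in the transcript. In the notation of the Jung et al. paper it would read roughly as follows, where the symbols \theta_0 and \theta_1 (the probability that a connection attempt succeeds for a benign host and for a scanner, respectively) are taken from the paper rather than from the slide text:

```latex
\begin{align*}
\Pr[Y_i = 0 \mid H_0] &= \theta_0, & \Pr[Y_i = 1 \mid H_0] &= 1 - \theta_0, \\
\Pr[Y_i = 0 \mid H_1] &= \theta_1, & \Pr[Y_i = 1 \mid H_1] &= 1 - \theta_1,
\end{align*}
\qquad \text{with } \theta_0 > \theta_1 .
```

The condition \theta_0 > \theta_1 encodes the key observation: a benign host's connection attempts are more likely to succeed than a scanner's.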
19
An off-line approach
  • 1. Collect a sequence of data Y for one day
    (wait for a day).
  • 2. Compute the likelihood ratio accumulated over
    the day.
    • This is related to the proportion of inactive
      local hosts that R tries to connect to
      (resulting in failed connections).
  • 3. Raise a flag if this statistic exceeds some
    threshold.

20
A sequential (on-line) solution
  • 1. Update the accumulated likelihood ratio
    statistic in an online fashion.
  • 2. Raise a flag if it exceeds some threshold.

[Figure: accumulated likelihood ratio versus time (hours 0 to 24), with two threshold lines]
21
(No Transcript)
22
Likelihood Ratio
  • The second equality follows from the i.i.d.
    assumption on the random variables Yi | Hj.
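
The equation itself did not survive in the transcript. Assuming the standard form used for this test, with Y = (Y_1, ..., Y_n) the outcomes observed so far, the likelihood ratio would be:

```latex
\Lambda(Y) \;=\; \frac{\Pr[Y \mid H_1]}{\Pr[Y \mid H_0]}
           \;=\; \prod_{i=1}^{n} \frac{\Pr[Y_i \mid H_1]}{\Pr[Y_i \mid H_0]}
```

The product form is the "second equality": because the Y_i are independent and identically distributed under each hypothesis, the joint probability factors into per-observation terms, which is what allows the ratio to be updated one connection at a time.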

23
Threshold Selection
  • Performance criteria
    • Detection probability, PD: the algorithm selects
      H1 when H1 is in fact true.
    • False positive probability, PF: the algorithm
      selects H1 when H0 is in fact true.
  • Threshold selection (equations not transcribed)
  • Errors: differences between the actual bounds and
    the desired bounds
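
The threshold equations are missing from the transcript. A reconstruction along the lines of Wald's sequential test, as used in the paper, is sketched here; the symbols \alpha and \beta (desired bounds, P_F \le \alpha and P_D \ge \beta) and \eta_0, \eta_1 (lower and upper thresholds) are assumed notation, not taken from the slide text.

```latex
% Decision rule on the likelihood ratio \Lambda(Y):
%   \Lambda(Y) \ge \eta_1  => accept H_1 (scanner)
%   \Lambda(Y) \le \eta_0  => accept H_0 (benign)
%   otherwise              => wait for the next observation
\begin{align*}
\eta_1 &\le \frac{P_D}{P_F}, &
\eta_0 &\ge \frac{1 - P_D}{1 - P_F}, \\
\eta_1 &= \frac{\beta}{\alpha}, &
\eta_0 &= \frac{1 - \beta}{1 - \alpha}, \\
P_F &\le \frac{\alpha}{\beta}, &
1 - P_D &\le \frac{1 - \beta}{1 - \alpha}.
\end{align*}
```

The first row holds for any thresholds, the second row is the chosen setting, and the third row gives the bounds that result. These actual bounds differ slightly from the desired \alpha and 1 - \beta, which is the error the slide refers to.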
24
Detection Delay Time
  • The number of observations N until the test
    terminates.
  • Log-likelihood ratio
  • Wald's equation
  • What is E[N]?
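
One way to answer "What is E[N]?" is Wald's equation: the expected value of the log-likelihood ratio when the test stops equals E[N] times the expected per-observation step. The sketch below applies it under stated assumptions: the function name and the parameters `theta0`, `theta1` (success probabilities under H0 and H1) and `alpha`, `beta` (desired false positive and detection probabilities) are assumed notation, and the overshoot past a threshold at stopping time is ignored, so the result is an approximation.

```python
import math


def expected_observations(theta0, theta1, alpha, beta):
    """Approximate (E[N | H0], E[N | H1]) via Wald's equation."""
    log_eta1 = math.log(beta / alpha)              # upper threshold (accept H1)
    log_eta0 = math.log((1 - beta) / (1 - alpha))  # lower threshold (accept H0)
    step_success = math.log(theta1 / theta0)              # Yi = 0
    step_failure = math.log((1 - theta1) / (1 - theta0))  # Yi = 1

    # Under H0 the walk ends at the upper threshold with probability alpha.
    n_h0 = (alpha * log_eta1 + (1 - alpha) * log_eta0) / (
        theta0 * step_success + (1 - theta0) * step_failure
    )
    # Under H1 the walk ends at the upper threshold with probability beta.
    n_h1 = (beta * log_eta1 + (1 - beta) * log_eta0) / (
        theta1 * step_success + (1 - theta1) * step_failure
    )
    return n_h0, n_h1


n_h0, n_h1 = expected_observations(theta0=0.8, theta1=0.2, alpha=0.01, beta=0.99)
print(f"E[N|H0] ~ {n_h0:.2f}, E[N|H1] ~ {n_h1:.2f}")
```

The example parameter values are illustrative choices, not values stated in these slides.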
25
Outline
  • Existing Works
  • Data Analysis
  • Online Detection Algorithm Threshold Random Walk
  • Performance Evaluation
  • Concluding Remarks

26
Evaluation Methodology
  • Used the data from the two labs
  • Knowledge of whether each connection is
    established, rejected, or unanswered
  • Maintains 3 variables for each remote host s:
    • D_s, the set of distinct hosts previously
      connected to
    • S_s, the decision state (pending, H_0, or H_1)
    • L_s, the likelihood ratio

27
Evaluation Methodology (cont.)
  • For each line in the data set:
    • Skip the line if the remote host's state is no
      longer pending
    • Determine whether the connection is successful
    • Check whether the destination is already in the
      connection set D_s; if so, proceed to the next
      line
    • Update D_s and L_s
    • If L_s crosses either threshold, update the
      state S_s accordingly
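
The loop above can be sketched in a few lines of Python. This is an illustrative reading of the steps, not the authors' implementation: the record layout `(remote, local, succeeded)`, the function name `trw`, and the default parameter values are assumptions, and the thresholds are set Wald-style from assumed desired rates `alpha` and `beta`.

```python
def trw(connections, theta0=0.8, theta1=0.2, alpha=0.01, beta=0.99):
    """Run a Threshold Random Walk over (remote, local, succeeded) records.

    Returns {remote: "pending" | "H0" | "H1"}.
    theta0 / theta1: assumed success probabilities for benign hosts / scanners.
    """
    eta1 = beta / alpha              # upper threshold: declare scanner (H1)
    eta0 = (1 - beta) / (1 - alpha)  # lower threshold: declare benign (H0)
    seen = {}    # D_s: distinct local hosts contacted by each remote
    state = {}   # S_s: decision state per remote
    ratio = {}   # L_s: likelihood ratio per remote

    for remote, local, succeeded in connections:
        if state.setdefault(remote, "pending") != "pending":
            continue  # a decision was already made for this remote
        destinations = seen.setdefault(remote, set())
        if local in destinations:
            continue  # only the first contact with each local host counts
        destinations.add(local)
        if succeeded:
            factor = theta1 / theta0              # pushes toward H0
        else:
            factor = (1 - theta1) / (1 - theta0)  # pushes toward H1
        ratio[remote] = ratio.get(remote, 1.0) * factor
        if ratio[remote] >= eta1:
            state[remote] = "H1"
        elif ratio[remote] <= eta0:
            state[remote] = "H0"
    return state


records = (
    [("scanner", f"h{i}", False) for i in range(4)]
    + [("client", f"h{i}", True) for i in range(4)]
    + [("retry", "h1", False)] * 10
)
print(trw(records))
```

With these example parameters each failure multiplies the ratio by 4 and each success divides it by 4, so four first-contact failures in a row cross the upper threshold, while repeated attempts to the same host leave the state pending.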

28
Comparison with other existing intrusion
detection systems (Bro and Snort)

Efficiency   Effectiveness   N
0.963        0.040           4.08
1.000        0.008           4.06

  • Efficiency: 1 - false positives / true
    positives
  • Effectiveness: false negatives / all samples
  • N: number of samples used (i.e., detection delay
    time)

29
Comparison with other existing intrusion
detection systems (Bro and Snort) (cont.)
  • TRW is far more effective than the other two
  • TRW is almost as efficient as Bro
  • TRW detects scanners in far less time

30
Outline
  • Existing Works
  • Data Analysis
  • Online Detection Algorithm Threshold Random Walk
  • Performance Evaluation
  • Concluding Remarks

31
Strengths of the paper
  • Good observation
    • inactive_pct provides a strong modality to
      differentiate benign hosts from suspicious hosts.
  • Sequential analysis is well-suited
    • Provides mathematical bounds on the expected
      performance of the algorithm (PD, PF, and N)
    • Minimizes the detection time given fixed false
      alarm and misdetection rates
    • Balances the tradeoff between these three
      quantities (false alarm rate, misdetection rate,
      detection time) effectively

32
Limitations and Possible Improvements
  • Nearly circular argument between ground truth,
    and the developed detection algorithm. Both use
    the same key observation.
  • Oscillation problem in the detection algorithm.
  • Leveraging Additional Information
  • Managing State
  • How to Respond
  • Evasion and Gaming
  • Distributed Scans

33
References
  • [LeckieK02] C. Leckie and R. Kotagiri. A
    probabilistic approach to detecting network
    scans. In Proceedings of the Eighth IEEE Network
    Operations and Management Symposium (NOMS 2002),
    pages 359-372, Florence, Italy, Apr. 2002.
  • [StanifordHM00] S. Staniford, J. A. Hoagland, and
    J. M. McAlerney. Practical automated detection of
    stealthy portscans. In Proceedings of the 7th ACM
    Conference on Computer and Communications
    Security, Athens, Greece, 2000.
  • XuanLong Nguyen. Sequential analysis: balancing
    the tradeoff between detection accuracy and
    detection delay. Presentation, Radlab, UCB,
    11/06/06.