Title: Network-based Intrusion Detection, Mitigation and Forensics System
1Network-based Intrusion Detection, Mitigation and
Forensics System
- Yan Chen
- Department of Electrical Engineering and Computer
Science - Northwestern University
- Lab for Internet Security Technology (LIST)
- http://list.cs.northwestern.edu
2The Spread of Sapphire/Slammer Worms
3Current Intrusion Detection Systems (IDS)
- Mostly host-based and not scalable to high-speed networks
- Slammer worm infected 75,000 machines in <10 mins
- Host-based schemes inefficient and user dependent
- Have to install IDS on all user machines!
- Mostly simple signature-based
- Cannot recognize unknown anomalies/intrusions
- New viruses/worms, polymorphism
4Current Intrusion Detection Systems (II)
- Statistical detection
- Unscalable for flow-level detection
- IDS vulnerable to DoS attacks
- Overall traffic based: inaccurate, high false positives
- Cannot differentiate malicious events from unintentional anomalies
- Anomalies can be caused by network element faults
- E.g., router misconfiguration, link failures, etc.
5Network-based Intrusion Detection, Mitigation,
and Forensics System
- Online traffic recording
- SIGCOMM IMC 2004, INFOCOM 2006, ToN to appear
- Reversible sketch for data streaming computation
- Record millions of flows (GB traffic) in a few hundred KB
- Small number of memory accesses per packet
- Scalable to large key space size ($2^{32}$ or $2^{64}$)
- Online sketch-based flow-level anomaly detection
- IEEE ICDCS 2006; IEEE CGA, Security Visualization 06
- Adaptively learn the traffic pattern changes
- As a first step, detect TCP SYN flooding, horizontal and vertical scans even when mixed
- Online stealthy spreader (botnet scan) detection
- IWQoS 2007
6Network-based Intrusion Detection, Mitigation,
and Forensics System (II)
- Integrated approach for false positive reduction
- Polymorphic worm signature generation and detection
- IEEE Symposium on Security and Privacy 2006
- IEEE ICNP 2007 to appear
- Accurate network diagnostics
- ACM SIGCOMM 2006; IEEE INFOCOM 2007
- Scalable distributed intrusion alert fusion w/ DHT
- SIGCOMM Workshop on Large Scale Attack Defense 2006
- Large-scale botnet event forensics using honeynet
- Work in progress
7System Architecture
Figure: system architecture. Labels: remote aggregated sketch records; streaming packet data; Part II: per-flow monitoring and detection.
8System Deployment
- Attached to a router/switch as a black box
- Edge network detection particularly powerful
Figure: deployment options. Labels: original configuration; monitor each port separately; monitor aggregated traffic from all ports.
9Detecting Stealthy Spreaders Using Online
Outdegree Histograms
- Yan Gao (1), Yao Zhao (1), Robert Schweller (1),
- Shobha Venkataraman (2), Yan Chen (1),
- Dawn Song (2) and Ming-Yang Kao (1)
(1) Northwestern University (2) Carnegie Mellon University
10Outline
- Motivation
- Problem definition
- System design
- Evaluation
- Conclusion
11Motivation
- High-speed network monitoring
- Small amount of memory usage
- Small number of memory accesses per packet
- Superspreaders vs. stealthy spreaders
- Superspreaders: sources that connect to a large number of distinct destinations
- e.g., a compromised host doing fast scanning for worm propagation
- Stealthy spreaders: a number of sources that each send more than a certain number of (unsuccessful) connections to distinct destinations
- e.g., botnet scans or moderate worm propagation
12Existing Data Streaming Algorithms
- Online entropy estimation approaches
- Chakrabarti et al. STACS 06 and Guha et al. ACM SODA 06
- Pros: detect unexpected changes in the network traffic
- Cons: lose some concrete distribution information
- Online histogram estimation algorithms
- Gibbons et al. VLDB 97 and Gilbert et al. STOC 02
- Pros: provide more information on the features of network traffic
- Cons: cannot record the number of unique items
- Superspreader detection schemes
- Venkataraman et al. NDSS 05 and Zhao et al. IMC 05
- Pros: detect sources with a very large outdegree
- Cons: memory usage unscalable to small/medium outdegrees such as bot scans
- Superspreader detection is a special case of spreader detection
13Outline
- Motivation
- Problem definition
- System design
- Evaluation
- Conclusion
14Problem Definitions
- Two high-level problems
- Construct an approximation of the outdegree histogram online
- Directly detect the presence of stealthy spreaders without constructing the complete outdegree histogram
15Problem Definition
- Input: stream S of (Src, Dst) pairs
- Output
$z$: the base whose powers define the buckets of the histogram ($z = 2$)
16Problem Definition
- Input: stream S of (SIP, DIP) pairs
- Output
$W_i$: the set of sources in bucket $i$. A source $s$ is in $W_i$ if and only if the number of unique destinations that $s$ connects to is in the range $[z^i, z^{i+1})$
Figure: histogram of the number of sources (y-axis) against the number of unique destinations (x-axis, bucket boundaries $2^0, 2^1, \dots, 2^7$).
17Problem Definition
- Input: stream S of (SIP, DIP) pairs
- Output
$m_i = |W_i|$. Creating an approximate histogram means estimating $m_i$ for each bucket
Figure: histogram of the number of sources (y-axis) against the number of unique destinations (x-axis, bucket boundaries $2^0, 2^1, \dots, 2^7$).
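As a reference point for the definitions above, the exact histogram can be computed offline by storing every destination set. This is a minimal sketch of the ground truth that the online algorithms approximate, not the authors' method: it uses memory proportional to the stream, which is exactly what the talk aims to avoid. The function name `outdegree_histogram` and the sample stream are illustrative assumptions.

```python
from collections import defaultdict

def outdegree_histogram(pairs, z=2):
    """Exact m_i = |W_i| for a stream of (src, dst) pairs.

    Bucket i holds the sources whose number of unique destinations d
    satisfies z**i <= d < z**(i + 1).
    """
    dests = defaultdict(set)
    for src, dst in pairs:
        dests[src].add(dst)          # duplicates of a pair are counted once

    hist = defaultdict(int)
    for seen in dests.values():
        d = len(seen)
        i = 0
        while z ** (i + 1) <= d:     # integer arithmetic avoids log rounding
            i += 1
        hist[i] += 1
    return dict(hist)

# Hypothetical stream: "a" has 2 unique destinations, "b" has 1, "c" has 5.
stream = [("a", 1), ("a", 1), ("a", 2), ("b", 1),
          ("c", 1), ("c", 2), ("c", 3), ("c", 4), ("c", 5)]
hist = outdegree_histogram(stream)
```

With z = 2, source "b" lands in bucket 0, "a" in bucket 1, and "c" in bucket 2, so each of those buckets has m_i = 1.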
18Contribution
- Study the problem of detecting stealthy spreaders online
- With constant small memory
- With few memory accesses per packet
- Design the algorithm to detect stealthy spreaders online by approximating the outdegree histogram
- Data recording phase
- Sampling and coupon collection-based algorithms
- Spreader detection phase
- Linear regression to find bins where attacks happen
- Show that the change of approximated histogram reveals the presence of anomalies
19Outline
- Motivation
- Problem definition
- System design
- Evaluation
- Conclusion
20Recording Phase: Sampling Algorithm
- Fast: update a smaller number of counters per packet
Figure (sampling algorithm): a packet (src, dst) is hashed on its source; the example shows h(src) falling between $2^{-3}$ and $2^{-2}$.
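The slide gives only the picture of h(src) landing between two negative powers of two. One common way to use such a hash is geometric level selection, where level j is chosen with probability 2^-(j+1). The sketch below illustrates that idea only; it is an assumption about what the figure depicts, not the authors' sampling algorithm. The names `unit_hash` and `level_of`, the SHA-256 hash, the half-open interval boundaries, and the cap `max_level` are all hypothetical choices.

```python
import hashlib

def unit_hash(key, salt="h"):
    """Map a key to a pseudo-uniform float in [0, 1) (illustrative hash)."""
    digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
    top53 = int.from_bytes(digest[:8], "big") >> 11   # keep 53 bits
    return top53 / 2 ** 53                            # exact float in [0, 1)

def level_of(h, max_level=31):
    """Return j with 2**-(j+1) <= h < 2**-j, capped at max_level."""
    j = 0
    while j < max_level and h < 2.0 ** -(j + 1):
        j += 1
    return j

# Tally how many of 1000 hypothetical sources fall into each level.
tally = {}
for n in range(1000):
    j = level_of(unit_hash(f"10.0.{n // 256}.{n % 256}"))
    tally[j] = tally.get(j, 0) + 1
```

Under this assumption, a hash value between 2^-3 and 2^-2 (as in the figure) maps to level 2, and each level receives roughly half as many sources as the one before it.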
21Recording Phase: Coupon Collecting Algorithm
- Accurate: create a better approximation
- Interim structure
- Uniform random hash function for hashing dst to an integer in $[1, 2^i]$
Figure (coupon collecting algorithm): a packet (src, dst) is hashed on its source; the example shows h(src) falling between $2^{-3}$ and $2^{-2}$.
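The coupon collector idea behind this slide can be illustrated in a few lines: each destination is hashed to one of n = 2^i coupons, repeated destinations always yield the same coupon, and on average n * H_n distinct destinations are needed before all coupons have been seen (H_n is the n-th harmonic number). This is a sketch of the general principle, not the authors' interim structure; `coupon`, `CouponSet`, and the SHA-256 hash are assumed names and choices.

```python
import hashlib

def coupon(dst, i, salt="g"):
    """Hash dst to an integer in [1, 2**i] (illustrative hash choice)."""
    digest = hashlib.sha256(f"{salt}:{dst}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % (2 ** i) + 1

def expected_draws_to_collect_all(n):
    """Coupon collector: E[T] = n * H_n distinct items to see all n coupons."""
    return n * sum(1.0 / k for k in range(1, n + 1))

class CouponSet:
    """Tracks which of the 2**i coupons one source has collected."""

    def __init__(self, i):
        self.i = i
        self.seen = set()

    def add(self, dst):
        self.seen.add(coupon(dst, self.i))

    def full(self):
        return len(self.seen) == 2 ** self.i

cs = CouponSet(i=3)
for dst in ["192.0.2.1", "192.0.2.1", "192.0.2.2"]:
    cs.add(dst)          # the repeated destination adds no new coupon
```

Because a full set of 2^i coupons typically requires on the order of 2^i * ln(2^i) distinct destinations, a full set is evidence that the source's outdegree is large relative to 2^i.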
22Spreader Detection Phase
- Outdegree histogram construction
- Interim data structure -> final outdegree histogram
- Using linear programming method
- Build a convex hull
- Other constraints
- Find the lower and upper bounds for $m_i$
- Solution
- Directly use the interim data structure
Pros: obtain a reasonably accurate histogram for normal network traffic
Cons: fail to accurately estimate the outdegree histogram for anomalous traffic
23System Design
- Change detection
- The change of the interim data structure between two time intervals
- Stealthy spreader detection
- $k_i$ > ch (threshold)
- System architecture
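The change-detection step above can be sketched as a per-bucket comparison of two consecutive intervals. The slide does not define k_i, so this example assumes k_i is the increase of bucket i from the previous interval to the current one; the function name `detect_changed_buckets` and the sample counts are hypothetical.

```python
def detect_changed_buckets(prev, curr, ch):
    """Return bucket indices i whose change k_i = curr[i] - prev[i] exceeds ch.

    prev, curr: per-bucket values from two consecutive time intervals.
    ch: detection threshold (assumed to be a fixed constant here).
    """
    if len(prev) != len(curr):
        raise ValueError("both intervals must have the same number of buckets")
    return [i for i, (p, c) in enumerate(zip(prev, curr)) if c - p > ch]

# Hypothetical per-bucket counts for two intervals.
previous_interval = [10, 5, 2, 0]
current_interval = [11, 6, 9, 0]
flagged = detect_changed_buckets(previous_interval, current_interval, ch=3)
```

Here only bucket 2 is flagged, because its count grows by 7 while the other buckets change by at most 1.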
24Spreader Detection Phase
Figure: number of scanners (y-axis) against number of distinct destinations (x-axis). Annotations: one peak; close to 0.
25Spreader Detection Phase
- Linear regression for coupon collecting algorithm
- Mean squared error as the fitting metric
Figure: example of linear regression, with the counting value (y-axis) plotted against the bucket (x-axis).
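The fitting step named on this slide, a straight line through (bucket, value) points scored by mean squared error, can be sketched with ordinary least squares. The slides do not say how the fit is used to locate attack bins, so this example only shows the fit and the metric; `fit_line`, `mse`, and the sample values are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mse(xs, ys, slope, intercept):
    """Mean squared error of the line against the observed points."""
    return sum((y - (slope * x + intercept)) ** 2
               for x, y in zip(xs, ys)) / len(xs)

buckets = [0, 1, 2, 3, 4]
clean = [1.0, 3.0, 5.0, 7.0, 9.0]       # points exactly on y = 2x + 1
bumped = [1.0, 3.0, 5.0, 13.0, 9.0]     # bucket 3 raised by a hypothetical scan

slope, intercept = fit_line(buckets, clean)
clean_error = mse(buckets, clean, slope, intercept)
bumped_error = mse(buckets, bumped, *fit_line(buckets, bumped))
```

The clean series fits with essentially zero error, while the series with one inflated bucket leaves a clearly positive error, which is the kind of deviation a regression-based detector could look for.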
26Outline
- Motivation
- Problem definition
- System design
- Evaluation
- Conclusion
27Evaluation Methodology
- Traffic traces
- OC-48 CAIDA data on Aug. 14th, 2002
- The average packet rate: 191K/s
- The average flow rate: 3.75K/s
- A real scanning event collected from one class B honeynet on Jan 7th, 2007
- Port 23
- 2.5 hours
- 1,607 unique sources
- 1,700,236 scan sessions
- Synthetic scanning traces
28Simulation Results
Figure: the estimate ratio of scan outdegree (panel axes: attack intensity, percentage of detection results, estimate ratio). Annotations:
- False negative: 0%; estimation error within 20%: 76.1%
- False negative: 17.8%; estimation error within 20%: 33.9%
29Simulation Results
Figure: CDF of estimate ratio for spreader intensity estimation, with cumulative percentage (%) against estimate ratio. Annotated values: 80 and 35.
30Simulation Results
Figure: the histogram of outdegree of scanners collected in the honeynet, with number of scanners against number of distinct destinations. Estimation: 90; ground truth: 87.
31Simulation Results
Mix the 5-min data of a real scanning event with
5-min normal traffic of CAIDA data (distribution
over 30 such intervals)
Figure: CDF of estimate ratios of scan outdegree estimation, with cumulative percentage (%) against estimate ratio. Annotated value: 80.
32Online Performance
- Memory consumption
- Our method: O(c log(m))
- Constant memory: 241KB 24KB
- Superspreader
- When k is small, the memory usage is closer to the size of the entire data stream N.
- Memory access per packet
- Single memory access per packet for each distinct counting structure
- Speed up processing in parallel or in pipeline
- Speed
- 3.2GHz Pentium 4 computer
- Recording: 200 seconds for each 5-min CAIDA data interval
- Detection: less than 0.1 second
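The recording-speed figure can be put in context with a short calculation. This assumes each 5-min interval carries the trace's average rate of 191K packets/s reported in the evaluation methodology; the variable names are illustrative.

```python
packet_rate = 191_000        # packets/s, average rate of the CAIDA trace
interval_s = 5 * 60          # length of one trace interval in seconds
recording_s = 200            # reported recording time per interval

packets_per_interval = packet_rate * interval_s        # packets to record
throughput = packets_per_interval / recording_s        # packets/s processed
speedup_over_line_rate = interval_s / recording_s      # >1 means faster than real time
```

Under that assumption the recording phase handles about 57.3 million packets in 200 seconds, roughly 286.5K packets/s, which is 1.5 times the trace's average arrival rate.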
33Conclusion
- Propose the stealthy spreader detection problem
- Design an online outdegree histogram based stealthy spreader detection algorithm
- Propose two randomized algorithms for the recording phase
- Propose the linear regression based approach for stealthy spreader detection