High-Performance Network Anomaly/Intrusion Detection

Yan Chen, Department of Electrical Engineering and Computer Science, Northwestern University, Lab for Internet & Security Technology (LIST), http://list.cs.northwestern.edu


1
High-Performance Network Anomaly/Intrusion Detection and Mitigation System (HPNAIDM)

Yan Chen
Department of Electrical Engineering and Computer Science
Northwestern University
Lab for Internet & Security Technology (LIST)
http://list.cs.northwestern.edu

2
The Spread of Sapphire/Slammer Worms
3
Current Intrusion Detection Systems (IDS)

Mostly host-based and not scalable to high-speed networks
Slammer worm infected 75,000 machines in <10 mins
Host-based schemes are inefficient and user dependent
Have to install IDS on all user machines!
Mostly simple signature-based
Cannot recognize unknown anomalies/intrusions
New viruses/worms, polymorphism

4
Current Intrusion Detection Systems (II)

Statistical detection
Unscalable for flow-level detection
IDS vulnerable to DoS attacks
Overall-traffic-based detection: inaccurate, high false positives
Cannot differentiate malicious events from unintentional anomalies
Anomalies can be caused by network element faults
E.g., router misconfiguration, link failures, etc.

5
High-Performance Network Anomaly/Intrusion
Detection and Mitigation System (HPNAIDM)

Online traffic recording
[SIGCOMM IMC 2004, INFOCOM 2006, ToN to appear]
Reversible sketch for data streaming computation
Record millions of flows (GB traffic) in a few hundred KB
Small # of memory accesses per packet
Scalable to large key space size (2^32 or 2^64)
Online sketch-based flow-level anomaly detection
[IEEE ICDCS 2006; IEEE CG&A, Security Visualization 06]
Adaptively learn the traffic pattern changes
As a first step, detect TCP SYN flooding,
horizontal and vertical scans even when mixed
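
As a rough illustration of how a sketch can record many flows in fixed memory with only a few memory accesses per packet, here is a minimal count-min style sketch in Python. It is not the reversible sketch from the cited papers (it cannot recover the heavy keys); the class name, table dimensions, and hash choice are assumptions made only for this example.

```python
import hashlib


class CountMinSketch:
    """Fixed-size counter table: `depth` rows of `width` counters each."""

    def __init__(self, depth=4, width=1024):
        self.depth = depth
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One hash function per row, derived by salting with the row number.
        digest = hashlib.blake2b(key, digest_size=8, salt=bytes([row])).digest()
        return int.from_bytes(digest, "big") % self.width

    def update(self, key, value=1):
        # Exactly `depth` counter updates per packet, regardless of flow count.
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += value

    def estimate(self, key):
        # Collisions can only inflate a counter, so the row minimum is the
        # tightest available estimate (never an underestimate).
        return min(self.rows[row][self._index(row, key)]
                   for row in range(self.depth))
```

Memory stays at depth x width counters however many distinct flow keys are seen, which is the property the slide relies on; the key here is any bytes value, for example a packed source/destination address pair.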

6
HPNAIDM (II)

Integrated approach for false positive reduction
Polymorphic worm detection (Hamsa)
[IEEE Symposium on Security and Privacy 2006]
Accurate network diagnostics [ACM SIGCOMM 2006]
Scalable distributed intrusion alert fusion w/ DHT
[SIGCOMM Workshop on Large Scale Attack Defense 2006]
HPNAIDM: the first flow-level intrusion detection system that can sustain tens of Gbps of bandwidth, even for worst-case traffic of 40-byte packet streams

7
HPNAIDM Architecture
[Figure: HPNAIDM architecture. Labels: remote aggregated sketch records; streaming packet data; Part II: per-flow monitoring and detection]
8
Deployment of HPNAIDM

Attached to a router/switch as a black box
Edge network detection particularly powerful

[Figure: deployment options. Labels: original configuration; monitor each port separately; monitor aggregated traffic from all ports]
9
Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience

Zhichun Li, Manan Sanghi, Yan Chen, Ming-Yang Kao and Brian Chavez

Northwestern University
10
Desired Requirements for Polymorphic Worm
Signature Generation

Network-based signature generation
Worms spread at exponential speed, so detecting them at an early stage is crucial. However:
At an early stage there are only limited worm samples.
A high-speed network router may see more worm samples. But:
Need to keep up with the network speed!
Can only use network-level information

11
Desired Requirements for Polymorphic Worm
Signature Generation

Noise tolerant
Most network flow classifiers suffer false positives.
Even host-based approaches can be injected with noise.
Attack resilience
Attackers always try to evade the detection systems
Efficient signature matching for high-speed links

No existing work satisfies these requirements !
12
Outline

Motivation
Hamsa Design
Model-based Signature Generation
Evaluation
Related Work
Conclusion

13
Choice of Signatures

Two classes of signatures
Content based
Token: a substring with reasonable coverage of the suspicious traffic
Signature: conjunction of tokens
Behavior based
Our choice: content based
Fast signature matching. ASIC-based approaches can achieve 6-8 Gb/s
Generic, independent of any protocol or server
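
To make "conjunction of tokens" concrete, the following minimal Python sketch matches a payload against a token signature. The dictionary form (token mapped to a minimum occurrence count) mirrors the multiset signatures listed on the results slide; the function name and the sample payloads are assumptions for illustration.

```python
def matches(signature, payload):
    """Return True if every token occurs in `payload` at least as often as required.

    `signature` maps token (bytes) -> minimum occurrence count.
    bytes.count() counts non-overlapping occurrences.
    """
    return all(payload.count(token) >= count
               for token, count in signature.items())


# Hypothetical signature built from tokens that appear on the results slide.
signature = {b".ida?": 1, b"GET /": 1, b" HTTP/1.0\r\n": 1}

worm_like = b"GET /default.ida?XXXXXXXX HTTP/1.0\r\n\r\n"
benign = b"GET /index.html HTTP/1.0\r\n\r\n"

print(matches(signature, worm_like))
print(matches(signature, benign))
```

Because matching only needs substring counts, it is independent of the application protocol, which is the "generic" property claimed above.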

14
Unique Invariants of Worms

Protocol Frame
The code path to the vulnerability part, usually infrequently used
Code-Red II: '.ida?' or '.idq?'
Control Data: leading to control flow hijacking
Hard-coded value to overwrite a jump target or a function call
Worm Executable Payload
CLET polymorphic engine: '0\x8b', '\xff\xff\xff' and 't\x07\xeb'
Possible to have worms with no such invariants, but very hard

15
Hamsa Architecture
16
Hamsa Design

Key idea: model the uniqueness of worm invariants
Greedy algorithm for finding token conjunction signatures
Highly accurate while much faster
Both analytically and experimentally
Compared with the latest work, Polygraph
Suffix array based token extraction
Provable attack resilience guarantee
Noise tolerant

17
Hamsa Signature Generator

Core part: Model-based Greedy Signature Generation
Iterative approach for multiple worms

18
Outline

Motivation
Hamsa Design
Model-based Signature Generation
Evaluation
Related Work
Conclusion

19
Problem Formulation
[Figure: a signature generator that outputs a signature subject to the false positive bound r]
With noise: NP-hard!
20
Model Uniqueness of Invariants
u(1): upper bound of FP(t1)
u(2): upper bound of FP(t1,t2)
The total number of tokens is bounded by k
21
Signature Generation Algorithm
[Figure: token extraction from the suspicious pool yields tokens, ordered by coverage; t1 is chosen subject to u(1) = 15%]
22
Signature Generation Algorithm
[Figure: remaining tokens ordered by joint coverage with t1; t2 is chosen subject to u(2) = 7.5%, giving the signature (t1, t2)]
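
The two slides above can be read as a greedy loop: at step i, among tokens whose joint false positive with the tokens already chosen stays within u(i), take the one with the highest joint coverage of the suspicious pool. The Python sketch below is one plausible reading of that loop, not the authors' implementation; the stopping rule (stop once the false positive is at most r), the function names, and the toy pools are assumptions.

```python
def covers(tokens, flow):
    """True if the flow contains every token (token conjunction)."""
    return all(tok in flow for tok in tokens)


def rate(tokens, pool):
    """Fraction of flows in `pool` matched by the token conjunction."""
    return sum(covers(tokens, flow) for flow in pool) / len(pool)


def greedy_signature(candidates, suspicious, normal, u, r):
    """Greedy token-conjunction search.

    u[i] is the false positive bound for a conjunction of i+1 tokens;
    r is the final false positive bound. Returns a token list or None.
    """
    signature = []
    for bound in u:
        best = None
        for tok in candidates:
            if tok in signature:
                continue
            trial = signature + [tok]
            if rate(trial, normal) > bound:
                continue  # violates the u-bound for this step
            cov = rate(trial, suspicious)
            if best is None or cov > best[0]:
                best = (cov, tok)
        if best is None:
            return None  # no token satisfies the bound
        signature.append(best[1])
        if rate(signature, normal) <= r:
            return signature
    return None


suspicious = [b"GET /a.ida?AAA\x90\x90", b"GET /b.ida?BBB\x90\x90",
              b"GET /c.ida?CCC\x90\x90", b"GET /index.html"]  # last flow is noise
normal = [b"GET /index.html", b"GET /about.html", b"POST /form", b"GET /img.png"]
candidates = [b"GET /", b".ida?", b"\x90\x90", b".html"]

print(greedy_signature(candidates, suspicious, normal, u=[0.8, 0.1], r=0.01))
```

In this toy run the first pick is the high-coverage but common token `GET /`, and the second pick narrows the conjunction until the false positive in the normal pool drops below r.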
23
Algorithm Analysis

Runtime analysis: O(T·(|M|+|N|))
Provable Attack Resilience Guarantee
Analytically bound the worst attackers can do!
Example: k=5, u(1)=0.2, u(2)=0.08, u(3)=0.04, u(4)=0.02, u(5)=0.01 and r=0.01
The better the flow classifier, the lower the false negatives

Noise ratio | FP upper bound | FN upper bound
5% | 1% | 1.84%
10% | 1% | 3.89%
20% | 1% | 8.75%
24
Attack Resilience Assumptions

Two common assumptions for any signature generation system
Two unique assumptions for token-based schemes
Attacks on the flow classifier
Our approach does not depend on perfect flow classifiers
With 99% noise, no approach can work!
High noise injection makes the worm propagate less efficiently.
Enhance flow classifiers

25
Improvements to the Basic Approach

Generalizing Signature Generation
Use a scoring function to evaluate the goodness of a signature
Iteratively use the single-worm detector to detect multiple worms
At the first iteration, the algorithm finds the signature for the most popular worm in the suspicious pool.
All other worms and normal traffic are treated as noise.

26
Outline

Motivation
Hamsa Design
Model-based Signature Generation
Evaluation
Related Work
Conclusion

27
Experiment Methodology

Experimental setup
Suspicious pool
Three pseudo-polymorphic worms based on real exploits (Code-Red II, Apache-Knacker and ATPhttpd)
Two polymorphic engines from the Internet (CLET and TAPiON)
Normal pool: 2-hour departmental HTTP trace (326 MB)
Signature evaluation
False negative: 5000 generated worm samples per worm
False positive
4-day departmental HTTP trace (12.6 GB)
3.7 GB of web crawling, including .mp3, .rm, .ppt, .pdf, .swf, etc.
/usr/bin of Linux Fedora Core 4

28
Results on Signature Quality
Worm | Training FN | Training FP | Evaluation FN | Evaluation FP | Binary evaluation FP
Code-Red II | 0 | 0 | 0 | 0 | 0
CLET | 0 | 0.109 | 0 | 0.06236 | 0.268

Signatures
Code-Red II: {'.ida?': 1, 'u780': 1, ' HTTP/1.0\r\n': 1, 'GET /': 1, 'u': 2}
CLET: {'0\x8b': 1, '\xff\xff\xff': 1, 't\x07\xeb': 1}

Single worm with noise
Suspicious pool size: 100 and 200 samples
Noise ratio: 0%, 10%, 30%, 50%, 70%
Noise samples randomly picked from the normal pool
Always obtained the signatures and accuracy above.
Multiple worms with noise give similar results

29
Speed Results

Implementation with C/Python
500 samples with 20% noise, 100 MB normal traffic pool: 15 seconds on a 2.8 GHz Xeon, 112 MB memory consumption
Speed comparison with Polygraph
Asymptotic runtime: O(T) vs. O(M^2); when M increases, T won't increase as fast as M!
Experimental: 64 to 361 times faster (Polygraph vs. ours, both in Python)

30
Outline

Motivation
Hamsa Design
Model-based Signature Generation
Evaluation
Related Work
Conclusion

31
Related works
Property | Hamsa | Polygraph | CFG | PADS | Nemean | COVERS | Malware Detection
Network or host based | Network | Network | Network | Host | Host | Host | Host
Content or behavior based | Content based | Content based | Behavior based | Content based | Content based | Behavior based | Behavior based
Noise tolerance | Yes | Yes (slow) | Yes | No | No | Yes | Yes
Multi worms in one protocol | Yes | Yes (slow) | Yes | No | Yes | Yes | Yes
On-line sig matching | Fast | Fast | Slow | Fast | Fast | Fast | Slow
Generality | General purpose | General purpose | General purpose | General purpose | Protocol specific | Server specific | General purpose
Provable atk resilience | Yes | No | No | No | No | No | No
Information exploited | egp | egp | p | egp | e | eg | p
32
Conclusion

Network-based signature generation and matching are important and challenging
Hamsa: automated signature generation
Fast
Noise tolerant
Provable attack resilience
Capable of detecting multiple worms in a single
application protocol
Proposed a model to describe the worm invariants

33
Questions ?

Thank You !

34
Experiment: Sample requirement

Coincidental-pattern attack [Polygraph]
Results
For the three pseudo worms, 10 samples can give good results.
CLET and TAPiON need at least 50 samples
Conclusion
For better signatures, to be conservative, at least 100 samples are needed
Requires scalable and fast signature generation!

35
Experiment: U-bound evaluation

To be conservative we chose k=15.
Even if we assume every token has a 70% false positive rate, their conjunction still has only a 0.5% false positive rate. In practice, very few tokens exceed 70% false positive.
Define u(1) and ur, generate
We tested u(1) = 0.02, 0.04, 0.06, 0.08, 0.10, 0.20, 0.30, 0.40, 0.5 and ur = 0.20, 0.40, 0.60, 0.8. The minimum (u(1), ur) that works for all our worms was (0.08, 0.20)
In practice, we use the conservative value (0.15, 0.5)
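
The arithmetic behind the k=15 claim can be checked in a few lines of Python. Two assumptions are made here and are not stated on the slide: token occurrences are treated as independent, and the u-bounds are taken to follow a geometric schedule u(i) = u(1) * ur^(i-1). The second assumption is at least consistent with the values u(1) = 15% and u(2) = 7.5% shown on the algorithm slides and with the pair (0.15, 0.5) above; the helper name `u_bounds` is hypothetical.

```python
k = 15

# If each of 15 tokens independently matched 70% of normal flows,
# all 15 would match together with probability 0.7 ** 15.
joint_fp = 0.7 ** k
print(f"{joint_fp:.4%}")  # about half a percent


def u_bounds(u1, ur, k):
    """Assumed geometric schedule: u(i) = u1 * ur ** (i - 1) for i = 1..k."""
    return [u1 * ur ** i for i in range(k)]


print(u_bounds(0.15, 0.5, 3))
```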

36
Results on Signature Quality (II)

Suspicious pool with high noise ratio
For noise ratios of 50% and 70%, we sometimes produce two signatures: one is the true worm signature, another comes solely from noise.
The false positives of these noise signatures have to be very small
Mean 0.09
Maximum 0.7
Multiple worms with noises give similar results

37
Attack Resilience Assumptions

Common assumptions for any signature generation system
The attacker cannot control which worm samples
are encountered by Hamsa
The attacker cannot control which worm samples
encountered will be classified as worm samples by
the flow classifier
Unique assumptions for token-based schemes
The attacker cannot change the frequency of
tokens in normal traffic
The attacker cannot control which normal samples
encountered are classified as worm samples by the
worm flow classifier

38
Normal Traffic Poisoning Attack

We found our approach is not sensitive to the normal traffic pool used
History: last 6-month time window
The attacker has to poison the normal traffic 6 months ahead!
In 6 months the vulnerability may have been patched!
Poisoning a popular protocol is very difficult.

39
Red Herring Attack

Hard to implement
Dynamic updating problem. Again our approach is
fast
Partial Signature matching, in extended version.

40
Coincidental Attack

As mentioned in the Polygraph paper, increase the sample requirement
Again, our approach is scalable and fast

41
Model Uniqueness of Invariants

Let a worm have a set of invariants. Determine their order by:
t1: the token with minimum false positive in normal traffic. u(1) is the upper bound of the false positive of t1
t2: the token with minimum joint false positive with t1. FP(t1,t2) is bounded by u(2)
ti: the token with minimum joint false positive with t1, t2, ..., ti-1. FP(t1,t2,...,ti) is bounded by u(i)
The total number of tokens is bounded by k

42
Problem Formulation

Noisy Token Multiset Signature Generation Problem
INPUT: Suspicious pool M and normal traffic pool N; value r<1.
OUTPUT: A multiset-of-tokens signature S = {(t1, n1), ..., (tk, nk)} such that the signature maximizes the coverage in the suspicious pool and the false positive in the normal pool is less than r
Without noise, a polynomial-time algorithm exists
With noise: NP-hard

43
Token-fit Attack Can Fail Polygraph

Polygraph: hierarchical clustering to find signatures w/ smallest false positives
Knowing the token distribution of the noise in the suspicious pool, the attacker can make the worm samples look more like noise traffic
Different worm samples encode different noise tokens
Our approach can still work!

44
Token-fit attack could make Polygraph fail
CANNOT merge further! NO true signature found!
45
Generalizing Signature Generation with noise

Best signature = balanced signature
Balance the sensitivity with the specificity
But how? Create a notation: scoring function score(cov, fp, ...) to evaluate the goodness of a signature
Currently used:
Intuition: it is better to reduce the coverage by 1/a if the false positive becomes 10 times smaller.
Add some weight to the length of the signature (LEN) to break ties between signatures with the same coverage and false positive
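
The formula itself did not survive in this transcript. As a hedged illustration only, the Python sketch below shows one scoring function that satisfies the stated intuition: a log10 term on the false positive, a linear term a*cov on coverage, and a small weight b on length for tie-breaking. The exact form, the constant `delta` (added to avoid log of zero), and the default values of `a` and `b` are all assumptions, not values taken from the slides.

```python
import math


def score(cov, fp, length, a=20.0, b=0.01, delta=1e-6):
    """Assumed scoring form: higher is better.

    -log10(delta + fp): a 10x smaller false positive adds about 1.
    a * cov:            a coverage drop of 1/a subtracts exactly 1.
    b * length:         small bonus for longer signatures (tie-breaker).
    """
    return -math.log10(delta + fp) + a * cov + b * length


base = score(cov=0.90, fp=0.01, length=10)
traded = score(cov=0.90 - 1 / 20.0, fp=0.001, length=10)
print(round(base, 3), round(traded, 3))
```

With this form, giving up 1/a of coverage in exchange for a tenfold lower false positive leaves the score essentially unchanged, which is the break-even point the intuition describes.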

46
Generalizing Signature Generation with noise

Algorithm: similar
Running time: same as the previous simple form
Attack Resilience Guarantee: similar

47
Extension to multiple worms

Iteratively use the single-worm detector to detect multiple worms
At the first iteration, the algorithm finds the signature for the most popular worm in the suspicious pool. All other worms and normal traffic are treated as noise.
Though the analysis for a single worm can be applied to multiple worms, the bounds are not very promising. Reason: high noise ratio
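
A minimal Python sketch of the iteration described above follows. The slide does not say what happens between iterations; removing the flows covered by each new signature before re-running the detector is an assumption, as are the names `detect_all`, `toy_detector`, `min_cover`, and `max_rounds`. The toy detector is only a stand-in for a real single-worm signature generator.

```python
def detect_all(suspicious, single_worm_detector, min_cover=2, max_rounds=10):
    """Repeatedly run a single-worm detector, peeling off covered flows."""
    signatures = []
    remaining = list(suspicious)
    for _ in range(max_rounds):
        sig = single_worm_detector(remaining)
        if sig is None:
            break
        covered = [f for f in remaining if all(tok in f for tok in sig)]
        if len(covered) < min_cover:
            break  # too few flows to count as a worm
        signatures.append(sig)
        # Assumed step: drop covered flows so the next round sees the rest.
        remaining = [f for f in remaining if f not in covered]
    return signatures


def toy_detector(pool, candidates=(b"AAA", b"BBB")):
    """Hypothetical stand-in: return the candidate token seen in most flows."""
    counts = {tok: sum(tok in f for f in pool) for tok in candidates}
    best = max(counts, key=counts.get) if pool else None
    return [best] if best and counts[best] > 0 else None


pool = [b"xAAA1", b"yAAA2", b"zAAA3", b"pBBB1", b"qBBB2", b"noise"]
print(detect_all(pool, toy_detector))
```

In the first round the flows of the second worm and the benign flow all count as noise, which is why the slide warns that the single-worm bounds become weak when several worms share the pool.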

48
Implementation details

Token extraction: extract a set of tokens with minimum length l and minimum coverage COVmin.
Polygraph uses a suffix-tree-based approach: 20n space and time consuming.
Our approach: enhanced suffix array, 8n space and much faster! (at least 20 times)
Calculate false positive when checking U-bounds
Again a suffix-array-based approach, but for a 300 MB normal pool, a 1.2 GB suffix array is still large!
Optimization using MMAP: memory usage 150-250 MB

49
Token Extraction

Extract a set of tokens with minimum length lmin and coverage COVmin. For each token, output the frequency vector.
Polygraph uses a suffix-tree-based approach: 20n space and time consuming.
Our approach
Enhanced suffix array: 4n space
Much faster, at least 50(UPDATE) times!
Can apply to Polygraph also.
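
To show what token extraction produces, here is a brute-force Python stand-in. It is not the enhanced suffix array approach named above and is far slower; it only demonstrates the contract: keep substrings of at least a minimum length that appear in at least a COVmin fraction of samples, and report a per-sample frequency vector for each. The function name, the `lmax` cap, and the sample data are assumptions.

```python
from collections import Counter


def extract_tokens(samples, lmin=3, lmax=6, cov_min=0.5):
    """Return {token: frequency vector} for substrings with enough coverage.

    Brute force: enumerate every substring of length lmin..lmax per sample.
    """
    coverage = Counter()
    for sample in samples:
        seen = {sample[i:i + n]
                for n in range(lmin, lmax + 1)
                for i in range(len(sample) - n + 1)}
        coverage.update(seen)  # each sample counts once per token
    needed = cov_min * len(samples)
    return {tok: [s.count(tok) for s in samples]
            for tok, count in coverage.items() if count >= needed}


samples = [b"xx.ida?yy", b"zz.ida?ww", b"hello"]
print(extract_tokens(samples, lmin=5, lmax=5, cov_min=0.6))
```

A suffix array gives the same output without materializing every substring, which is where the space and time savings quoted on the slide come from.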

50
Calculate the false positive

We need the false positive to check the U-bounds
Again a suffix-array-based approach, but for a 300 MB normal pool, a 1.2 GB suffix array is still large!
Improvements
Caching
MMAP the suffix array. True memory usage: 150-250 MB.
2-level normal pool
Hardware-based fast string matching
Compress the normal pool and run string matching algorithms directly over compressed strings

51
Future works

Enhance the flow classifiers
Cluster suspicious flows by return messages
Malicious flow verification by replaying to
Address Space Randomization enabled servers.

52
Experiment Attacks

We propose a new attack: token-fit.
The attacker may study the noise inside the suspicious pool
Create worm sample Wi that shares more tokens with some normal traffic noise sample Ni
This will get the hierarchical clustering used in Polygraph stuck
BUT we can still generate the correct signature!
