Title: Evolving Fuzzy Classifiers for Intrusion Detection
1Evolving Fuzzy Classifiers for Intrusion Detection
- Jonatan Gomez
- Dipankar Dasgupta
- Presented By Sohraab Soltani
2Intrusion Detection
- Misuse Detection
- Uses signatures of known intrusions.
- Low false alarm rate.
- Unable to detect unknown attacks.
- Anomaly Detection
- Builds a profile of normal system behavior.
- Labels any behavior that deviates from the normal profile as an anomaly.
- Able to detect unknown attacks.
- High false alarm rate.
3Overview
- Training data: find fuzzy classifier rules for normal and abnormal behaviors.
- Data flow: label each data point as normal or abnormal.
- Trigger an alarm if the data point is abnormal.
4Fuzzy Logic
- Classic: an object is either entirely in a set or not in it at all.
- Fuzzy: an object can be partially in a set.
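The difference between the two kinds of membership can be sketched in a few lines of Python. The triangular shape and the names `crisp` and `triangular` are assumptions chosen for illustration; the slides do not state which membership functions are used.

```python
def crisp(x, low, high):
    """Classic set: x is either fully inside [low, high] (1.0) or not (0.0)."""
    return 1.0 if low <= x <= high else 0.0


def triangular(x, left, peak, right):
    """Fuzzy set (assumed triangular shape): membership degree in [0, 1]."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# A value halfway up the left slope belongs to the fuzzy set only partially.
partial = triangular(0.25, 0.0, 0.5, 1.0)
full = triangular(0.5, 0.0, 0.5, 1.0)
outside = crisp(0.25, 0.4, 0.6)
```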
5Fuzzy Operators
6Fuzzy Rule
- Rule: IF condition THEN consequence [weight]
- TV(R) = TV(condition) × weight
- Example: IF x is HIGH and y is LOW THEN pattern is normal [0.4]
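As a rough sketch of how such a rule could be evaluated, the snippet below reads the truth value of the rule as the truth value of its condition multiplied by its weight. The operators (min for AND, max for OR, 1 - x for NOT) are a common choice assumed here, since the operator slide carries no content in this transcript, and the membership degrees are made-up inputs.

```python
# Assumed fuzzy operators (a common choice, not confirmed by the slides).
def f_and(a, b):
    return min(a, b)


def f_or(a, b):
    return max(a, b)


def f_not(a):
    return 1.0 - a


def rule_truth_value(condition_tv, weight):
    """Truth value of a rule: truth value of the condition times its weight."""
    return condition_tv * weight


# Hypothetical membership degrees for "x is HIGH" and "y is LOW".
x_is_high = 0.5
y_is_low = 0.9

# IF x is HIGH and y is LOW THEN pattern is normal [0.4]
condition_tv = f_and(x_is_high, y_is_low)
tv_rule = rule_truth_value(condition_tv, 0.4)
```

With these inputs the condition evaluates to 0.5 and the rule to about 0.2.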
7A Fuzzy Classifier as an Intrusion Detector
8Class Prediction
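One plausible way to turn rule truth values into a prediction is to pick the class whose rule fires most strongly and raise an alarm when that class is not normal. This is an assumption for illustration (the slide body is not preserved here), and `predict_class`, `should_alarm`, and the class labels are hypothetical names.

```python
def predict_class(rule_truth_values):
    """Return the class whose rule has the highest truth value (assumed scheme)."""
    return max(rule_truth_values, key=rule_truth_values.get)


def should_alarm(rule_truth_values, normal_label="normal"):
    """Trigger an alarm when the predicted class is anything other than normal."""
    return predict_class(rule_truth_values) != normal_label


# Hypothetical truth values of one rule per class for a single data point.
truth_values = {"normal": 0.2, "abnormal": 0.6}
predicted = predict_class(truth_values)
alarm = should_alarm(truth_values)
```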
9Steps to generate a fuzzy rule for class k using GA
10Representation of the condition part of the fuzzy rule.
x is C or z is E and w is not D
11Binary Tree representation
- Parenthesis-free expression
- A or B and C and D or E.
- Represents the logical expression
- (((A or E) and C) or (B and D))
- Can also be represented by a complete binary tree.
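To make the tree form concrete, the sketch below writes the logical expression (((A or E) and C) or (B and D)) as a nested tuple and evaluates it recursively. It does not reproduce how the flat expression is mapped to the tree; the min/max operators and the truth values of the atoms are assumptions for illustration.

```python
def evaluate(node, atom_tv):
    """Recursively evaluate a binary expression tree of fuzzy atoms.

    A leaf is an atom name; an inner node is (operator, left, right).
    AND is taken as min and OR as max (assumed operators).
    """
    if isinstance(node, str):
        return atom_tv[node]
    op, left, right = node
    left_tv = evaluate(left, atom_tv)
    right_tv = evaluate(right, atom_tv)
    return min(left_tv, right_tv) if op == "and" else max(left_tv, right_tv)


# (((A or E) and C) or (B and D)) as a nested tuple.
tree = ("or", ("and", ("or", "A", "E"), "C"), ("and", "B", "D"))

# Hypothetical truth values of the atomic conditions.
atom_tv = {"A": 0.2, "B": 0.9, "C": 0.6, "D": 0.3, "E": 0.8}
result = evaluate(tree, atom_tv)
```

With these values the left branch evaluates to 0.6 and the right branch to 0.3, so the whole expression evaluates to 0.6.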
12Genetic operators- Crossover
Because the crossover point was selected inside nodes C and Y,
these nodes interchange their code and create the new fuzzy
expressions H and M.
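A minimal sketch of this effect, assuming chromosomes are bit strings made of fixed-width genes and that a plain one-point crossover is used: when the cut falls inside a gene, the genes at that position in both parents end up with mixed code. The 4-bit gene width and the parent strings are invented for the example.

```python
def one_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two equal-length bit strings after position `point`."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def genes(chromosome, width=4):
    """Split a bit string into genes of `width` bits (assumed encoding)."""
    return [chromosome[i:i + width] for i in range(0, len(chromosome), width)]


parent_a = "000011110000"  # three hypothetical 4-bit genes
parent_b = "111100001111"

# The cut at bit 10 falls inside the third gene of each parent.
child_a, child_b = one_point_crossover(parent_a, parent_b, 10)
```

Here the third gene of the first parent changes from `0000` to `0011`, and the third gene of the second parent changes from `1111` to `1100`; the genes before the cut are untouched.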
13Genetic operators- Gene addition, deletion
14Genetic operators- Mutation
15Fitness Function Confusion Matrix
16Fitness Function
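The exact fitness formula is not preserved in this transcript, so the sketch below only shows the ingredients a confusion-matrix-based fitness would typically draw on: the four counts for one class, plus sensitivity and specificity. The function names, labels, and data are hypothetical.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, FP, TN, FN for one class treated as the positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Fraction of positive samples that were recognized."""
    return tp / (tp + fn) if tp + fn else 0.0


def specificity(tn, fp):
    """Fraction of negative samples that were correctly rejected."""
    return tn / (tn + fp) if tn + fp else 0.0


# Hypothetical labels for five records.
actual = ["dos", "dos", "normal", "normal", "normal"]
predicted = ["dos", "normal", "normal", "dos", "normal"]

tp, fp, tn, fn = confusion_counts(actual, predicted, positive="dos")
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
```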
17KDDCUP DATASET
- duration continuous.
- protocol_type symbolic.
- service symbolic.
- flag symbolic.
- src_bytes continuous.
- dst_bytes continuous.
- land symbolic.
- wrong_fragment continuous.
- In total, 41 features plus the class attribute.
18Experimental Settings
- Normalize each continuous attribute.
- Five-fold cross-validation.
- The genetic algorithm was initialized with 200 random chromosomes.
- The length of each chromosome is between one and six.
- The maximum number of iterations is 200.
- The GA runs 5 times, once for each class.
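The first two preprocessing steps listed above can be sketched as follows. Min-max scaling is assumed as the normalization method, and the interleaved fold assignment is just one simple way to build five folds; neither detail is confirmed by the slides.

```python
def min_max_normalize(values):
    """Scale a continuous attribute to [0, 1] (assumed min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for k-fold cross-validation."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f_idx, fold in enumerate(folds) if f_idx != i for j in fold]
        yield train, test


# Hypothetical values of one continuous attribute, e.g. src_bytes.
src_bytes = [0, 5, 10]
normalized = min_max_normalize(src_bytes)

# Five train/test splits over ten records.
splits = list(k_fold_indices(10, k=5))
```

Each record appears in exactly one test fold, so every record is used for testing once and for training four times.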
19Accuracy
20ROC Curve
21Conclusion
- Curse of dimensionality
- As the dimensionality of the data increases, the performance of the algorithm degrades.
- Multimodality
- More than one normal pattern may exist in a data set.
- False alarm rate