Title: Intrusion Detection Modeling Technique and Experiment Design
1. Intrusion Detection Modeling Technique and Experiment Design
- Yuhong Dong - ydong_at_cse.fau.edu
- March 19, 2004
2. Table of Contents
- Review of IDS (anomaly detection and misuse detection)
- IDS Modeling Algorithms
  -- Classification Modeling
  -- Association Rules
  -- Frequent Episodes
- Feature Construction
- Experiments
- Conclusion
3. Overview of IDS
- Anomaly Detection Systems
  -- IDES flags observed activities that deviate significantly from the established normal usage profiles.
- Misuse Detection Systems
  -- IDIOT and STAT use patterns (signatures) of well-known attacks or of weak spots in the system to match and identify known intrusions.
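To make the contrast between the two approaches concrete, here is a minimal Python sketch, assuming a single numeric measure per user (for example, logins per day) and plain substring signatures. IDES, IDIOT and STAT use far richer models; the 3-standard-deviation threshold, the function names and the signatures below are hypothetical and only illustrate "deviation from a profile" versus "match against known patterns".

```python
from statistics import mean, stdev

def anomaly_score(history: list[float], observed: float) -> float:
    """Distance of `observed` from the profile mean, in standard deviations."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs >= 2 values
    if sigma == 0:
        return 0.0 if observed == mu else float("inf")
    return abs(observed - mu) / sigma

def is_anomalous(history: list[float], observed: float, threshold: float = 3.0) -> bool:
    # Anomaly detection: flag activity that deviates significantly from the profile.
    return anomaly_score(history, observed) > threshold

# Misuse detection: hypothetical signatures of known attacks (illustration only).
SIGNATURES = {
    "phf_probe": "GET /cgi-bin/phf",
    "passwd_read": "/etc/passwd",
}

def match_signatures(event: str) -> list[str]:
    """Return the names of all known-attack signatures found in an event string."""
    return [name for name, pattern in SIGNATURES.items() if pattern in event]

if __name__ == "__main__":
    logins_per_day = [10, 12, 11, 9, 13]               # hypothetical usage profile
    print(is_anomalous(logins_per_day, 11))             # False
    print(is_anomalous(logins_per_day, 30))             # True
    print(match_signatures("GET /cgi-bin/phf?Qalias=x"))  # ['phf_probe']
```

The anomaly side can catch novel behavior but needs a good profile; the misuse side only recognizes what its signatures already describe.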
4. Building an IDS Is Hard Work
- For anomaly detection, system builders rely on their intuition and experience to select the statistical measures.
- For misuse detection, experts first analyze and categorize attack scenarios and system vulnerabilities, and then hand-code the corresponding rules and patterns.
5. Algorithms
- Classification
  -- Maps a data item into one of several predefined categories (normal and abnormal).
  -- The output is a set of decision trees or rules.
- Link analysis
  -- Determines relations between fields in database records, e.g., correlations of system features in audit data.
  -- For example, a programmer may have emacs highly associated with C files.
- Sequence analysis
  -- Models sequential patterns: these algorithms can discover which time-based sequences of audit events frequently occur together.
  -- For example, patterns from audit data containing network-based denial-of-service (DOS) attacks suggest that several per-host and per-service measures should be included.
6. Classification Modeling: Normal / Intrusion, Example of Telnet Records
- hot: count of accesses to system directories
- compromised: count of file/path "not found" errors and "Jump to" instructions
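A sketch of how the hot and compromised counts might be computed from a telnet session, assuming the session has already been parsed into (kind, argument) events. The event kinds, the list of system-directory prefixes and the function name are hypothetical; the actual pre-processing of telnet records is not described here.

```python
# Assumed prefixes that count as "system directories" (illustrative list).
SYSTEM_DIRS = ("/etc/", "/bin/", "/sbin/", "/usr/bin/", "/dev/")

def session_features(events: list[tuple[str, str]]) -> dict[str, int]:
    """Compute hot / compromised counts from (kind, argument) session events.

    The event kinds "access", "not_found" and "jump" are hypothetical labels
    that a telnet-session parser would have to produce.
    """
    hot = 0
    compromised = 0
    for kind, arg in events:
        if kind == "access" and arg.startswith(SYSTEM_DIRS):
            hot += 1            # access to a system directory
        elif kind in ("not_found", "jump"):
            compromised += 1    # file/path not found error or "Jump to" instruction
    return {"hot": hot, "compromised": compromised}

if __name__ == "__main__":
    session = [
        ("access", "/etc/passwd"),
        ("access", "/home/alice/notes.txt"),
        ("not_found", "/tmp/.x"),
        ("jump", "0xbffff6c0"),
        ("access", "/bin/sh"),
    ]
    print(session_features(session))  # {'hot': 2, 'compromised': 2}
```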
7. Classification Modeling: Example Ripper Rules from Telnet Records
Ripper selects the feature values that uniquely identify the intrusions. These rules can first be inspected and edited by security experts and then incorporated into a misuse detection system. The accuracy of a classification model depends directly on the set of features provided in the training data. For example, if the features hot, compromised and root_shell were removed from the records in Table 1, Ripper would not be able to produce accurate rules to identify buffer overflow connections.
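Ripper produces an ordered rule list: the first rule whose conditions all hold assigns the label, and a default rule covers everything else. The sketch below applies such a list; the rules, thresholds and the failed_logins feature are made up for illustration and are not rules learned from Table 1. It also shows the point above: a record that lacks hot, compromised and root_shell can never match the buffer overflow rule.

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq}

# Ordered rule list in the style of Ripper output: the first rule whose
# conditions all hold decides the label. Features and thresholds are hypothetical.
RULES = [
    ("buffer_overflow", [("hot", ">=", 3), ("compromised", ">=", 1), ("root_shell", ">=", 1)]),
    ("guess_passwd", [("failed_logins", ">=", 5)]),
]
DEFAULT_LABEL = "normal"

def classify(record: dict[str, int]) -> str:
    for label, conditions in RULES:
        # A feature missing from the record is treated as 0.
        if all(OPS[op](record.get(feature, 0), threshold)
               for feature, op, threshold in conditions):
            return label
    return DEFAULT_LABEL

if __name__ == "__main__":
    print(classify({"hot": 4, "compromised": 2, "root_shell": 1}))  # buffer_overflow
    # Without hot/compromised/root_shell, the buffer overflow rule cannot fire.
    print(classify({"duration": 120}))                               # normal
```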
8. Association Rules
The goal of mining association rules is to derive multi-feature correlations from a database table. $\mathrm{support}(X)$ is defined as the percentage of records that contain the itemset $X$. An association rule is an expression $X \rightarrow Y, [c, s]$, where $s = \mathrm{support}(X \cup Y)$ is the support of the rule and $c = \mathrm{support}(X \cup Y) / \mathrm{support}(X)$ is the confidence.
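The support and confidence definitions above translate directly into code. This sketch assumes each record is a set of attribute=value items; the records (echoing the emacs / C file example) are hypothetical.

```python
def support(records: list[set[str]], itemset: set[str]) -> float:
    """Fraction of records that contain every item of `itemset`."""
    return sum(1 for r in records if itemset <= r) / len(records)

def rule_stats(records: list[set[str]], x: set[str], y: set[str]) -> tuple[float, float]:
    """Return (confidence, support) of the association rule X -> Y."""
    s = support(records, x | y)
    sx = support(records, x)
    c = s / sx if sx > 0 else 0.0
    return c, s

if __name__ == "__main__":
    # Hypothetical shell-command records: items are attribute=value strings.
    records = [
        {"command=emacs", "arg=*.c"},
        {"command=emacs", "arg=*.c"},
        {"command=emacs", "arg=*.txt"},
        {"command=ls", "arg=-l"},
    ]
    c, s = rule_stats(records, {"command=emacs"}, {"arg=*.c"})
    print(f"confidence={c:.2f}, support={s:.2f}")  # confidence=0.67, support=0.50
```

Here emacs appears in 3 of 4 records and emacs together with a C file in 2 of 4, so the rule command=emacs -> arg=*.c has support 0.5 and confidence 0.5 / 0.75.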
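9. Frequent Episodes
- Given a set of timestamped event records, where each record is a set of items, an interval $[t_1, t_2]$ is the sequence of event records whose timestamps fall between $t_1$ and $t_2$.
- $\mathrm{support}(X)$ is the ratio between the number of minimal occurrences that contain $X$ and the total number of event records. (An interval is a minimal occurrence of $X$ if it contains $X$ and none of its proper subintervals does.)
- A frequent episode rule is the expression $X, Y \rightarrow Z, [c, s, w]$, where $s = \mathrm{support}(X \cup Y \cup Z)$ is the support of the rule, $c = \mathrm{support}(X \cup Y \cup Z) / \mathrm{support}(X \cup Y)$ is the confidence, and $w = t_2 - t_1$ is the width of the time window.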
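A brute-force sketch of the frequent-episode measures, assuming records are sorted by distinct timestamps and that an episode is simply an itemset whose items may be spread over several records in a window of width w (ordering within the window is ignored, which is a simplification). It counts minimal occurrences and divides by the total number of event records, as in the definition above; the record contents are hypothetical.

```python
Record = tuple[float, set[str]]  # (timestamp, items); sorted by distinct timestamps

def covers(records: list[Record], i: int, j: int, itemset: set[str]) -> bool:
    """True if records i..j (inclusive) together contain every item of `itemset`."""
    if i > j:
        return False
    seen: set[str] = set()
    for _, items in records[i:j + 1]:
        seen |= items
    return itemset <= seen

def minimal_occurrences(records: list[Record], itemset: set[str], w: float) -> list[tuple[float, float]]:
    """Intervals [t1, t2] with t2 - t1 <= w that contain `itemset`
    while none of their proper subintervals does."""
    occurrences = []
    for i in range(len(records)):
        for j in range(i, len(records)):
            if records[j][0] - records[i][0] > w:
                break
            if covers(records, i, j, itemset):
                # j is the first covering end for start i, so the only proper
                # subintervals left to rule out are those starting after i.
                if not covers(records, i + 1, j, itemset):
                    occurrences.append((records[i][0], records[j][0]))
                break  # any larger j contains this interval, so is not minimal
    return occurrences

def episode_support(records: list[Record], itemset: set[str], w: float) -> float:
    return len(minimal_occurrences(records, itemset, w)) / len(records)

def episode_rule(records, x, y, z, w):
    """Return (confidence, support) for the episode rule X, Y -> Z with window w."""
    s = episode_support(records, x | y | z, w)
    sxy = episode_support(records, x | y, w)
    return (s / sxy if sxy > 0 else 0.0), s

if __name__ == "__main__":
    # Hypothetical connection events: (timestamp in seconds, items)
    records = [
        (0, {"flag=S0", "service=http"}),
        (1, {"dst=victim"}),
        (5, {"flag=S0"}),
        (6, {"dst=victim", "service=http"}),
        (20, {"service=http"}),
    ]
    print(minimal_occurrences(records, {"flag=S0", "dst=victim"}, w=2))  # [(0, 1), (5, 6)]
    print(episode_rule(records, {"flag=S0"}, {"dst=victim"}, {"service=http"}, w=2))  # (1.0, 0.4)
```

The quadratic scan keeps the logic easy to check against the definition; a production miner would use incremental, level-wise candidate generation instead.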
10. Feature Construction
- Conditions
  -- Network intrusion detection system
  -- Algorithm: frequent episodes
  -- Pre-processing of tcpdump data
- Experiment
  -- Apply the frequent episodes program to both normal connection data and intrusion data, and compare the resulting patterns to find the intrusion-only patterns.
  -- Then apply the algorithm to construct features for the SYN flood pattern. The resulting features are a count of connections to the same dst_host in the past 2 seconds and, among these connections, the percentage that have the same service and the percentage that have the S0 flag.
- Open problems
  -- How to decide the right time window value w.
  -- How to select the appropriate features to detect an intrusion.
  -- How to select the right axis and reference features to generate the most distinguishing and useful intrusion patterns.
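The SYN flood features described above are time-window statistics over connection records. A minimal sketch, assuming connection records with hypothetical field names (ts, dst_host, service, flag) and counting the current connection as part of its own 2-second window (a design choice, not something the slide specifies):

```python
from dataclasses import dataclass

@dataclass
class Conn:
    ts: float        # connection start time, seconds
    dst_host: str
    service: str
    flag: str        # connection status, e.g. "S0" = SYN seen, no reply

def window_features(conns: list[Conn], k: int, window: float = 2.0) -> tuple[int, float, float]:
    """Time-window features for conns[k]; `conns` must be sorted by ts.

    Returns (count, same_srv_rate, s0_rate) over connections to the same
    dst_host in the past `window` seconds, including conns[k] itself.
    """
    cur = conns[k]
    same_dst = [c for c in conns[:k + 1]
                if cur.ts - c.ts <= window and c.dst_host == cur.dst_host]
    count = len(same_dst)  # >= 1 because cur itself is included
    same_srv_rate = sum(c.service == cur.service for c in same_dst) / count
    s0_rate = sum(c.flag == "S0" for c in same_dst) / count
    return count, same_srv_rate, s0_rate

if __name__ == "__main__":
    conns = [
        Conn(0.0, "victim", "http", "S0"),
        Conn(0.5, "victim", "http", "S0"),
        Conn(1.0, "other", "smtp", "SF"),
        Conn(1.5, "victim", "http", "S0"),
        Conn(1.8, "victim", "ftp", "SF"),
    ]
    print(window_features(conns, 3))  # (3, 1.0, 1.0)
    print(window_features(conns, 4))  # (4, 0.25, 0.75)
```

During a SYN flood, many recent connections to one host share a service and end in S0, so both rates approach 1.0 as the count grows.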
11. Experiments
- Data source: DARPA data
  -- Data pre-processing
- Misuse detection
  -- Manual and automatic feature construction
  -- Detection models
  -- Results
- User anomaly detection
- Conclusion and future directions
12. Experiment
- Objective of the experiment
  -- Survey and evaluate the state of the art in intrusion detection research.
- Procedure
  -- Each participating site was required to build intrusion detection models using the training data and send its results on the test data back to DARPA for performance evaluation.
- The DARPA data
  -- 4 gigabytes of compressed tcpdump data from 7 weeks of network traffic.
  -- This data can be processed into about 5 million connection records of about 100 bytes each.
  -- The data contains the content of every packet transmitted between hosts inside and outside a simulated military base.
13. Experiment
- DARPA data (continued)
- Four main categories of attacks were simulated:
  -- DOS: denial of service, for example SYN flood
  -- R2L: unauthorized access from a remote machine, for example password guessing
  -- U2R: unauthorized access to local superuser privileges by a local unprivileged user, for example buffer overflow attacks
  -- Probing: surveillance and probing, for example port scans and ping sweeps
- Data pre-processing: each record includes these intrinsic features
- Misuse detection: feature construction and detection models
14. Experiment: Feature Construction and Detection Models
- Detection models
  -- Traffic model: DOS and probing attacks
  -- Host-based traffic model: slow probing attacks
  -- Content model: R2L and U2R attacks
- Results (plot; X-axis: false alarm rate, Y-axis: detection rate)
  -- The false alarm rate is the percentage of normal connections classified as intrusions.
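Both axes of the result plot can be computed from labeled test connections. The sketch below, assuming a model that outputs a numeric intrusion score per connection, sweeps a threshold over the scores and returns one (false alarm rate, detection rate) point per threshold; detection rate is taken as the percentage of intrusions classified as intrusions. The scores and function names are hypothetical.

```python
def rates(labels: list[bool], predictions: list[bool]) -> tuple[float, float]:
    """Return (false_alarm_rate, detection_rate); True means "intrusion"."""
    on_intrusions = [pred for truth, pred in zip(labels, predictions) if truth]
    on_normals = [pred for truth, pred in zip(labels, predictions) if not truth]
    # Detection rate: share of intrusions that were flagged.
    detection_rate = sum(on_intrusions) / len(on_intrusions) if on_intrusions else 0.0
    # False alarm rate: share of normal connections flagged as intrusions.
    false_alarm_rate = sum(on_normals) / len(on_normals) if on_normals else 0.0
    return false_alarm_rate, detection_rate

def roc_points(scores: list[float], labels: list[bool], thresholds: list[float]) -> list[tuple[float, float]]:
    """One (x, y) = (false alarm rate, detection rate) point per score threshold."""
    return [rates(labels, [s >= t for s in scores]) for t in thresholds]

if __name__ == "__main__":
    labels = [True, True, True, False, False, False, False]
    scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.05]  # hypothetical model scores
    print(roc_points(scores, labels, [0.0, 0.5, 1.0]))
```

Lowering the threshold moves a point up and to the right: more intrusions are caught, at the cost of more false alarms.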
15. Experiment: Performance
- Because this is a misuse detection system, it performs better on known attacks than on unknown attacks. For all intrusions, an overall detection rate below 70% is hardly satisfactory in a mission-critical environment.
16. Experiment: User Anomaly Detection
- The initial exploratory approach is to mine frequent patterns from user command data and to merge or add the patterns into an aggregate set that forms the user's normal usage profile.
- A new pattern can be merged with an old pattern if they have the same left-hand sides and right-hand sides, their support values are within 5% of each other, and their confidence values are also within 5% of each other.
- To analyze a user login session, we mine the frequent patterns from the sequence of commands issued during the session. This new pattern set is compared with the profile pattern set, and a similarity score is assigned. If the new set has n patterns, of which m have matches in the profile pattern set, the similarity score is simply m/n. A higher similarity score means a higher likelihood that the user's behavior agrees with his or her historical profile.
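The merge rule and the m/n similarity score can be sketched as follows. Two points are assumptions: "within 5%" is read as an absolute difference of at most 0.05 when support and confidence are fractions, and a merged pattern simply averages the two patterns' support and confidence (the slide does not say how merged statistics are combined). The same matching test is assumed for both merging and scoring; all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pattern:
    lhs: frozenset[str]   # left-hand side, e.g. preceding commands
    rhs: frozenset[str]   # right-hand side
    support: float        # fraction in [0, 1]
    confidence: float     # fraction in [0, 1]

TOLERANCE = 0.05  # assumed reading of "within 5% of each other"

def matches(a: Pattern, b: Pattern, tol: float = TOLERANCE) -> bool:
    return (a.lhs == b.lhs and a.rhs == b.rhs
            and abs(a.support - b.support) <= tol
            and abs(a.confidence - b.confidence) <= tol)

def merge_into_profile(profile: list[Pattern], new: list[Pattern], tol: float = TOLERANCE) -> list[Pattern]:
    """Return a new profile with each new pattern merged into a match or appended."""
    merged = list(profile)
    for p in new:
        for i, q in enumerate(merged):
            if matches(p, q, tol):
                # Hypothetical merge policy: average support and confidence.
                merged[i] = Pattern(q.lhs, q.rhs, (q.support + p.support) / 2,
                                    (q.confidence + p.confidence) / 2)
                break
        else:  # no match found: add the pattern to the profile
            merged.append(p)
    return merged

def similarity(session: list[Pattern], profile: list[Pattern], tol: float = TOLERANCE) -> float:
    """m / n: share of the session's n patterns that match some profile pattern."""
    if not session:
        return 0.0
    m = sum(any(matches(p, q, tol) for q in profile) for p in session)
    return m / len(session)

if __name__ == "__main__":
    def pat(lhs, rhs, s, c):
        return Pattern(frozenset(lhs), frozenset(rhs), s, c)

    profile = [pat({"vi"}, {"gcc"}, 0.30, 0.80), pat({"ls"}, {"cd"}, 0.20, 0.60)]
    session = [pat({"vi"}, {"gcc"}, 0.32, 0.78), pat({"netstat"}, {"telnet"}, 0.10, 0.50)]
    print(similarity(session, profile))               # 0.5
    print(len(merge_into_profile(profile, session)))  # 3
```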
17. Conclusion and Future Directions
- Data generated from network traffic monitoring tends to have very high volume, dimensionality and heterogeneity, so there is a need for high-performance modeling algorithms that scale to very large network traffic data sets.
- Network data is temporal (streaming) in nature, and algorithms for mining data streams are necessary for building real-time intrusion detection systems.
- The low frequency of computer attacks requires modifying standard data mining algorithms to detect them.
- Cyber attacks may be launched from several different locations and targeted at many different destinations, so network data from several network locations must be analyzed to detect these distributed attacks.