Title: Intrusion Detection: A Bioinformatics Approach
1Intrusion Detection A Bioinformatics Approach
2Introduction
- This paper proposes a scheme to detect masquerading attacks by applying techniques used in bioinformatics.
- The algorithm uses semi-global alignment and a unique scoring system to measure the similarity between a sequence of commands produced by a potential intruder and a sequence of commands collected from a legitimate user, and uses that similarity to detect intrusion.
- The algorithm was tested on a standard intrusion data collection set. The results showed that it yields a promising combination of intrusion detection rate and false positive rate when compared with other published intrusion detection algorithms.
3Contents
- Intrusion Detection Systems
- Masquerading
- Sequence Alignment
- Algorithm
- Experimental Testing
- Threshold Value Determination
- Command Mismatch Scoring
- Comparison
- Future Work
- References
4Intrusion Detection Systems
- Due to the evolving sophistication of intrusion methods, standard security deployments such as firewalls, patched operating systems and password protection are limited in their effectiveness.
- An intrusion detection system (IDS) provides the layer of security that takes over once those prior defenses have failed.
- This layer usually monitors any number of data sources (e.g., audit logs, keystrokes, network traffic) for signs of inappropriate or anomalous behavior.
5Masquerading
- In the field of computer security, one of the most damaging attacks is masquerading, in which an attacker assumes the identity of a legitimate user in a computer system.
- Masquerading is difficult to detect at initiation because the attacker appears to be a normal user with valid authority and privileges.
- A logical step toward detecting a masquerade attack is to analyze a user's command sequences.
- This scheme uses a semi-global alignment and a unique scoring system to detect anomalous user behavior that might indicate an intruder.
6Sequence Alignment
- Sequence alignment is a well-studied tool used to visualize and quantify similarities between sequences.
- It is widely used to compare genetic material such as DNA, RNA and protein sequences.
- Its applications include searching sequence databases for specific genes or patterns, and discovering phylogenetic relationships through the use of multiple alignments.
7Sequence Alignment
The scoring function assigns positive scores to
aligned characters that either match or are known
to be similar. Negative scores are assigned to
both aligned characters that are dissimilar and
characters that are aligned with gaps.
Typically, the score of an alignment is the sum
of the scores of each aligned pair of symbols.
The task of optimal sequence alignment is to
find the highest scoring alignment for a given
scoring function and pair of strings.
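As a rough illustration of "the score of an alignment is the sum of the scores of each aligned pair", the sketch below scores two already-aligned rows. The function name, the `-` gap symbol and the values +1 / -1 / -2 are assumptions chosen for the example, not values from the paper.

```python
GAP = "-"  # assumed gap symbol for this example


def alignment_score(row_a, row_b, match=1, mismatch=-1, gap=-2):
    """Sum the per-column scores of two equal-length aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(row_a, row_b):
        if a == GAP or b == GAP:
            total += gap        # a symbol aligned with a gap
        elif a == b:
            total += match      # matching symbols
        else:
            total += mismatch   # dissimilar symbols
    return total


# Columns: A/A, C/C, gap/T, G/G, T/A -> 1 + 1 - 2 + 1 - 1
print(alignment_score("AC-GT", "ACTGA"))
```

Optimal alignment then amounts to searching over all ways of inserting gaps for the arrangement that maximizes this sum.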
8Global and Local Alignment
- Global alignment is suitable for comparing two strings that are believed to possess overall similarity.
- However, two strings may not be similar over their entire length; they may instead contain smaller substrings that are highly similar.
- Local alignment was designed to identify these more subtle types of similarity.
- In a local alignment, only the characters in the two aligned substrings contribute to the score of the optimal alignment.
- Thus, for each string, a suffix and a prefix are ignored by the scoring system.
9Sequence Alignment- Types
- A global alignment algorithm aligns two strings over their entire length.
- A local alignment algorithm aligns a substring of each input string.
10Semi-global Alignment
- The problem with using a purely global alignment is that large portions of the signature may not align with any segment of the user's commands, even though a subsection of the commands might. This subtle similarity might go undetected, leading to a false alarm.
- The problem with using a purely local alignment is that if a large prefix and a large suffix of the tested block of commands are ignored, then the intrusion itself might be ignored.
- The authors therefore used a modification of the Smith-Waterman local alignment algorithm to compute a semi-global alignment.
- In a semi-global alignment, you can choose to align only prefixes or only suffixes of the original input strings.
11Algorithm
- The signature sequence, which represents the user's typical command behavior, will be referred to as the UserSig.
- The monitored command sequence, which may contain a subsequence of masquerader commands, will be referred to as the IntrBlck (tested block).
- The algorithm starts by initializing a matrix of floats, which is used to store the scores throughout the alignment process.
- Each position (i, j) in the matrix holds the optimal score of an alignment ending at UserSig_i and IntrBlck_j.
12Algorithm
Input: string UserSig of length m, string IntrBlck of length n

    Initialize a matrix, D, of type integer
    for i = 0 to m
        for j = 0 to n
            if (j = 0 or i = 0)
                D[i][j] = 0
            else
                if (j = n or i = m)
                    top = D[i][j-1]
                    left = D[i-1][j]
                else
                    top = D[i][j-1] + gUserSig
                    left = D[i-1][j] + gIntrBlck
                if (top < 0) top = D[i][j-1]
                if (left < 0) left = D[i-1][j]
                diagonal = D[i-1][j-1] + matchScore(UserSig[i-1], IntrBlck[j-1])
                D[i][j] = maximum(top, left, diagonal)
    return D[m][n]
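The pseudocode above can be sketched in Python roughly as follows. This is a minimal sketch rather than the authors' implementation: the function and parameter names are illustrative, and the default values (match +1, mismatch 0, gap penalties -3 and -2) are the ones listed on the scoring scheme slide.

```python
def match_score(sig_cmd, blk_cmd, match=1, mismatch=0):
    """Reward for aligning two commands (assumed: exact string equality)."""
    return match if sig_cmd == blk_cmd else mismatch


def semi_global_score(user_sig, intr_blck,
                      g_user_sig=-3, g_intr_blck=-2, score=match_score):
    """Fill the matrix D and return D[m][n], following the pseudocode."""
    m, n = len(user_sig), len(intr_blck)
    # Row 0 and column 0 stay at zero: leading gaps are not penalized.
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if j == n or i == m:
                # Last row or column: trailing gaps are free.
                top = D[i][j - 1]
                left = D[i - 1][j]
            else:
                top = D[i][j - 1] + g_user_sig    # gap in UserSig
                left = D[i - 1][j] + g_intr_blck  # gap in IntrBlck
            # A penalty that would push the score below zero is dropped.
            if top < 0:
                top = D[i][j - 1]
            if left < 0:
                left = D[i - 1][j]
            diagonal = D[i - 1][j - 1] + score(user_sig[i - 1],
                                               intr_blck[j - 1])
            D[i][j] = max(top, left, diagonal)
    return D[m][n]


signature = ["ls", "cd", "vi", "make", "gcc"]
block = ["vi", "make"]
print(semi_global_score(signature, block))
```

With these default scores every matrix entry stays non-negative and the result never exceeds the length of the shorter sequence, which makes scores easy to compare against a per-user threshold.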
13Algorithm
- The optimal score is computed by starting at the upper left corner of the matrix (i.e., at the point (0,0)) and then recursively taking the step that yields the maximum of the following three options:
- Option 1 (diagonal step): the score ending at position (i-1, j-1) plus matchScore(UserSig_i, IntrBlck_j), which is a penalty or reward for aligning the UserSig's ith command with the IntrBlck's jth command.
- Option 2 (top-down step): the score ending at position (i, j-1) plus gUserSig, which is the penalty for introducing a gap into the UserSig.
- Option 3 (left-right step): the score ending at position (i-1, j) plus gIntrBlck, which is the penalty for introducing a gap into the IntrBlck.
14Algorithm
- If Option 1 yields the largest value, then the optimal alignment matches UserSig_i with IntrBlck_j.
- If Option 2 or Option 3 yields the largest score, then the optimal alignment associates either UserSig_i or IntrBlck_j with a gap.
- If Option 1 or Option 2 results in a negative value, then the alignment score is reset to zero to allow a prefix of both the UserSig and IntrBlck to have an arbitrary number of un-penalized gaps.
15Algorithm - Scoring Scheme
- The effectiveness of the alignment depends on the values assigned to the following three parameters:
- The matchScore(UserSig_i, IntrBlck_j) function returns a positive value if the two commands match well and a negative value if they do not.
- gUserSig and gIntrBlck are the negative gap penalties associated with inserting gaps into the UserSig and IntrBlck, respectively.
- The following conditions are considered when assigning values to the parameters:
- A portion of the UserSig can be ignored without penalty because the UserSig is significantly longer than the IntrBlck, so most of the commands in the UserSig will not participate in the alignment.
- Each gap inserted into the UserSig corresponds to an IntrBlck command that is ignored, and for proper detection IntrBlck commands cannot be ignored.
16Algorithm - Scoring Scheme
- The authors used the following parameter values:
- Mismatches are kept at a constant score of 0, as a blanket reward or penalty for any mismatch would unfairly favor certain alignments, and would not disallow concept drift.
- A score of +1 for a match between two aligned commands, i.e. matchScore(UserSig_i, IntrBlck_j).
- A score of -2 for a gap placed in the tested block, i.e. gIntrBlck.
- A score of -3 for a gap placed in the user's signature, i.e. gUserSig.
- To facilitate proper detection, a threshold score must be determined to define the point at which a score is indicative of an attack.
17Experimental Testing Test Data
- The algorithm was tested using data provided by Schonlau et al. [3] so that it could be compared against other intrusion detection schemes.
- The SEA data provide 50 blocks of 100 commands each (5,000 total commands) for each user, which can be assumed to be intrusion-free and were used as training data for the system.
- The data also provide 100 blocks of 100 commands each (10,000 total commands) for each user, which are tested to determine whether a masquerade attack has occurred.
18Experimental Testing Test Metrics
- False positive rate, false negative rate, and hit rate metrics were used to determine how well the alignment algorithm performed.
- A false positive is a non-intrusion block that the algorithm has wrongly labeled as containing an intrusion.
- A false negative is an intrusion block that the algorithm has wrongly labeled as non-intrusion.
- A hit is an intrusion block that the algorithm has successfully labeled as containing an intrusion.
- The effects of changing the various parameters of the alignment algorithm on the false positive and false negative rates were studied.
19Experimental Testing Metrics Calculations
- f = number of false positives
- n = number of non-intrusion command sequence blocks
- u = number of users (50 in this case)
- $\text{false positive}_{\text{overall}} = \left( \frac{1}{u} \sum_{i \in \text{users}} \frac{f_i}{n_i} \right) \times 100$
- fn = number of false negatives
- n = number of intrusion command sequence blocks
- c = number of users who have at least one intrusion block
- $\text{false negative}_{\text{overall}} = \left( \frac{1}{c} \sum_{i \in \text{users}} \frac{fn_i}{n_i} \right) \times 100$
- $\text{hit rate}_{\text{overall}} = 100 - \text{false negative}_{\text{overall}}$
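The metric calculations above can be written out as a short sketch. The function names and the list-per-user input layout are assumptions made for the example. Users with no intrusion blocks are skipped in the false negative average, which is one reading of why the slide divides by c rather than u.

```python
def overall_false_positive_rate(false_positives, non_intrusion_blocks):
    """Mean per-user false positive fraction, as a percentage.

    Both arguments are lists with one entry per user (f_i and n_i).
    """
    u = len(false_positives)
    per_user = [f / n for f, n in zip(false_positives, non_intrusion_blocks)]
    return sum(per_user) / u * 100


def overall_false_negative_rate(false_negatives, intrusion_blocks):
    """Mean per-user false negative fraction over users with intrusions."""
    # Assumption: only users with at least one intrusion block contribute.
    pairs = [(fn, n) for fn, n in zip(false_negatives, intrusion_blocks)
             if n > 0]
    c = len(pairs)
    return sum(fn / n for fn, n in pairs) / c * 100


def overall_hit_rate(false_negatives, intrusion_blocks):
    """Hit rate is the complement of the overall false negative rate."""
    return 100 - overall_false_negative_rate(false_negatives,
                                             intrusion_blocks)


# Three hypothetical users; the second has no intrusion blocks.
print(overall_false_negative_rate([1, 0, 2], [4, 0, 4]))
print(overall_hit_rate([1, 0, 2], [4, 0, 4]))
```

Averaging per-user rates, rather than pooling all blocks, keeps a user with many blocks from dominating the overall figure.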
20Threshold Value Determination
- The initial threshold score for each user is determined by cross-validating the user's signature against itself.
- Twenty 100-command sections of the user's signature are randomly chosen, and each is aligned to a randomly chosen 1000-command section of the same user's signature. This creates an initial average score similar to the score that the user's testing data should produce.
- This average is then updated as new testing blocks are checked: the current testing block's score, and the scores of all previously tested blocks, are averaged with the initial average produced by the training data.
21Threshold Value Determination
- A percentage of that average is then taken as the threshold score.
- The threshold percentage can be chosen to achieve an appropriate amount of sensitivity in the detection process.
- This customizes the threshold for each user, so that if a particular user does not have consistently high-scoring alignments with their signature, that user's testing blocks will not be unduly flagged as intrusions.
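The threshold update described above might be sketched as follows. This is one reading of the slides, not the authors' code: the class name is hypothetical, the initial average is taken as an input rather than computed by cross-validation, the running average is a plain mean over the initial average and all test block scores so far, and a block is assumed to be flagged when its score falls below the threshold.

```python
class UserThreshold:
    """Per-user running threshold (illustrative sketch)."""

    def __init__(self, initial_average, percentage):
        self.initial_average = initial_average  # from training data
        self.percentage = percentage            # e.g. 0.5 for 50 percent
        self.scores = []                        # test block scores so far

    def average(self):
        # Assumption: plain mean of the initial average and all scores.
        values = [self.initial_average] + self.scores
        return sum(values) / len(values)

    def check(self, block_score):
        """Record the score, update the average, and flag low scores."""
        self.scores.append(block_score)
        threshold = self.percentage * self.average()
        return block_score < threshold  # True means possible intrusion


detector = UserThreshold(initial_average=80.0, percentage=0.5)
print(detector.check(70.0))  # average 75.0, threshold 37.5
print(detector.check(10.0))  # average about 53.3, threshold about 26.7
```

Raising the percentage makes detection more sensitive at the cost of more false positives; lowering it does the opposite.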
22Command Mismatch Scoring
- Mismatches can be used to better determine how well the tested block aligns to the user's signature, and therefore to better tailor the algorithm to the problem of masquerade detection.
- Therefore, a customized mismatch scoring system was used to allow for the possibility that the legitimate user may have interchanged one command with another in a particular alignment.
- $M = \frac{\text{number of times a particular command occurred in the tested block}}{\text{number of times that command is expected to occur}} - 1$
23Command Mismatch Scoring
- M lies between -1 and 1.
- If M is negative the mismatch is penalized, and if M is positive it is rewarded.
- If M is zero, the mismatch is neither penalized nor rewarded.
- With this mismatch scoring scheme in place, the results improved drastically over the previous semi-global algorithm, in which mismatches were neither rewarded nor penalized.
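A minimal sketch of the mismatch score M follows. The slides do not say how the expected count is obtained or how M is kept within [-1, 1], so the sketch takes the expected count as an input and clamps the result; both choices, and the function name, are assumptions.

```python
from collections import Counter


def mismatch_score(command, tested_block, expected_count):
    """M = observed count / expected count - 1, clamped to [-1, 1].

    expected_count must be positive; how it is derived from the user's
    signature is not specified here (assumption: supplied by the caller).
    """
    observed = Counter(tested_block)[command]
    m = observed / expected_count - 1
    return max(-1.0, min(1.0, m))  # clamping is an assumption


block = ["ls", "cd", "ls"]
print(mismatch_score("ls", block, expected_count=2))  # seen as expected
print(mismatch_score("vi", block, expected_count=4))  # never seen
```

A command that appears about as often as expected scores near zero, so a mismatch involving it is treated neutrally, while a command that is expected but absent pulls the score toward -1.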
24Comparison
25Future Work
- This system is equally applicable to graphical user interface (GUI) interactions.
- This method can be coupled with existing intrusion detection technologies in a hybrid system.
- The parameters of the scoring algorithm can be tuned further to allow for a more dynamic scoring system.
- A multidimensional approach using several different alignment statistics could be a more powerful and robust mechanism for decreasing the false positive rate of the algorithm.
26References
- [1] Coull, S., Branch, J., Szymanski, B., and Breimer, E. Intrusion Detection: A Bioinformatics Approach.
- [2] Wespi, A., Dacier, M., and Debar, H. An Intrusion-Detection System Based on the Teiresias Pattern-Discovery Algorithm. EICAR 1999 Best Paper Proceedings, 1999.
- [3] Schonlau, M., DuMouchel, W., Ju, W., Karr, A. F., Theus, M., and Vardi, Y. Computer intrusion: detecting masquerades. Statistical Science, 16(1):58-74, February 2001.
27Thank You