Intrusion Detection: A Bioinformatics Approach - PowerPoint PPT Presentation

Provided by: Bha75

Transcript and Presenter's Notes



1
Intrusion Detection: A Bioinformatics Approach
2
Introduction
  • This paper proposes a scheme to detect
    masquerading attacks by applying techniques used
    in bioinformatics.
  • The algorithm uses semi-global alignment and a
    unique scoring system to measure the similarity
    between a sequence of commands produced by a
    potential intruder and a sequence of commands
    collected from a legitimate user, in order to
    detect intrusions.
  • The algorithm was then tested on a standard
    intrusion data set. The test results showed that
    it yields a promising combination of intrusion
    detection rate and false positive rate compared
    with other published intrusion detection
    algorithms.

3
Contents
  • Intrusion Detection Systems
  • Masquerading
  • Sequence Alignment
  • Algorithm
  • Experimental Testing
  • Threshold Value Determination
  • Command Mismatch Scoring
  • Comparison
  • Future Work
  • References

4
Intrusion Detection Systems
  • Due to the evolving sophistication of intrusion
    methods, standard security deployments such as
    firewalls, patched operating systems and password
    protection are limited in their effectiveness.
  • An intrusion detection system (IDS) provides the
    layer of security that comes into play when these
    prior defenses fail.
  • This layer usually monitors any number of data
    sources (e.g., audit logs, keystrokes, network
    traffic) for signs of inappropriate or anomalous
    behavior.

5
Masquerading
  • In the field of computer security, one of the
    most damaging attacks is masquerading, in which an
    attacker assumes the identity of a legitimate
    user in a computer system.
  • Masquerading is difficult to detect at its onset
    because the attacker appears to be a normal user
    with valid authority and privileges.
  • To detect a masquerade attack, analyzing the
    user's command sequences is a logical step.
  • This scheme uses semi-global alignment and a
    unique scoring system to detect anomalous user
    behavior that might indicate an intruder.

6
Sequence Alignment
  • Sequence alignment is a well-studied tool used to
    visualize and quantify similarities between
    sequences.
  • It is widely used to compare biological
    sequences such as DNA, RNA, and protein sequences.
  • Its applications include searching sequence
    databases for specific genes or patterns, and
    discovering phylogenetic relationships through
    multiple alignments.

7
Sequence Alignment
The scoring function assigns positive scores to
aligned characters that either match or are known
to be similar. Negative scores are assigned to
both aligned characters that are dissimilar and
characters that are aligned with gaps.
Typically, the score of an alignment is the sum
of the scores of each aligned pair of symbols.
The task of optimal sequence alignment is to
find the highest scoring alignment for a given
scoring function and pair of strings.
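The additive scoring described above can be shown with a small sketch. It scores one given alignment, written as two equal-length rows in which '-' marks a gap. The values +1 (match), -1 (mismatch) and -2 (gap) are arbitrary assumptions for illustration and are not the paper's parameters. The helper name `alignment_score` is hypothetical.

```python
# Illustrative scoring values (assumptions, not the paper's parameters).
MATCH, MISMATCH, GAP = 1, -1, -2


def alignment_score(top: str, bottom: str) -> int:
    """Sum the per-column scores of an alignment; '-' marks a gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot pair two gaps")
        if a == "-" or b == "-":
            total += GAP  # character aligned with a gap
        elif a == b:
            total += MATCH  # identical characters
        else:
            total += MISMATCH  # dissimilar characters
    return total


# Six matching columns and one gap column: 6 * 1 + (-2) = 4.
print(alignment_score("GATTACA", "GA-TACA"))  # 4
```

Finding the optimal alignment means maximizing this sum over every possible placement of gaps. The dynamic-programming algorithms on the following slides do this without enumerating alignments one by one.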
8
Global and Local Alignment
  • Global alignment is suitable for comparing two
    strings that are believed to possess overall
    similarity.
  • However, two strings may not be similar over
    their entire length; instead, they may contain
    smaller substrings that are highly similar.
  • Local alignment was designed to identify these
    more subtle types of similarity.
  • In a local alignment, only the characters in the
    two aligned substrings contribute to the score of
    the optimal alignment.
  • Thus, for each string, a prefix and a suffix are
    ignored by the scoring system.
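The sketch below shows local alignment in the Smith-Waterman style, assuming simple illustrative scores (+1 match, -1 mismatch, -2 gap). Clamping every cell at zero lets an alignment start anywhere, so a prefix is dropped. Taking the maximum over all cells lets it end anywhere, so a suffix is dropped. The function name is hypothetical.

```python
def local_alignment_score(s: str, t: str, match: int = 1,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman score: best alignment of any substring of s with any substring of t."""
    m, n = len(s), len(t)
    # H[i][j] = best score of a local alignment ending at s[i-1] and t[j-1].
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = H[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            # The 0 option restarts the alignment, ignoring everything before it.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])  # the alignment may end at any cell
    return best


print(local_alignment_score("XXXABCDYYY", "ABCD"))  # 4: flanking X's and Y's are ignored
```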

9
Sequence Alignment: Types
A global alignment algorithm aligns two strings
over their entire length.
A local alignment algorithm aligns a substring of
each input string.
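For comparison, here is a minimal sketch of global alignment in the Needleman-Wunsch style, under illustrative scores (+1 match, -1 mismatch, -2 gap). Leading gaps are penalized in the first row and column, and only the bottom-right cell is returned. As a result, every character of both strings must be accounted for. The function name is hypothetical.

```python
def global_alignment_score(s: str, t: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch score: align s and t over their entire length."""
    m, n = len(s), len(t)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    # Leading gaps are penalized, so nothing at the start is free.
    for i in range(1, m + 1):
        D[i][0] = i * gap
    for j in range(1, n + 1):
        D[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = D[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            D[i][j] = max(diag, D[i - 1][j] + gap, D[i][j - 1] + gap)
    # Only the full-length alignment counts.
    return D[m][n]


print(global_alignment_score("XXXABCDYYY", "ABCD"))  # -8: four matches, six gaps
```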
10
Semi-global Alignment
  • The problem with using a purely global alignment
    is that large portions of the signature may not
    align with any segment of the user's commands,
    even though a subsection of those commands does.
    This subtle similarity might go undetected,
    leading to a false alarm.
  • The problem with using a purely local alignment
    is that if a large prefix and a large suffix of
    the tested block of commands are ignored, the
    intrusion itself might be ignored.
  • The authors therefore used a modification of the
    Smith-Waterman local alignment algorithm to
    compute a semi-global alignment.
  • In a semi-global alignment, one can choose to
    align only the prefixes or only the suffixes of
    the original input strings.
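To illustrate the trade-off described above, the sketch below shows one textbook form of semi-global alignment, often called a fitting alignment. This is not the authors' modified Smith-Waterman algorithm. It lets a prefix and a suffix of the long signature go unpenalized, while forcing every command of the short tested block into the alignment. The scores (+1, -1, -2) and the function name are illustrative assumptions.

```python
def fitting_alignment_score(sig: str, blk: str, match: int = 1,
                            mismatch: int = -1, gap: int = -2) -> int:
    """Align all of `blk` against the best-matching stretch of `sig`."""
    m, n = len(sig), len(blk)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    # Column 0 stays 0: skipping a prefix of sig is free.
    # Row 0 is penalized: skipping characters of blk always costs.
    for j in range(1, n + 1):
        D[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = D[i - 1][j - 1] + (match if sig[i - 1] == blk[j - 1] else mismatch)
            D[i][j] = max(diag, D[i - 1][j] + gap, D[i][j - 1] + gap)
    # Any row of the last column may end the alignment: a suffix of sig is free.
    return max(D[i][n] for i in range(m + 1))


print(fitting_alignment_score("XXXABCDYYY", "ABCD"))  # 4
print(fitting_alignment_score("ABC", "ZZ"))  # -2: both block characters must be aligned
```

Unlike a local alignment, this version cannot score the second pair as 0 by discarding the whole block. The block's characters always contribute, which is the property the slide asks for.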

11
Algorithm
  • The signature sequence, which represents the
    user's typical command behavior, will be referred
    to as the UserSig.
  • The monitored command sequence, which may contain
    a subsequence of masquerader commands, will be
    referred to as the IntrBlck (tested block).
  • The algorithm starts by initializing a matrix of
    floats, which is used to store the scores
    throughout the alignment process.
  • Each position (i, j) in the matrix holds the
    optimal score of an alignment ending at
    UserSig_i and IntrBlck_j.

12
Algorithm
  • Input: string UserSig of length m, string
    IntrBlck of length n
  • 1.  Initialize a matrix, D, of type integer
  • 2.  for i = 0 to m
  • 3.    for j = 0 to n
  • 4.      if (j = 0 or i = 0)
  • 5.        D[i][j] = 0
  • 6.      else
  • 7.        if (j = n or i = m)
  • 8.          top = D[i][j-1]
  • 9.          left = D[i-1][j]
  • 10.       else
  • 11.         top = D[i][j-1] + gUserSig
  • 12.         left = D[i-1][j] + gIntrBlck
  • 13.         if (top < 0) top = D[i][j-1]
  • 14.         if (left < 0) left = D[i-1][j]
  • 15.       diagonal = D[i-1][j-1] + matchScore(UserSig[i-1], IntrBlck[j-1])
  • 16.       D[i][j] = maximum(top, left, diagonal)
  • 17. return D[m][n]
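The pseudocode above can be sketched in Python as follows. This is a transcription of the slide, not the authors' code, and it makes a few choices:
- The defaults use the parameter values listed later in this presentation: match +1, mismatch 0, gUserSig = -3, gIntrBlck = -2.
- The matrix holds floats, as slide 11 describes, so fractional mismatch scores can be plugged in.
- Steps 13-14 follow the transcription: a gap penalty that would make the value negative is dropped, so the value falls back to the neighboring cell's score. Where the scores are still zero, this has the same effect as the reset to zero described in the presenter's text.

The Python names are assumptions.

```python
from typing import Callable, Sequence

MatchScore = Callable[[str, str], float]


def default_match_score(a: str, b: str) -> float:
    # Parameter values from the slides: +1 for a match, 0 for any mismatch.
    return 1.0 if a == b else 0.0


def semi_global_score(user_sig: Sequence[str], intr_blck: Sequence[str],
                      match_score: MatchScore = default_match_score,
                      g_user_sig: float = -3.0, g_intr_blck: float = -2.0) -> float:
    """Transcription of the slide's modified Smith-Waterman pseudocode."""
    m, n = len(user_sig), len(intr_blck)
    # Steps 1-5: row 0 and column 0 are zero (floats allow fractional scores).
    D = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if j == n or i == m:
                # Steps 7-9: in the last row or column, gaps cost nothing.
                top = D[i][j - 1]
                left = D[i - 1][j]
            else:
                top = D[i][j - 1] + g_user_sig  # step 11: gap in UserSig
                left = D[i - 1][j] + g_intr_blck  # step 12: gap in IntrBlck
                # Steps 13-14 as transcribed: drop a penalty that makes the value negative.
                if top < 0:
                    top = D[i][j - 1]
                if left < 0:
                    left = D[i - 1][j]
            # Step 15: align UserSig[i-1] with IntrBlck[j-1] (0-based sequences).
            diagonal = D[i - 1][j - 1] + match_score(user_sig[i - 1], intr_blck[j - 1])
            D[i][j] = max(top, left, diagonal)  # step 16
    return D[m][n]  # step 17


sig = "ls cd make vi ls cd make gcc".split()
blk = "cd make vi".split()
print(semi_global_score(sig, blk))
```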

13
Algorithm
  • The optimal score is computed by starting at the
    upper-left corner of the matrix (i.e., at the
    point (0,0)) and filling in each cell with the
    maximum of the following three options:
  • Option 1 (diagonal step): the score ending at
    position (i-1, j-1) plus matchScore(UserSig_i,
    IntrBlck_j), which is a penalty or reward for
    aligning the UserSig's ith command with the
    IntrBlck's jth command.
  • Option 2 (top-down step): the score ending at
    position (i, j-1) plus gUserSig, which is the
    penalty for introducing a gap into the UserSig.
  • Option 3 (left-right step): the score ending at
    position (i-1, j) plus gIntrBlck, which is the
    penalty for introducing a gap into the IntrBlck.

14
Algorithm
  • If Option 1 yields the largest value, the optimal
    alignment matches UserSig_i with IntrBlck_j.
  • If Option 2 or Option 3 yields the largest score,
    the optimal alignment associates either UserSig_i
    or IntrBlck_j with a gap.
  • If Option 2 or Option 3 results in a negative
    value, the alignment score is reset to zero. This
    allows a prefix of both the UserSig and the
    IntrBlck to have an arbitrary number of
    un-penalized gaps.

15
Algorithm - Scoring Scheme
  • The effectiveness of the alignment depends on the
    values assigned to the following three parameters:
  • The matchScore(UserSig_i, IntrBlck_j) function
    returns a negative value if the two commands do
    not match well and a positive value if they do.
  • gUserSig and gIntrBlck are negative gap penalties
    associated with inserting gaps into the UserSig
    and the IntrBlck, respectively.
  • The following considerations guide the choice of
    parameter values:
  • Because the UserSig is significantly longer than
    the IntrBlck, most of the commands in the UserSig
    will not participate in the alignment, so a
    portion of the UserSig can be ignored without
    penalty.
  • Each gap inserted into the UserSig corresponds to
    an IntrBlck command that is ignored, and for
    proper detection IntrBlck commands cannot be
    ignored.

16
Algorithm - Scoring Scheme
  • The authors used the following parameter values:
  • Mismatches are kept at a constant score of 0,
    because a blanket reward or penalty for any
    mismatch would unfairly favor certain alignments
    and would not allow for concept drift.
  • A score of 1 for a match between two aligned
    commands, i.e., matchScore(UserSig_i, IntrBlck_j)
    = 1.
  • A score of -2 for a gap placed in the tested
    block, i.e., gIntrBlck = -2.
  • A score of -3 for a gap placed in the user's
    signature, i.e., gUserSig = -3.
  • To facilitate proper detection, a threshold score
    must be determined that defines the point at which
    a score is indicative of an attack.

17
Experimental Testing: Test Data
  • The algorithm was tested using data provided by
    Schonlau et al. [3], allowing comparison with
    other intrusion detection schemes.
  • For each user, the SEA data provides 50 blocks of
    100 commands each (5,000 commands in total). These
    can be assumed to be intrusion-free and were used
    as training data for the system.
  • Each user also has 100 further blocks of 100
    commands each (10,000 commands in total), which
    are tested to determine whether a masquerade
    attack has occurred.
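The data layout described above can be sketched as follows. The sketch assumes one user's commands are already loaded as a chronological list of 15,000 strings; file parsing is omitted. The first 5,000 commands become 50 training blocks and the remaining 10,000 become 100 testing blocks. The function names are hypothetical. The training commands, taken together, would serve as that user's UserSig.

```python
def split_into_blocks(commands: list[str], block_size: int = 100) -> list[list[str]]:
    """Cut a command stream into consecutive fixed-size blocks."""
    if len(commands) % block_size != 0:
        raise ValueError("command count must be a multiple of block_size")
    return [commands[k:k + block_size] for k in range(0, len(commands), block_size)]


def sea_split(commands: list[str], n_train: int = 5000,
              block_size: int = 100) -> tuple[list[list[str]], list[list[str]]]:
    """Return (training blocks, testing blocks) for one user's command stream."""
    train, test = commands[:n_train], commands[n_train:]
    return split_into_blocks(train, block_size), split_into_blocks(test, block_size)


# Synthetic stand-in for one user's 15,000 commands.
stream = [f"cmd{k % 7}" for k in range(15000)]
train_blocks, test_blocks = sea_split(stream)
print(len(train_blocks), len(test_blocks))  # 50 100
```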

18
Experimental Testing: Test Metrics
  • False positive rate, false negative rate, and hit
    rate metrics were used to determine how well this
    alignment algorithm performed.
  • A false positive is a non-intrusion block that
    the algorithm has wrongly labeled as containing
    an intrusion.
  • A false negative is an intrusion block that the
    algorithm has wrongly labeled as non-intrusion.
  • A hit is an intrusion block that the algorithm
    has successfully labeled as containing an
    intrusion.
  • The effects of changing the various parameters of
    the alignment algorithm on the false positive and
    false negative rates were also studied.

19
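Experimental Testing: Metrics Calculations
  • $f_i$: number of false positives for user $i$
  • $n_i$: number of non-intrusion command sequence
    blocks for user $i$
  • $u$: number of users (50 in this case)
  • $\text{false positive}_{\text{overall}} = \left( \frac{1}{u} \sum_{i \in \text{users}} \frac{f_i}{n_i} \right) \times 100$
  • $\mathit{fn}_i$: number of false negatives for user $i$
  • $n'_i$: number of intrusion command sequence
    blocks for user $i$
  • $c$: number of users who have at least one
    intrusion block
  • $\text{false negative}_{\text{overall}} = \left( \frac{1}{c} \sum_{i} \frac{\mathit{fn}_i}{n'_i} \right) \times 100$,
    where the sum runs over the $c$ users who have at
    least one intrusion block
  • $\text{hit rate}_{\text{overall}} = 100 - \text{false negative}_{\text{overall}}$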
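The overall rates can be computed directly from per-user counts. In this sketch, each argument is a per-user list in which index i is user i. Following the definition of c, users without any intrusion blocks are excluded from the false-negative average. The function names are hypothetical.

```python
def overall_false_positive_rate(fp: list[int], non_intrusion_blocks: list[int]) -> float:
    """Mean over all u users of (false positives / non-intrusion blocks), as a percentage."""
    u = len(fp)
    return 100 * sum(f / n for f, n in zip(fp, non_intrusion_blocks)) / u


def overall_false_negative_rate(fn: list[int], intrusion_blocks: list[int]) -> float:
    """Mean of (false negatives / intrusion blocks) over the c users with intrusion blocks."""
    ratios = [miss / total for miss, total in zip(fn, intrusion_blocks) if total > 0]
    c = len(ratios)
    return 100 * sum(ratios) / c


def overall_hit_rate(fn: list[int], intrusion_blocks: list[int]) -> float:
    return 100 - overall_false_negative_rate(fn, intrusion_blocks)


# Three users; the third has no intrusion blocks and is excluded from c.
print(overall_false_negative_rate([1, 0, 0], [4, 2, 0]))  # 12.5
print(overall_hit_rate([1, 0, 0], [4, 2, 0]))  # 87.5
```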

20
Threshold Value Determination
  • The initial threshold score for each user is
    determined by cross-validating the user's
    signature against itself.
  • Twenty 100-command sections of the user's
    signature are chosen at random and aligned to a
    randomly chosen 1,000-command section of the same
    user's signature. This produces an initial average
    score similar to the score that the user's testing
    data should produce.
  • As new testing blocks are checked, this average
    is updated by averaging the current testing
    block's score and all previously tested block
    scores with the initial average produced from the
    training data.
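The cross-validation step above can be sketched as follows, under these assumptions:
- A single 1,000-command reference section is drawn once and reused for all twenty probes. The slide's wording suggests one section, but this is a reading.
- Probes may overlap the reference.
- `align(user_sig, intr_blck)` is any scoring function supplied by the caller.

The names are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, Sequence

Align = Callable[[Sequence[str], Sequence[str]], float]


def initial_average(signature: Sequence[str], align: Align, rng: random.Random,
                    n_probes: int = 20, probe_len: int = 100, ref_len: int = 1000) -> float:
    """Average score of random signature probes aligned to one random signature section."""
    if len(signature) < ref_len:
        raise ValueError("signature is shorter than the reference section")
    start = rng.randrange(len(signature) - ref_len + 1)
    reference = signature[start:start + ref_len]  # plays the role of UserSig
    scores = []
    for _ in range(n_probes):
        p = rng.randrange(len(signature) - probe_len + 1)
        scores.append(align(reference, signature[p:p + probe_len]))  # probe as IntrBlck
    return mean(scores)


def toy_align(ref: Sequence[str], blk: Sequence[str]) -> float:
    # Hypothetical stand-in scorer: counts probe commands that appear in the reference.
    return float(sum(1 for c in blk if c in ref))


sig = ["ls", "cd", "vi"] * 1000
print(initial_average(sig, toy_align, random.Random(0)))  # 100.0
```

In practice `align` would be an implementation of the semi-global pseudocode from slide 12. It would be called with the 1,000-command section as UserSig and each probe as IntrBlck.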

21
Threshold Value Determination
  • A percentage of that average is then taken as the
    threshold score.
  • The threshold percentage can be chosen to achieve
    an appropriate level of sensitivity in the
    detection process.
  • This customizes the threshold for each user: if a
    particular user does not have consistently
    high-scoring alignments with their own signature,
    that user's testing blocks will not be unduly
    flagged as intrusions.
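The per-user threshold can be sketched as follows. The slides do not say exactly how the scores are averaged, or whether a block is checked before or after its score joins the average. This version assumes:
- an equal-weight mean of the initial average and all tested block scores;
- each block is checked before its score is added;
- a block is flagged when its alignment score falls below the threshold.

The class name and the 80% figure are purely illustrative.

```python
from statistics import mean


class ThresholdTracker:
    """Per-user adaptive threshold: a percentage of a running average score."""

    def __init__(self, initial_avg: float, percent: float) -> None:
        self.initial_avg = initial_avg
        self.percent = percent
        self.tested: list[float] = []

    def threshold(self) -> float:
        # Assumption: equal-weight mean of the initial average and every tested score so far.
        running = mean([self.initial_avg] + self.tested)
        return self.percent * running / 100

    def check(self, score: float) -> bool:
        """Flag the block if it scores below the current threshold, then update the average."""
        flagged = score < self.threshold()
        self.tested.append(score)
        return flagged


tracker = ThresholdTracker(initial_avg=100.0, percent=80)
print(tracker.check(50.0))  # True: 50 < 80
print(tracker.check(70.0))  # False: threshold is now 80% of 75 = 60
```

A lower percentage makes the detector less sensitive: fewer false positives, but also fewer hits.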

22
Command Mismatch Scoring
  • Mismatches can be used to determine more
    precisely how well the tested block aligns to the
    user's signature, and therefore to better tailor
    the algorithm to the problem of masquerade
    detection.
  • Therefore, a customized mismatch scoring system
    was used to allow for the possibility that the
    legitimate user may have interchanged one command
    with another in a particular alignment.
  • $M = \dfrac{\text{number of times a particular command occurred in the tested block}}{\text{number of times that command is expected to occur}} - 1$

23
Command Mismatch Scoring
  • M lies between -1 and 1.
  • If M is negative, the mismatch is penalized; if M
    is positive, it is rewarded.
  • If M is zero, the mismatch is neither penalized
    nor rewarded.
  • After this mismatch scoring scheme was
    implemented, the results improved drastically over
    the previous semi-global algorithm, in which
    mismatches were neither rewarded nor penalized.
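The mismatch score M can be sketched as follows. The slides do not say how the expected count is obtained or which of the two mismatched commands M refers to. This version assumes:
- M is computed for the tested-block command.
- The expected count is that command's relative frequency in the UserSig, scaled to the block length.
- The result is clamped to [-1, 1]. The ratio alone exceeds 1 whenever a command occurs more than twice as often as expected, which would contradict the stated range.
- A command never seen in the signature is given -1. This is an arbitrary choice.

All names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def mismatch_score(command: str, intr_blck: Sequence[str], user_sig: Sequence[str]) -> float:
    """M = observed / expected - 1 for `command` in the tested block, clamped to [-1, 1]."""
    # Assumption: expected count = frequency in the signature scaled to the block length.
    expected = Counter(user_sig)[command] / len(user_sig) * len(intr_blck)
    if expected == 0:
        return -1.0  # Assumption: a command never seen in the signature is fully penalized.
    observed = Counter(intr_blck)[command]
    return max(-1.0, min(1.0, observed / expected - 1))


sig = ["ls"] * 50 + ["cd"] * 50
blk = ["ls"] * 10
print(mismatch_score("ls", blk, sig))  # 1.0: seen twice as often as expected
print(mismatch_score("cd", blk, sig))  # -1.0: expected 5 times, never seen
```

Such a value could be returned by the mismatch branch of matchScore, with matches still scoring +1. The slides do not spell out that wiring.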

24
Comparison
25
Future Work
  • This system is equally applicable to graphical
    user interface (GUI) interactions.
  • The method can be coupled with existing intrusion
    detection technologies in a hybrid system.
  • The parameters of the scoring algorithm can be
    tuned further to allow for a more dynamic scoring
    system.
  • A multidimensional approach using several
    different alignment statistics could be a more
    powerful and robust mechanism for decreasing the
    false positive rate of the algorithm.

26
References
  • [1] Coull, S., Branch, J., Szymanski, B., and
    Breimer, E. Intrusion Detection: A Bioinformatics
    Approach.
  • [2] Wespi, A., Dacier, M., and Debar, H. (1999).
    An Intrusion-Detection System Based on the
    Teiresias Pattern-Discovery Algorithm. EICAR 1999
    Best Paper Proceedings.
  • [3] Schonlau, M., DuMouchel, W., Ju, W., Karr,
    A. F., Theus, M., and Vardi, Y. (February 2001).
    Computer Intrusion: Detecting Masquerades.
    Statistical Science, 16(1), 58-74.

27
Thank You