Title: SSCI
SSCI 1301
DARPA OASIS PI Meeting, Santa Fe, NM, Jul 24-27, 2001
Intelligent Active Profiling for Detection and Intent Inference of Insider Threat in Information Systems
Joao B. D. Cabrera and Raman K. Mehra, Scientific Systems Company, Inc.
Lundy Lewis, Aprisma Inc.
Wenke Lee, North Carolina State Univ.
SBIR Phase I, Topic No. SB002-039, Contract No. DAAH01-01-C-R027
Objective: Classifying and Responding to Insider Threats
- Objectives: Design and evaluate IDSs capable of classifying and responding to Insider Threats; investigate the use of Network Management Systems (NMSs) as a vehicle.
- Misuse/Intrusion Tolerance is achieved by having an adequate and timely response.
- Technology: Statistical Pattern Recognition and AI for the design of detectors and classifiers; NMSs for data collection and response coordination.
- Approach: Utilize the Benchmark Problem for proof-of-concept studies; examine the applicability of NMSs and peripherals for monitoring and response.
Towards Adequate and Timely Response
- Adequate:
  - High Accuracy: few False Alarms, many Detections.
  - Distinguish among attacks: different attacks elicit different types of response.
  - Distinguish faults from attacks.
- Timely:
  - Detect the Attack before it is too late to respond.
Question 1: What threats/attacks is your project considering?
Insider Attacks: password stealing, unauthorized database access, email snooping, etc. For proof-of-concept purposes, we investigated the Benchmark Problem of System Calls made by Unix's sendmail. However, the technologies and tools we are developing are applicable to any situation in which the observables are sequences of possibly correlated categorical variables, e.g., Audit Records produced by BSM in Unix or Object Access Auditing in Windows NT.
Question 2: What assumptions does your project make?
1. Data sets corresponding to normal, malicious, and faulty behavior are available for the construction and testing of detection schemes (Training Stage and Testing Stage).
2. The observables for normal, malicious, and faulty behavior are sequences of categorical variables.
3. Patterns capable of differentiating between different types of malicious activity and faults exist, and are learnable by special-purpose algorithms verified in this effort.
4. If 3 holds, there is time to take preventive action when malicious activity is detected.
Question 3: What policies can your project enforce?
If the detection system flags the presence of malicious activity, a response will be triggered. For the specific case of the Benchmark Problem, typical responses would be to kill the process, or to delay its execution until it times out. Intent Inference provides the capability of specializing the response.
Thus, the project aims to develop a capability (Intent Inference) that can be used as a component of Intrusion Tolerant Architectures.
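To make "specializing the response" concrete, here is a minimal sketch of a policy table that maps an inferred label to a response. The `Response` enum, the `POLICY` table, `choose_response`, and the labels are all hypothetical; only the two actions (kill the process, delay it until time-out) come from the slide, and which label receives which action is purely illustrative.

```python
from enum import Enum


class Response(Enum):
    """Candidate responses; KILL and DELAY are the two actions named for the Benchmark Problem."""
    ALLOW = "allow"
    DELAY = "delay execution until time-out"
    KILL = "kill the process"


# Hypothetical policy table: inferred label -> response.
# These assignments are illustrative only, not taken from the project.
POLICY = {
    "Normal": Response.ALLOW,
    "Fault": Response.DELAY,
}


def choose_response(label: str) -> Response:
    """Look up the response for an inferred label; any label not in the table
    (e.g. a specific attack type) falls back to killing the process."""
    return POLICY.get(label, Response.KILL)


print(choose_response("Fault").value)  # delay execution until time-out
```

Without Intent Inference every detection maps to the same default action; a classifier that names the anomaly type lets the table choose per type.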
Benchmark Problem: Detect malicious activity by monitoring System Calls made by Privileged Processes in Unix
- Originally suggested by C. Ko, G. Fink, and K. Levitt (1994). Extensively studied by the UNM group (S. Forrest and others), starting with "A Sense of Self for Unix Processes" (1996).
- Programs: sendmail, lpr, ls, ftp, finger.
- Well-Investigated Problem: our results can be compared with previous efforts.
- We concentrated on sendmail: data sets for six types of anomalies (five attacks and one fault) are available.
Benchmark Problem (cont.)
UNM Finding: A relatively small dictionary of short sequences (901 sequences of length 6 for sendmail) provides a very good characterization of normality for several Unix processes. The dictionary is constructed using a Training Set of Normal behavior. Sequences not belonging to this dictionary are called abnormal sequences. Intrusions are detected if a process contains too many abnormal sequences. Processes are labeled as normal or intrusions; all intrusions receive the same label.
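The UNM scheme described above can be sketched in a few lines. This is a minimal illustration, assuming each trace is a list of system-call names and that abnormal sequences are counted over sliding windows; the function names and toy traces are hypothetical, and the original UNM tools (for example, how they count or normalize mismatches) may differ in detail.

```python
from typing import Iterable, List, Set, Tuple

Seq = Tuple[str, ...]


def windows(trace: List[str], k: int) -> List[Seq]:
    """All contiguous length-k windows of a trace, sliding one call at a time."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]


def build_normal_dictionary(normal_traces: Iterable[List[str]], k: int = 6) -> Set[Seq]:
    """Union of every length-k sequence seen in the normal training traces."""
    dictionary: Set[Seq] = set()
    for trace in normal_traces:
        dictionary.update(windows(trace, k))
    return dictionary


def anomaly_count(trace: List[str], normal: Set[Seq], k: int = 6) -> int:
    """Number of windows in the trace that are absent from the normal dictionary."""
    return sum(1 for w in windows(trace, k) if w not in normal)


def is_intrusion(trace: List[str], normal: Set[Seq], threshold: int, k: int = 6) -> bool:
    """Flag the process when its anomaly count exceeds the detection threshold."""
    return anomaly_count(trace, normal, k) > threshold


# Toy usage with k = 3 (the sendmail dictionary above uses length 6).
normal = build_normal_dictionary(
    [["open", "read", "write", "close"], ["open", "read", "read", "close"]], k=3
)
suspect = ["open", "read", "exec", "close"]
print(len(normal), anomaly_count(suspect, normal, k=3))  # 4 2
```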
Privileged Programs and the Space of OS Calls
Anomaly Count Detector (UNM)
- Determining the Threshold:
  - Anomalous Traces not available: Anomaly Detection Problem.
  - Anomalous Traces available: Classification Problem.
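One way to read the two cases above, as a hedged sketch: when only normal traces exist, the threshold must be set from normal data alone; when labeled anomalous traces exist, choosing it becomes a one-dimensional classification problem. The specific rules below (maximum normal count; minimum total errors) and the function names are assumptions for illustration, not the procedure used by UNM or in this project.

```python
from typing import List


def threshold_from_normal(normal_counts: List[int]) -> int:
    """Anomaly-detection setting (no anomalous traces): take the largest anomaly
    count observed on normal traces, so that no normal training trace is flagged
    under the rule 'flag if count > threshold'."""
    return max(normal_counts)


def threshold_from_labeled(normal_counts: List[int], anomalous_counts: List[int]) -> int:
    """Classification setting (anomalous traces available): choose the threshold
    that minimizes false alarms plus missed detections."""

    def errors(t: int) -> int:
        false_alarms = sum(c > t for c in normal_counts)
        misses = sum(c <= t for c in anomalous_counts)
        return false_alarms + misses

    # The error count only changes at observed counts; min - 1 covers "flag everything".
    observed = set(normal_counts) | set(anomalous_counts)
    candidates = sorted(observed | {min(observed) - 1})
    # Ties are broken toward the lower (more sensitive) threshold.
    return min(candidates, key=lambda t: (errors(t), t))


print(threshold_from_normal([0, 1, 2, 1]))              # 2
print(threshold_from_labeled([0, 1, 2, 1], [5, 7, 3]))  # 2
```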
Anomaly Count Detector: Statistics
- Typical Results:
  - A2, A3, A4, A5 detectable (anomaly counts well above normal).
  - A1 (decode intrusion): not detectable.
This Project: Specific Objectives and Accomplishments
1. Intent Inference
   - Demonstrated the feasibility of performing Intent Inference based on sequences of OS calls for sendmail.
   - The classification results were quantified and compared with the detection results by UNM.
2. Fusion of Detection Systems
   - Demonstrated the improvement in detection rates gained by combining the proposed scheme for Intent Inference with the UNM scheme for detection based on Anomaly Counts.
Intent Inference
We pose the problem of Intent Inference as distinguishing between types of attacks and faults using the sequences of OS calls. From the statistical point of view, this is a classification problem. The main issue is to find features under which the different types of attacks and faults form distinct clusters.
Looking for Features: Returning to the Space of OS Calls
Good features balance small within-class scatter (elements in each class as clustered as possible) against large between-class scatter (classes as separated as possible). The Abnormal Sequences corresponding to each Anomaly can also be viewed as Features. Do they have any Discriminating Power?
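The balance described on this slide is commonly quantified with the traces of the within-class and between-class scatter matrices (as in Fisher's discriminant analysis). The sketch below computes both for feature vectors grouped by class; representing a trace as a 0/1 vector indicating which abnormal sequences it contains is an assumption made here for illustration, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def scatter_traces(classes: Dict[str, List[Vector]]) -> Tuple[float, float]:
    """Return (within, between), the traces of the within-class and between-class
    scatter matrices:
      within  = sum over classes c, samples x in c, of |x - m_c|^2
      between = sum over classes c of n_c * |m_c - m|^2
    where m_c is the class mean, n_c the class size and m the overall mean."""
    overall = mean([v for vs in classes.values() for v in vs])
    within = 0.0
    between = 0.0
    for vs in classes.values():
        m_c = mean(vs)
        within += sum(sq_dist(v, m_c) for v in vs)
        between += len(vs) * sq_dist(m_c, overall)
    return within, between


# Two perfectly separated, perfectly tight classes: zero within, positive between.
print(scatter_traces({"A1": [[1, 0], [1, 0]], "A2": [[0, 1], [0, 1]]}))  # (0.0, 2.0)
```

A good feature set drives `within` toward zero while keeping `between` large; the ratio between/within is one common single-number summary.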
Discriminating Power of Anomalous Sequences (Anomalies for which Multiple Traces are available)
It was observed that the Anomalous Sequences are distinct for each Anomaly Type (large between-class scatter) and appear consistently in all traces of a given Anomaly (small within-class scatter).
Hence, the Anomalous Sequences are good discriminators.
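The two observations can be checked mechanically. A minimal sketch, assuming the anomalous sequences of each trace have already been extracted (for example with a normal dictionary) and grouped by anomaly type; the names `consistency` and `overlaps` and the toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Seq = Tuple[str, ...]


def consistency(trace_sets: List[Set[Seq]]) -> float:
    """Fraction of a type's anomalous sequences that occur in every one of its
    traces (1.0 means all traces share exactly the same anomalous sequences).
    Expects at least one trace."""
    union = set().union(*trace_sets)
    common = set.intersection(*trace_sets)
    return len(common) / len(union) if union else 1.0


def overlaps(per_type: Dict[str, List[Set[Seq]]]) -> Dict[Tuple[str, str], int]:
    """Number of anomalous sequences shared by each pair of anomaly types
    (0 everywhere means the types have disjoint signatures)."""
    unions = {t: set().union(*sets) for t, sets in per_type.items()}
    return {(a, b): len(unions[a] & unions[b]) for a, b in combinations(sorted(unions), 2)}


per_type = {
    "A1": [{("x", "y")}, {("x", "y"), ("z", "w")}],
    "A2": [{("p", "q")}, {("p", "q")}],
}
print(consistency(per_type["A1"]), consistency(per_type["A2"]))  # 0.5 1.0
print(overlaps(per_type))  # {('A1', 'A2'): 0}
```

High consistency within each type corresponds to small within-class scatter; zero pairwise overlap corresponds to large between-class scatter.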
Why is this so?
- Anomalous Processes are the superposition of large sections of Normal Actions reflecting the Normal Behavior of the Program (typically 90%) and a small, concentrated sequence of very specific actions associated with the Anomaly.
- Different anomalies are related to different actions, so it is reasonable to expect these distinctions to be apparent.
- It is remarkable, however, that this separation can be observed at the level of OS Calls.
- The Anomalous Sequences serve as signatures for the Anomalies. These are statistical signatures, extracted by an automatic procedure, not by domain knowledge.
Constructing a Classifier based on Anomalous Sequences
- Extract the Normal Dictionary.
- For each Anomaly Type, record the corresponding Anomalous Sequences; call this set of sequences the Anomaly Dictionary for that Anomaly. After Training, there will be N Anomaly Dictionaries.
- Incoming Processes are labeled according to their matches with the Anomaly Dictionaries: the Anomaly with the most matches is selected.
- Processes for which no match is found are labeled as Normal.
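The four steps above translate directly into code. This is a minimal sketch under stated assumptions: traces are lists of system-call names, the window length `k` is a free parameter, and "matches" counts windows of the incoming trace found in an Anomaly Dictionary (counting distinct sequences instead would be an equally plausible reading). Function names and toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Seq = Tuple[str, ...]


def windows(trace: List[str], k: int) -> List[Seq]:
    """All contiguous length-k windows of a trace."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]


def train(normal_traces: List[List[str]],
          anomalous: Dict[str, List[List[str]]],
          k: int) -> Dict[str, Set[Seq]]:
    """Build one Anomaly Dictionary per anomaly type: the windows of its training
    traces that are not in the Normal Dictionary."""
    normal: Set[Seq] = set()
    for trace in normal_traces:
        normal.update(windows(trace, k))
    return {
        label: {w for trace in traces for w in windows(trace, k) if w not in normal}
        for label, traces in anomalous.items()
    }


def classify(trace: List[str], anomaly_dicts: Dict[str, Set[Seq]], k: int) -> str:
    """Label with the anomaly whose dictionary matches the most windows of the
    trace; return 'Normal' when no dictionary matches. Ties go to whichever
    label comes first in the dictionary (an arbitrary choice)."""
    ws = windows(trace, k)
    matches = {label: sum(w in d for w in ws) for label, d in anomaly_dicts.items()}
    best = max(matches, key=matches.get, default=None)
    if best is None or matches[best] == 0:
        return "Normal"
    return best


dicts = train([["a", "b", "c", "d"]],
              {"A1": [["a", "b", "x", "d"]], "A2": [["a", "y", "c", "d"]]}, k=2)
print(classify(["a", "b", "x", "d"], dicts, k=2))  # A1
print(classify(["a", "b", "c", "d"], dicts, k=2))  # Normal
```

Note that `classify` consults only the anomaly dictionaries; the normal dictionary is needed only during training.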
String Matching Classifier
The operation is as simple as that of the Anomaly Count Detector, but the Memory Storage Requirements are typically 70% lower.
Performance Evaluation (Testing Set: average of 4,000 combinations)
- 100% performance for A1 and A2 for k > 5. A1 is detected, which is not possible using Anomaly Counts.
- No False Alarms for k < 8.
Performance Evaluation (cont.)
- Poor Performance for Unknown Anomalies: they are mislabeled as one of the Known Anomalies.
- 20% of the Fault Anomalies are missed.
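The slides report per-anomaly performance and false alarms but do not spell out the bookkeeping. A minimal sketch of one plausible way to tally them from (true label, predicted label) pairs on a test set; averaging over the 4,000 train/test combinations is not shown, and the function names and toy pairs are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (true label, predicted label)


def per_class_accuracy(pairs: List[Pair]) -> Dict[str, float]:
    """Fraction of test processes of each true class that received the correct label."""
    totals: Dict[str, int] = {}
    correct: Dict[str, int] = {}
    for true, pred in pairs:
        totals[true] = totals.get(true, 0) + 1
        correct[true] = correct.get(true, 0) + (true == pred)
    return {c: correct[c] / totals[c] for c in totals}


def false_alarm_rate(pairs: List[Pair], normal_label: str = "Normal") -> float:
    """Fraction of truly normal processes that were labeled as some anomaly."""
    preds = [pred for true, pred in pairs if true == normal_label]
    if not preds:
        return 0.0
    return sum(p != normal_label for p in preds) / len(preds)


pairs = [("A1", "A1"), ("A1", "A2"), ("Normal", "Normal"), ("Normal", "A1")]
print(per_class_accuracy(pairs))  # {'A1': 0.5, 'Normal': 0.5}
print(false_alarm_rate(pairs))    # 0.5
```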
Improving the Performance of the String Matching Classifier
The performance of the classifier can be improved by combining it with the Anomaly Count Detector:
- Processes with Anomaly Counts above the Detection Threshold are labeled as Anomalous, regardless of matches with the Anomaly Dictionaries. Following this procedure, the 20% of Faults that were missed are labeled as Unknown Anomalies.
- Anomalies with matches in more than one Anomaly Dictionary are labeled as Unknown Anomalies. Following this procedure, the Unknown Anomalies A4 and A5 are correctly labeled.
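A minimal sketch of the combined decision rule, under one plausible reading of the two bullets: a match with a single dictionary keeps that label, matches with several dictionaries become an Unknown Anomaly, and a process with no match but an anomaly count above the threshold is also an Unknown Anomaly. The function name, argument layout, and rule ordering are assumptions.

```python
from typing import Dict


def fused_label(matches: Dict[str, int], count: int, threshold: int) -> str:
    """Hypothetical fusion of the String Matching Classifier with the Anomaly Count Detector.

    matches:   number of matching windows per Anomaly Dictionary (from the classifier).
    count:     anomaly count of the process (from the Anomaly Count Detector).
    threshold: detection threshold of the Anomaly Count Detector.
    """
    hits = [label for label, m in matches.items() if m > 0]
    if len(hits) > 1:
        # Matches in more than one Anomaly Dictionary: treat as an unknown anomaly.
        return "Unknown Anomaly"
    if len(hits) == 1:
        return hits[0]
    if count > threshold:
        # No dictionary matches, but the count is above the detection threshold.
        return "Unknown Anomaly"
    return "Normal"


print(fused_label({"A1": 0, "A2": 0}, count=10, threshold=5))  # Unknown Anomaly
```

Under this reading, every process above the threshold is labeled anomalous, and the classifier only refines which anomaly it is.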
Summary (Phase I)
- Demonstrated the feasibility of using sequences of OS calls for the classification of Anomalies effected by Privileged Programs in Unix (String Matching Classifier).
- Correct classification of Anomalies allows a more specific response, an important capability for Intrusion Tolerance.
- Sequences of system calls were shown to be Statistical Signatures for the Anomalies.
- Combining the String Matching Classifier with the Anomaly Count Detector: the Anomaly Count Detector detects Unknown Attacks, while the String Matching Classifier allows accurate characterization of Known Attacks.
Further Work (Phase II): Towards a Host-Based System for Classification of Intrusions
- Verify whether the Paradigm of Statistical Signatures holds for other scenarios: Audit Trails in Unix and Windows NT.
- Combine data-based schemes with Domain Knowledge, using Automated Rules to construct more complete Normal Dictionaries at the level of OS Calls.
- Integration with NMS modules:
  - At the System and Application Management Level: using available COTS peripherals to construct a Host-Based IDS and the accompanying response infrastructure.
  - At the Network Management Level: using COTS systems to integrate the outputs of the IDS with other elements of the infrastructure.