MINDS: Data Mining Based Network Intrusion Detection System

Slides: 31

Provided by: bradbl2

Learn more at: https://www-users.cse.umn.edu


1
MINDS: Data Mining Based Network Intrusion Detection System
Vipin Kumar (KUMAR_at_cs.umn.edu)
Army High Performance Computing Research Center
University of Minnesota
http://www.cs.umn.edu/research/minds/
Team Members: Eric Eilertson, Paul Dokas, Levent Ertoz, Ben Mayer, Aleksandar Lazarevic, Michael Steinbach, George Simon, Varun Chandola, Mark Shaneck, Jaideep Srivastava, Zhi-Li Zhang, Yongdae Kim, Vipin Kumar
2
Information Assurance

The sophistication and severity of cyber attacks are increasing
ARL, the Army, DoD and other U.S. Government agencies are major targets for sophisticated state-sponsored cyber terrorists
Cyber strategies can be a major force multiplier and equalizer
Across DoD, computer assets have been compromised and information has been stolen, putting technological advantage and battlefield superiority at risk
Security mechanisms always have vulnerabilities
Firewalls are not sufficient to ensure security in computer networks (e.g., insider attacks)

[Chart: Incidents Reported to Computer Emergency Response Team/Coordination Center (CERT/CC)]
[Figure: Spread of SQL Slammer worm 10 minutes after its deployment]
3
Information Assurance

Intrusion Detection System
A combination of software and hardware that attempts to perform intrusion detection
Raises an alarm when a possible intrusion occurs
Traditional intrusion detection system (IDS) tools are based on signatures of known attacks
Limitations
The signature database has to be manually revised for each newly discovered type of intrusion
Substantial latency in deploying newly created signatures across the computer system
Cannot detect emerging cyber threats
Not suitable for detecting policy violations and insider abuse
Do not provide an understanding of network traffic
Generate too many false alarms

Example of a SNORT rule (MS-SQL Slammer worm):
any -> udp port 1434 (content:"|81 F1 03 01 04 9B 81 F1 01|"; content:"sock"; content:"send")
www.snort.org
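To make the signature limitation concrete, the following is a minimal sketch of exact content matching in the spirit of the simplified rule above. It is not SNORT's engine. The `Packet` record, `SLAMMER_RULE` dictionary and `matches` function are hypothetical names introduced only for this illustration. The point it shows is that one changed byte in a worm variant is enough to evade the rule.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical, simplified packet record (not a real SNORT structure)."""
    protocol: str
    dst_port: int
    payload: bytes


# Simplified encoding of the rule above: UDP traffic to port 1434 whose
# payload contains every listed byte string (all content checks must match).
SLAMMER_RULE = {
    "protocol": "udp",
    "dst_port": 1434,
    "contents": [bytes.fromhex("81 F1 03 01 04 9B 81 F1 01"), b"sock", b"send"],
}


def matches(rule: dict, pkt: Packet) -> bool:
    """Return True only if the header fields and all content strings match."""
    if pkt.protocol != rule["protocol"] or pkt.dst_port != rule["dst_port"]:
        return False
    return all(content in pkt.payload for content in rule["contents"])


if __name__ == "__main__":
    sig = bytes.fromhex("81F10301049B81F101")
    original = Packet("udp", 1434, b"\x04" + sig + b"..sock..send")
    # A hypothetical variant with a single byte changed (0x9B -> 0x9C).
    variant = Packet("udp", 1434, b"\x04" + sig.replace(b"\x9b", b"\x9c") + b"..sock..send")
    print(matches(SLAMMER_RULE, original))  # True
    print(matches(SLAMMER_RULE, variant))   # False: one changed byte evades the rule
```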
4
Data Mining for Intrusion Detection

Increased interest in data mining based intrusion detection
Attacks for which it is difficult to build signatures
Unforeseen/unknown/emerging attacks
Misuse detection
Building predictive models from labeled data sets (instances are labeled as normal or intrusive) to identify known intrusions
High accuracy in detecting many kinds of known attacks
Cannot detect unknown and emerging attacks
Anomaly detection
Detect novel attacks as deviations from normal behavior
Potentially high false alarm rate: previously unseen (yet legitimate) system behaviors may also be recognized as anomalies
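One common way to turn "deviation from normal behavior" into a number is a distance-based outlier score. This sketch uses the distance from a record to its k-th nearest neighbour among known-normal records. It is a generic illustration; the slides do not say which score MINDS computes. The function name `knn_anomaly_score` and the toy feature values are hypothetical, and features are assumed to be numeric and already normalized.

```python
import math


def knn_anomaly_score(point, normal_records, k=3):
    """Distance from `point` to its k-th nearest neighbour among normal records.

    Larger scores mean the point deviates more from observed normal behaviour.
    Assumes numeric, already-normalized feature vectors.
    """
    distances = sorted(math.dist(point, q) for q in normal_records)
    return distances[min(k, len(distances)) - 1]


if __name__ == "__main__":
    # Toy 2-D "normal" profile (hypothetical feature values).
    normal = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
    print(knn_anomaly_score((0.4, 0.6), normal))    # small: looks normal
    print(knn_anomaly_score((10.0, 10.0), normal))  # large: flagged as anomalous
```

The same property explains the false alarm risk noted above. A legitimate but never-before-seen behavior is also far from every normal record, so it scores high as well.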

5
Data Mining for Intrusion Detection

[Figure: a training set with continuous, categorical, and temporal attributes and a class label is used to learn a classifier, which is then applied to a test set]
Misuse Detection: Building Predictive Models
Anomaly Detection
Summarization of attacks using association rules
Rules Discovered: Src IP = 206.163.37.95, Dest Port = 139, Bytes ∈ [150, 200] --> ATTACK

Key Technical Challenges
Large data size
High dimensionality
Temporal nature of the data
Skewed class distribution
Data preprocessing
On-line analysis
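An association rule such as the one discovered above is usually judged by its support and confidence. This sketch computes both for the slide's rule over a handful of invented toy records, which are for illustration only and are not real data. The names `rule_stats`, `lhs`, `rhs` and `TOY_RECORDS` and the record fields are hypothetical.

```python
def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent --> consequent.

    support    = fraction of all records satisfying both sides
    confidence = fraction of records satisfying the antecedent that also
                 satisfy the consequent
    """
    covered = [r for r in records if antecedent(r)]
    both = [r for r in covered if consequent(r)]
    support = len(both) / len(records)
    confidence = len(both) / len(covered) if covered else 0.0
    return support, confidence


def lhs(r):
    # Antecedent of the rule on the slide: source IP, destination port, byte range.
    return r["src_ip"] == "206.163.37.95" and r["dst_port"] == 139 and 150 <= r["bytes"] <= 200


def rhs(r):
    # Consequent: the record is labeled as an attack.
    return r["label"] == "attack"


# Invented toy records, for illustration only.
TOY_RECORDS = [
    {"src_ip": "206.163.37.95", "dst_port": 139, "bytes": 160, "label": "attack"},
    {"src_ip": "206.163.37.95", "dst_port": 139, "bytes": 190, "label": "attack"},
    {"src_ip": "206.163.37.95", "dst_port": 139, "bytes": 175, "label": "normal"},
    {"src_ip": "10.0.0.7", "dst_port": 80, "bytes": 1500, "label": "normal"},
]

if __name__ == "__main__":
    print(rule_stats(TOY_RECORDS, lhs, rhs))  # support 0.5, confidence about 0.67
```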
7
MINDS: Minnesota INtrusion Detection System

[Figure: MINDS system components: network, data capturing device (Net flow tools, tcpdump), filtering, feature extraction, known attack detection (detected known attacks), anomaly detection (anomaly scores), human analyst, labels, association pattern analysis (summary and characterization of attacks), detected novel attacks]

Data mining based intrusion detection system
Incorporated into the Interrogator architecture at the ARL Center for Intrusion Monitoring and Protection (CIMP)
Helps analyze data from multiple sensors at DoD sites around the country
MINDS anomalies are used as the primary key when viewing related alerts from other tools (SNORT, Jids, etc.)
MINDS is the first effective anomaly-based intrusion detection system used by ARL
Routinely detects attacks and intrusive behavior not detected by widely used intrusion detection systems
Insider abuse / policy violations / worms / scans
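The architecture figure lists the components but not their exact wiring. This skeleton assumes one plausible ordering: filter uninteresting traffic, set aside flows flagged by known-attack detection, and rank the remaining flows by anomaly score for the human analyst. `minds_style_pipeline`, `keep`, `is_known_attack` and `anomaly_score` are hypothetical placeholders. In practice they could be, for example, a signature matcher and a distance-based scorer. This is a sketch under those assumptions, not the MINDS implementation.

```python
from typing import Callable, Dict, List, Tuple

Flow = Dict[str, object]  # hypothetical flow record: feature name -> value


def minds_style_pipeline(
    flows: List[Flow],
    keep: Callable[[Flow], bool],
    is_known_attack: Callable[[Flow], bool],
    anomaly_score: Callable[[Flow], float],
    top_n: int = 10,
) -> Tuple[List[Flow], List[Tuple[float, Flow]]]:
    """Filter flows, set aside known attacks, and rank the rest by anomaly score."""
    filtered = [f for f in flows if keep(f)]             # filtering step
    known = [f for f in filtered if is_known_attack(f)]  # known attack detection
    rest = [f for f in filtered if not is_known_attack(f)]
    # Anomaly detection: highest scores first, for the human analyst to review.
    ranked = sorted(((anomaly_score(f), f) for f in rest),
                    key=lambda t: t[0], reverse=True)
    return known, ranked[:top_n]
```

Sorting on the score alone (`key=lambda t: t[0]`) avoids comparing the flow dictionaries when two flows have equal scores.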

8
Feature Extraction Module

Three groups of features
Basic features of individual TCP connections
Source and destination IP: Features 1 and 2
Source and destination port: Features 3 and 4
Protocol: Feature 5
Duration: Feature 6
Bytes per packet: Feature 7
Number of bytes: Feature 8
Time-based features
For the same source (destination) IP address, number of unique destination (source) IP addresses inside the network in the last T seconds: Features 9 (13)
Number of connections from the source (destination) IP to the same destination (source) port in the last T seconds: Features 11 (15)
Connection-based features
For the same source (destination) IP address, number of unique destination (source) IP addresses inside the network in the last N connections: Features 10 (14)
Number of connections from the source (destination) IP to the same destination (source) port in the last N connections: Features 12 (16)
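The time-based and connection-based features are sliding-window counts. This minimal sketch computes the source-side versions of Features 9, 11 and 10 for the i-th connection in a time-sorted list. Several points are assumptions rather than statements from the slide:
- The `Conn` record and function names are hypothetical.
- Every destination is assumed to be inside the monitored network.
- The T-second window includes its boundary.
- "Last N connections" is read as the last N connections overall, including the current one.

The destination-side Features 13, 15 and 14 follow by swapping the source and destination roles.

```python
from collections import namedtuple

# Hypothetical connection record; `conns` must be sorted by time.
Conn = namedtuple("Conn", ["time", "src_ip", "dst_ip", "dst_port"])


def f9_unique_dsts_last_T(conns, i, T):
    """Unique destination IPs contacted by conns[i].src_ip in the last T seconds."""
    cur = conns[i]
    return len({c.dst_ip for c in conns[: i + 1]
                if c.src_ip == cur.src_ip and cur.time - T <= c.time})


def f11_same_port_last_T(conns, i, T):
    """Connections from conns[i].src_ip to conns[i].dst_port in the last T seconds."""
    cur = conns[i]
    return sum(1 for c in conns[: i + 1]
               if c.src_ip == cur.src_ip and c.dst_port == cur.dst_port
               and cur.time - T <= c.time)


def f10_unique_dsts_last_N(conns, i, N):
    """Unique destination IPs contacted by conns[i].src_ip among the last N
    connections overall (assumption: the window is not restricted by source)."""
    cur = conns[i]
    window = conns[max(0, i - N + 1): i + 1]
    return len({c.dst_ip for c in window if c.src_ip == cur.src_ip})
```

A count-based window (Feature 10) still catches a slow scanner whose connections are too spread out in time to stand out within any short T-second window.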

9
Detection of Anomalies on Real Network Data

Anomalies/attacks picked up by MINDS include scanning activities, worms, and non-standard behavior such as policy violations and insider attacks. Many of the attacks detected by MINDS had already appeared on the CERT/CC list of recent advisories and incident notes.
Some illustrative examples of intrusive behavior detected using MINDS at U of M
Scans
Detected scanning for Microsoft DS service on port 445/TCP
Undetected by SNORT since the scanning was non-sequential (very slow); a rule was added to SNORT in September 2002
Detected scanning for Oracle server
Undetected by SNORT because the scanning was hidden within other Web scanning
Detected a distributed Windows networking scan from multiple source locations
Policy Violations
Identified a machine running a Microsoft PPTP VPN server on non-standard ports
Undetected by SNORT since the collected GRE traffic was part of the normal traffic
Identified compromised machines running FTP servers on non-standard ports, which is a policy violation
Example of anomalous behavior following a successful Trojan horse attack
Detected computers on the network apparently communicating with outside computers over a VPN or on IPv6
Worms
Detected several instances of the Slapper worm that were not identified by SNORT since they were variations of existing worm code
Detected unsolicited ICMP ECHOREPLY messages to a computer previously infected with the Stacheldraht worm (a DDoS agent)

10
Typical Anomaly Detection Output

January 26, 2003 (48 hours after the Slammer worm)

Anomalous connections that correspond to the Slammer worm
Anomalous connections that correspond to a ping scan
Connections corresponding to UM machines connecting to Half-Life game servers

11
Summarization Using Association Patterns

[Figure: components: Anomaly Detection System, ranked connections (attack and normal), Discriminating Association Pattern Generator, update, Knowledge Base]
Build normal profile
Study changes in normal behavior
Create attack summary
Detect misuse behavior
Understand nature of the attack

Knowledge Base:
R1: TCP, DstPort = 1863 --> Attack
R100: TCP, DstPort = 80 --> Normal
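The idea of a discriminating pattern generator is to find attribute=value combinations that are frequent among top-ranked (attack) connections but rare among normal ones. One example is "TCP, DstPort = 1863" above. This is a minimal sketch of that contrast, not necessarily the MINDS algorithm. It enumerates only itemsets of size one and two drawn from the attack records, and a real generator would use an Apriori-style miner. The thresholds `min_support` and `min_ratio` and all function names are hypothetical.

```python
from itertools import combinations


def itemsets(record, max_len=2):
    """All attribute=value itemsets of size 1..max_len contained in a record."""
    items = sorted(record.items())
    for size in range(1, max_len + 1):
        yield from combinations(items, size)


def support(itemset, records):
    """Fraction of records that contain every attribute=value pair in itemset."""
    hits = sum(all(r.get(k) == v for k, v in itemset) for r in records)
    return hits / len(records)


def discriminating_patterns(attack, normal, min_support=0.5, min_ratio=5.0):
    """Itemsets frequent in `attack` and much rarer in `normal`."""
    candidates = {s for r in attack for s in itemsets(r)}
    found = []
    for s in candidates:
        s_attack = support(s, attack)
        if s_attack < min_support:
            continue
        s_normal = support(s, normal)
        ratio = s_attack / s_normal if s_normal > 0 else float("inf")
        if ratio >= min_ratio:
            found.append((s, s_attack, s_normal))
    # Most frequent (in the attack set) first; shorter patterns break ties.
    return sorted(found, key=lambda t: (-t[1], len(t[0])))
```

For instance, if three of four top-ranked connections are TCP to port 1863 while only one in ten normal connections is, `(("dport", 1863),)` passes both thresholds. By contrast, `(("proto", "tcp"),)` is rejected because TCP is just as common in normal traffic.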
12
Typical MINDS Output

UM computer connecting to a remote FTP server running on port 5002
Summarized TCP reset packets received from 64.156.X.74, which was the victim of a DoS attack; we were observing backscatter, i.e., replies to spoofed packets
Summarization of an FTP scan from a computer in Colombia, 200.75.X.2
Summary of IDENT lookups, where a remote computer tries to obtain a user name
Summarization of a USENET server transferring a large amount of data

13
Typical MINDS Output

UM computers doing bulk transfers
Attack on a RealMedia server (reported by CERT on September 9, 2003: RealNetworks media server RTSP protocol parser buffer overflow)
8200/tcp traffic related to gotomypc.com, which allows users to remotely control a desktop (involves a third party)
Mysterious traffic currently being investigated

14
Typical MINDS Output

UMN computers doing bulk transfers
160.94.122.142 is running a rogue FTP server on 60000/TCP
UMN computers doing large transfers via BitTorrent to many outside hosts
This computer is scanning for computers on port 139/TCP; the majority of the packets are 192 bytes or 144 bytes, except in the second summary (score 88.2)
UMN computer running a RealMedia server that was not known to the analyst
Odd-looking P2P traffic to/from a UMN computer (potentially KaZaA or Gnutella)
The remote computer was scanning for 57/TCP; RESET packets are sent back by computers that do not have 57/TCP open

15
Scan Detection

Despite the importance of scan detection, its value is often overlooked
Lack of good tools for scan detection
Existing methods either miss stealth scans or give too many false alarms
Fast scans are easy to catch using existing schemes, but stealth scans are very difficult to recognize
MINDS employs our new methodology for detecting network scans
Makes use of powerful new heuristics
Only considers flows with a small number of packets
Only considers scans in a subnet (not the whole Internet)
Makes effective use of usage information
Touches to rare IP/port combinations are more suspicious than others
A scanner will hit machines where the service is not available, so the IP/port combinations it touches have low usage counts
Very low false alarm rate
Evaluation of 36 million flows over a 30-minute window at the University of Minnesota showed 2583 alarms but only 22 false alarms
Evaluation on an hour of data at ARL reported 1150 scans, with only 5 false alarms
Routinely finds compromised machines at ARL-CIMP
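For reference, the reported evaluations correspond to false alarm fractions of 22/2583 ≈ 0.85% and 5/1150 ≈ 0.43%. The heuristics listed above can be combined into a simple per-source score: small flows only, destinations inside the monitored subnet only, and rarely used IP/port pairs count as suspicious. This is a minimal sketch under assumptions, not the MINDS scan detector, whose details are only summarized on the slide. The names `build_usage` and `scan_scores`, the record fields, and the defaults `max_packets=3` and `rare_threshold=2` are all hypothetical.

```python
import ipaddress
from collections import Counter, defaultdict


def build_usage(history_flows):
    """Usage counts of (dst_ip, dst_port) pairs observed in historical traffic."""
    return Counter((f["dst_ip"], f["dst_port"]) for f in history_flows)


def scan_scores(flows, usage, subnet, max_packets=3, rare_threshold=2):
    """Per source: number of distinct rarely used (dst_ip, dst_port) pairs touched.

    Heuristics (all parameters are hypothetical choices):
      * only flows with few packets are considered;
      * only destinations inside the monitored subnet are considered;
      * pairs seen at most `rare_threshold` times in `usage` count as rare.
    """
    net = ipaddress.ip_network(subnet)
    touched = defaultdict(set)
    for f in flows:
        if f["packets"] > max_packets:
            continue
        if ipaddress.ip_address(f["dst_ip"]) not in net:
            continue
        pair = (f["dst_ip"], f["dst_port"])
        if usage.get(pair, 0) <= rare_threshold:
            touched[f["src_ip"]].add(pair)
    return {src: len(pairs) for src, pairs in touched.items()}
```

A source that touches many rarely used pairs gets a high score even if its probes are spread over time. A busy client of a popular service scores nothing, because its IP/port pair has high usage.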

16
Detecting Suspicious Ports for Possible Worm Activity

We find destinations within the network that have a high connection failure rate on specific ports for inbound, non-scan connections
Then we find ports on which there are many such destinations
The existence of such ports indicates a potential worm or slow scan
This warrants targeted and more detailed data collection and analysis that cannot easily be done on the entire data set
Packet content analysis
Signature generation
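The two-step procedure above maps directly onto two aggregations. This sketch assumes the scan detector has already removed scan flows, so only inbound, non-scan connections are passed in. Each connection record is assumed to carry a hypothetical boolean `failed` field. The function name and thresholds (`min_attempts`, `min_fail_rate`, `min_destinations`) are illustrative choices, not values from the slide.

```python
from collections import defaultdict


def suspicious_ports(conns, min_attempts=5, min_fail_rate=0.8, min_destinations=3):
    """Ports on which many internal destinations show high connection failure rates.

    `conns` is assumed to hold only inbound, non-scan connections, each a dict
    with "dst_ip", "dst_port" and a boolean "failed" (hypothetical fields).
    """
    attempts = defaultdict(int)
    failures = defaultdict(int)
    for c in conns:
        pair = (c["dst_ip"], c["dst_port"])
        attempts[pair] += 1
        failures[pair] += int(c["failed"])
    # Step 1: destination/port pairs with a high connection failure rate.
    failing = [p for p, n in attempts.items()
               if n >= min_attempts and failures[p] / n >= min_fail_rate]
    # Step 2: ports on which many such destinations occur.
    dests_by_port = defaultdict(set)
    for ip, port in failing:
        dests_by_port[port].add(ip)
    return {port: len(ips) for port, ips in dests_by_port.items()
            if len(ips) >= min_destinations}
```

A port returned here is only a lead. As the slide notes, it identifies where targeted packet capture and signature generation are worth the cost.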

17
IP/port pairs for which a large percentage of connections failed
18
IP/port pairs for which a large percentage of connections failed (only for ports with many hits)
19
(No Transcript)
20
999 unique sources (Min = 1, Max = 28, Avg = 1)
1126 unique destinations (Min = 1, Max = 55, Avg = 1)
1516 total flows involved
1472 scan flows on port 80 (found by scan detector)
21
(No Transcript)
22
7982 unique sources (Min = 1, Max = 16, Avg = 1)
6184 unique destinations (Min = 1, Max = 28, Avg = 1)
9930 total flows involved
9406 scan flows on port 445 (found by scan detector)
23
(No Transcript)
24
Clustering

Useful for detecting modes of behavior
Shared Nearest Neighbor (SNN) clustering works quite well at determining modes of behavior
Not distracted by noise in the data
SNN is CPU intensive: O(N^2)
Requires storing an N x K matrix
K (number of neighbors) is typically between 10 and 20
K should be about the size of the smallest expected mode
Clustered 850,000 connections collected over one hour at one US Army Fort
Took 10 hours using three quad-processor 2.8 GHz servers and four 2 GHz workstations (a total of 16 CPUs)
Required around 100 MB of memory per processing element (PE) for the distance calculations
500 MB of memory for the final clustering step on a single PE
Found 3135 clusters
Largest clusters around 500 records, smallest cluster 10 records
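The N x K matrix is the list of each record's K nearest neighbours. If each neighbour index is stored as a 4-byte integer (an assumption), then N = 850,000 and K = 20 give 850,000 × 20 × 4 bytes = 68 MB. This sketch shows the simpler Jarvis-Patrick form of shared-nearest-neighbour clustering. The SNN algorithm referenced on the slide may add density-based steps, so this is not presented as the MINDS implementation. Two points are linked if each is in the other's k-NN list and they share at least `min_shared` neighbours. Clusters are the connected components of the resulting graph. The brute-force neighbour search is the O(N^2) step, and all function names are hypothetical.

```python
import math


def knn_lists(points, k):
    """Brute-force k-nearest-neighbour index sets: O(N^2) distance computations."""
    nbrs = []
    for i, p in enumerate(points):
        d = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        nbrs.append({j for _, j in d[:k]})
    return nbrs


def snn_clusters(points, k, min_shared):
    """Jarvis-Patrick style SNN clustering; returns one cluster label per point."""
    nbrs = knn_lists(points, k)
    parent = list(range(len(points)))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(points)):
        for j in nbrs[i]:
            # Link only mutual neighbours that share enough neighbours.
            if i < j and i in nbrs[j] and len(nbrs[i] & nbrs[j]) >= min_shared:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]
```

Consider a point far from both dense groups. Its nearest neighbours do not list it among their own, so it never forms a mutual link and stays a singleton. This is one way SNN avoids being distracted by noise.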

25
Detecting Large Modes of Network Traffic Using Clustering

Large clusters of VPN traffic (hundreds of connections)
Used between forts for secure sharing of data and working remotely

26
Detecting Unusual Modes of Network Traffic Using Clustering

Clusters Involving GoToMyPC.com (Army Data)
Policy violation: allows remote control of a desktop

27
Detecting Unusual Modes of Network Traffic Using Clustering

Clusters involving mysterious ping and SNMP traffic

28
Detecting Unusual Modes of Network Traffic Using Clustering

Clusters involving unusual repeated FTP sessions
Further investigation revealed a misconfigured Army computer that was trying to contact Microsoft

29
MINDS CRITICAL TO COMPLETE FUNCTIONALITY
[Figure: detection approaches (Packet-Based Signature Detection, Session-Based Signature Detection, Header Analysis, Behavior Analysis (MINDS)) and threat categories (Viruses and Worms, Scans with Automatic Virus Attacks, Simple Scans, Scans with Target Responses, New and Variant Attacks, Anomaly Detection and New Attacks, Compromises)]
The Army Research Laboratory (ARL), supported by the AHPCRC and the MINDS initiative, successfully monitors and analyzes network data to protect ARL and its Army and DoD customer infospace
30
Current MINDS Research and Development Work

Correlation of suspicious events across network sites
Helps detect sophisticated attacks not identifiable by single-site analyses
Scalable anomaly detection
Distributed correlation algorithms
Grid middleware
Analysis of long-term data (months/years)
Uncover suspicious stealth activities (e.g., insiders leaking/modifying information)