1
Learning on User Behavior for Novel Worm Detection
2
Steve Martin, Anil Sewani, Blaine Nelson, Karl Chen, and Anthony Joseph
{steve0, anil, nelsonb, quarl, adj}@cs.berkeley.edu
University of California at Berkeley
3
The Problem: Email Worms
  • Email worms cause billions of dollars of damage
    yearly.
  • Nearly all of the most virulent worms of 2004
    spread by email.

(source: http://www.sophos.com)
4
Current Solutions
  • Signature-based methods are effective against
    known worms only.
  • 25 new Windows viruses a day were released
    during 2004!
  • The human element slows reaction times.
  • Signature generation can take hours to days.
  • Signature acquisition and application can take
    hours to never.
  • Signature methods are mired in an arms race.
  • MyDoom.m and Netsky.b got through EECS mail
    scanners.

5
Statistical Approaches
  • Unsupervised learning on network behavior.
  • Leverage a behavioral invariant: a worm seeks to
    propagate itself over a network.
  • Previous work: novelty detection by itself is not
    enough.
  • Many false negatives: the worm attack will
    succeed.
  • Many false positives: irritated network admins.
  • Common solution: make the novelty detector model
    very sensitive.
  • Tradeoff: this introduces additional false
    positives.
  • Can render a detection system useless.

6
Our Approach
  • Use a two-layer approach to filter novelty
    detector results.
  • Novelty detector minimizes false negatives.
  • Secondary classifier filters out false positives.
  • Leverage human reactions and existing methods to
    improve the secondary classifier.
  • Use supervisor feedback to partially label the
    data corpus.
  • Correct and retrain as signatures become
    available.
  • Filter novelty detection results with a per-user
    classifier trained on semi-supervised data (see
    the sketch after this list).
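
A minimal sketch of this two-layer filtering in Python, assuming
scikit-learn. The OneClassSVM novelty layer and GaussianNB filter are
illustrative stand-ins for the paper's detector and per-user generative
model, and TwoLayerDetector is a hypothetical name:

from sklearn.naive_bayes import GaussianNB  # stand-in for the generative model
from sklearn.svm import OneClassSVM

class TwoLayerDetector:
    def __init__(self, nu=0.2):
        # A high nu makes the novelty layer sensitive (fewer false negatives).
        self.novelty = OneClassSVM(kernel="rbf", gamma="scale", nu=nu)
        self.filter = GaussianNB()

    def fit(self, X_normal, X_labeled, y_labeled):
        self.novelty.fit(X_normal)             # layer 1: normal email only
        self.filter.fit(X_labeled, y_labeled)  # layer 2: semi-supervised labels

    def predict(self, X):
        flagged = self.novelty.predict(X) == -1  # -1 means anomalous
        wormy = self.filter.predict(X) == 1      # 1 means classified as worm
        return flagged & wormy  # alarm only when both layers agree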

7
Per-User Detection Pipeline
8
Pipeline Details
  • Both per-email and per-user features are used
    (sketched after this list).
  • User features capture elements of behavior over a
    window of time.
  • Email features examine individual snapshots of
    behavior.
  • Any novelty detector can be inserted.
  • These results use a Support Vector Machine.
  • One SVM is trained on all users' normal email.
  • A parametric classifier leverages distinct feature
    distributions via a generative graphical model.
  • A separate model is fit for each user.
  • Classifier retrains over semi-supervised data.
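
A short sketch of the two feature granularities, in Python. The Email
record and the specific features chosen here (size, attachment count,
send rate) are illustrative assumptions, not the authors' exact
feature set:

from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Email:  # hypothetical record type for illustration
    sent: datetime
    size_bytes: int
    num_attachments: int

def per_email_features(msg):
    # Per-email: a snapshot of a single message.
    return [float(msg.size_bytes), float(msg.num_attachments)]

def per_user_features(history, window):
    # Per-user: behavior aggregated over a recent time window.
    cutoff = max(m.sent for m in history) - window
    recent = [m for m in history if m.sent >= cutoff]
    rate = len(recent) / (window.total_seconds() / 3600.0)  # emails per hour
    frac_attach = sum(m.num_attachments > 0 for m in recent) / len(recent)
    return [rate, frac_attach]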

9
System Deployment
10
Using Feedback
  • Use existing virus scanners to update the corpus
    (labeling rule sketched after this list).
  • For each email within the last d days:
  • If the scanner returns virus, we label it virus.
  • If the scanner returns clean, we leave the
    current label.
  • Outside the previous d days, the scanner labels
    directly.
  • Threshold the number of emails classified as
    virus to detect user infection.
  • The machine is quarantined and infected emails
    are queued.
  • If the infection is confirmed, i random messages
    from the queue are labeled by the supervisor.
  • The model is retrained.
  • Labels are retained until the virus scanner
    corrects them.
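
A sketch of this labeling rule in Python, assuming scanner verdicts of
"virus" or "clean"; the function names and the d and threshold
parameters are hypothetical:

from datetime import datetime, timedelta

def update_label(current_label, scanner_verdict, sent, now, d):
    if now - sent <= timedelta(days=d):
        # Within the last d days the scanner only adds "virus" labels;
        # a "clean" verdict leaves the current label untouched.
        return "virus" if scanner_verdict == "virus" else current_label
    # Older than d days: the scanner labels directly.
    return scanner_verdict

def user_infected(predictions, threshold):
    # Threshold the count of emails classified as virus to flag infection.
    return sum(p == "virus" for p in predictions) >= threshold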

11
Feedback Utilization Process
12
Evaluation
  • Examined feature distributions on real email.
  • Live study with an augmented mail server and 20
    users.
  • Used the Enron data set for further evaluation.
  • Collected virus data for six email worms using
    virtual machines and a real address book:
  • BubbleBoy, MyDoom.u, MyDoom.m, Netsky.d, Sobig.f,
    Bagle.f
  • Constructed training/test sets of real email
    traffic artificially infected with viruses.
  • Infections were interleaved while preserving the
    intervals between worm emails (sketched after
    this list).
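
A sketch of the interleaving step in Python: the worm trace is shifted
to a chosen start time while the gaps between successive worm emails
are preserved. The start-time choice and the data layout here are
illustrative assumptions:

from datetime import datetime, timedelta

def interleave(clean_times, worm_gaps, start):
    # Rebuild worm send times from start plus the preserved gaps.
    worm_times, t = [], start
    for gap in [timedelta(0)] + worm_gaps:
        t = t + gap
        worm_times.append(t)
    merged = ([(ts, "clean") for ts in clean_times]
              + [(ts, "worm") for ts in worm_times])
    return sorted(merged)  # chronological mix of clean and infected traffic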

13
Results I
  • Average Accuracy: 79.45%
  • Training set: 1000 infected emails from 5
    different worms, 400 clean emails
  • Test set: 200 infected emails, 1200 clean emails

14
Results II
  • Average Accuracy: 99.69%
  • Training set: 1000 infected emails from 5
    different worms, 400 clean emails
  • Test set: 200 infected emails, 1200 clean emails