1
Learning on User Behavior for Novel Worm Detection
2
Steve Martin, Anil Sewani, Blaine Nelson, Karl Chen, and Anthony Joseph
{steve0, anil, nelsonb, quarl, adj}@cs.berkeley.edu
University of California at Berkeley
3
The Problem: Email Worms
  • Email worms cause billions of dollars of damage
    yearly.
  • Nearly all of the most virulent worms of 2004
    spread by email.

(source: http://www.sophos.com)
4
Current Solutions
  • Signature-based methods are effective against
    known worms only.
  • 25 new Windows viruses a day were released
    during 2004!
  • The human element slows reaction times.
  • Signature generation can take hours to days.
  • Signature acquisition and application can take
    hours to never.
  • Signature methods are mired in an arms race.
  • MyDoom.m and Netsky.b got through EECS mail
    scanners.

5
Statistical Approaches
  • Unsupervised learning on network behavior.
  • Leverage a behavioral invariant: a worm seeks to
    propagate itself over a network.
  • Previous work: novelty detection by itself is not
    enough.
  • Many false negatives: the worm attack will
    succeed.
  • Many false positives: irritated network admins.
  • Common solution: make the novelty detector model
    very sensitive.
  • Tradeoff: this introduces additional false
    positives.
  • Can render a detection system useless.

6
Our Approach
  • Use a two-layer approach to filter novelty
    detector results.
  • Novelty detector minimizes false negatives.
  • Secondary classifier filters out false positives.
  • Leverage human reactions and existing methods to
    improve the secondary classifier.
  • Use supervisor feedback to partially label the
    data corpus.
  • Correct and retrain as signatures become
    available.
  • Filter novelty detection results with a per-user
    classifier trained on semi-supervised data (see
    the sketch after this list).
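
A minimal sketch of this two-layer filtering in Python, assuming
scikit-learn. The OneClassSVM novelty layer and GaussianNB filter are
illustrative stand-ins for the paper's detector and per-user generative
model, and TwoLayerDetector is a hypothetical name:

from sklearn.naive_bayes import GaussianNB  # stand-in for the generative model
from sklearn.svm import OneClassSVM

class TwoLayerDetector:
    def __init__(self, nu=0.2):
        # A high nu makes the novelty layer sensitive (fewer false negatives).
        self.novelty = OneClassSVM(kernel="rbf", gamma="scale", nu=nu)
        self.filter = GaussianNB()

    def fit(self, X_normal, X_labeled, y_labeled):
        self.novelty.fit(X_normal)             # layer 1: normal email only
        self.filter.fit(X_labeled, y_labeled)  # layer 2: semi-supervised labels

    def predict(self, X):
        flagged = self.novelty.predict(X) == -1  # -1 means anomalous
        wormy = self.filter.predict(X) == 1      # 1 means classified as worm
        return flagged & wormy  # alarm only when both layers agree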

7
Per-User Detection Pipeline
8
Pipeline Details
  • Both per-email and per-user features are used
    (sketched after this list).
  • User features capture elements of behavior over a
    window of time.
  • Email features examine individual snapshots of
    behavior.
  • Any novelty detector can be inserted.
  • These results use a Support Vector Machine.
  • One SVM is trained on all users' normal email.
  • A parametric classifier leverages distinct feature
    distributions via a generative graphical model.
  • A separate model is fit for each user.
  • Classifier retrains over semi-supervised data.
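
A short sketch of the two feature granularities, in Python. The Email
record and the specific features chosen here (size, attachment count,
send rate) are illustrative assumptions, not the authors' exact
feature set:

from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Email:  # hypothetical record type for illustration
    sent: datetime
    size_bytes: int
    num_attachments: int

def per_email_features(msg):
    # Per-email: a snapshot of a single message.
    return [float(msg.size_bytes), float(msg.num_attachments)]

def per_user_features(history, window):
    # Per-user: behavior aggregated over a recent time window.
    cutoff = max(m.sent for m in history) - window
    recent = [m for m in history if m.sent >= cutoff]
    rate = len(recent) / (window.total_seconds() / 3600.0)  # emails per hour
    frac_attach = sum(m.num_attachments > 0 for m in recent) / len(recent)
    return [rate, frac_attach]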

9
System Deployment
10
Using Feedback
  • Use existing virus scanners to update the corpus
    (labeling rule sketched after this list).
  • For each email within the last d days:
  • If the scanner returns virus, we label it virus.
  • If the scanner returns clean, we leave the
    current label.
  • Outside the previous d days, the scanner labels
    directly.
  • Threshold the number of emails classified as
    virus to detect user infection.
  • The machine is quarantined and infected emails
    are queued.
  • If the infection is confirmed, i random messages
    from the queue are labeled by the supervisor.
  • The model is retrained.
  • Labels are retained until the virus scanner
    corrects them.
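
A sketch of this labeling rule in Python, assuming scanner verdicts of
"virus" or "clean"; the function names and the d and threshold
parameters are hypothetical:

from datetime import datetime, timedelta

def update_label(current_label, scanner_verdict, sent, now, d):
    if now - sent <= timedelta(days=d):
        # Within the last d days the scanner only adds "virus" labels;
        # a "clean" verdict leaves the current label untouched.
        return "virus" if scanner_verdict == "virus" else current_label
    # Older than d days: the scanner labels directly.
    return scanner_verdict

def user_infected(predictions, threshold):
    # Threshold the count of emails classified as virus to flag infection.
    return sum(p == "virus" for p in predictions) >= threshold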

11
Feedback Utilization Process
12
Evaluation
  • Examined feature distributions on real email.
  • Live study with an augmented mail server and 20
    users.
  • Used the Enron data set for further evaluation.
  • Collected virus data for six email worms using
    virtual machines and a real address book:
  • BubbleBoy, MyDoom.u, MyDoom.m, Netsky.d, Sobig.f,
    Bagle.f
  • Constructed training/test sets of real email
    traffic artificially infected with viruses.
  • Infections were interleaved while preserving the
    intervals between worm emails (sketched after
    this list).
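
A sketch of the interleaving step in Python: the worm trace is shifted
to a chosen start time while the gaps between successive worm emails
are preserved. The start-time choice and the data layout here are
illustrative assumptions:

from datetime import datetime, timedelta

def interleave(clean_times, worm_gaps, start):
    # Rebuild worm send times from start plus the preserved gaps.
    worm_times, t = [], start
    for gap in [timedelta(0)] + worm_gaps:
        t = t + gap
        worm_times.append(t)
    merged = ([(ts, "clean") for ts in clean_times]
              + [(ts, "worm") for ts in worm_times])
    return sorted(merged)  # chronological mix of clean and infected traffic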

13
Results I
  • Average Accuracy: 79.45%
  • Training set: 1000 infected emails from 5
    different worms, 400 clean emails
  • Test set: 200 infected emails, 1200 clean emails

14
Results II
  • Average Accuracy: 99.69%
  • Training set: 1000 infected emails from 5
    different worms, 400 clean emails
  • Test set: 200 infected emails, 1200 clean emails