Title: Internet Outbreaks: Epidemiology and Defenses
1. Internet Outbreaks: Epidemiology and Defenses
- Geoffrey M. Voelker
- Collaborative Center for Internet Epidemiology and Defenses
- Dept. of Computer Science and Engineering
- University of California, San Diego
- October 11, 2006
- In collaboration with David Anderson, Jay Chen, Cristian Estan, Ranjit Jhala, Flavio Junqueira, Erin Kenneally, Justin Ma, John McCullough, David Moore, Vern Paxson (ICSI), Stefan Savage, Colleen Shannon, Sumeet Singh, Alex Snoeren, Stuart Staniford (Nevis), Amin Vahdat, Erik Vandekieft, George Varghese, Michael Vrable, Nick Weaver (ICSI), Qing Zhang
2. Paradise Lost
Our Goal: Develop the understanding and technology to address large-scale subversion of Internet hosts
3. Threat Transformation
- Traditional threats
  - Attacker manually targets high-value system/resource
  - Defender increases cost to compromise high-value systems
  - Biggest threat: insider attacker
- Modern threats
  - Attacker uses automation to target all systems at once (can filter later)
  - Defender must defend all systems at once
  - Biggest threats: software vulnerabilities and naïve users
4. Large-Scale Enablers
- Unrestricted high-performance connectivity
  - Large-scale adoption of IP model for networks and apps
  - Internet is high-bandwidth, low-latency
  - The Internet succeeded!
- Software homogeneity and user naiveté
  - Single bug → mass vulnerability in millions of hosts
  - Trusting users ("ok") → mass vulnerability in millions of hosts
- Lack of meaningful deterrence
  - Little forensic attribution/audit capability
  - Effective anonymity
  - No deterrence, minimal risk
5. Driving Economic Forces
- Emergence of profit-making payloads
  - Spam forwarding (MyDoom.A backdoor, SoBig), credit card theft (Korgo), DDoS extortion, (many) etc.
- Virtuous economic cycle transforms nature of threat
- Commoditization of compromised hosts
  - Fluid third-party exchange market (millions)
  - Going rate for spam proxying: 3-10 cents/host/week
    - Seems small, but a 25k botnet gets you $40k-$130k/yr
  - Raw bots: $0.01/host; special orders ($50)
- Hosts effectively becoming a criminal platform
- Innovation in both host substrate and its uses
  - Sophisticated infection and command/control networks
  - DDoS, spam, piracy, phishing, identity theft are all applications
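As a rough check on the spam-proxying figures above, yearly income is simply bots × weekly rate × weeks per year. A minimal sketch; the function name is illustrative and the 52-week rental year is an assumption, not something stated on the slide:

```python
WEEKS_PER_YEAR = 52  # assumption: the botnet is rented out all year


def annual_revenue_usd(bots: int, cents_per_host_week: float) -> float:
    """Yearly income in dollars from renting `bots` hosts at a weekly per-host rate."""
    return bots * cents_per_host_week * WEEKS_PER_YEAR / 100


low = annual_revenue_usd(25_000, 3)    # 3 cents/host/week
high = annual_revenue_usd(25_000, 10)  # 10 cents/host/week
print(f"${low:,.0f} to ${high:,.0f} per year")
```

Under these assumptions a 25k botnet yields about $39k to $130k per year, in line with the range quoted on the slide.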
6. Botnet Spammer Rental Rates
> 20-30k always online SOCKs4, url is de-duped and updated every 10 minutes. $900/weekly, Samples will be sent on request.
> Monthly payments arranged at discount prices.
- 3.6 cents per bot week
> $350.00/weekly - $1,000/monthly (USD)
> Always Online: 5,000 - 6,000
> Updated every: 10 minutes
- 6 cents per bot week
> $220.00/weekly - $800.00/monthly (USD)
> Always Online: 9,000 - 10,000
> Updated every: 5 minutes
- 2.5 cents per bot week
September 2004 postings to SpecialHam.com, Spamforum.biz
7. Why Worms?
- All of these applications depend on automated mechanisms for subverting large numbers of hosts
- Self-propagating programs continue to be the most effective mechanism for host subversion
- Prevent automated subversion → severely undermine phishing, DDoS, extortion, etc.
- Our Goal: Develop the understanding and technology to address large-scale subversion of Internet hosts
8. Today
- Worm outbreaks
  - What are we up against?
- Framing the worm problem and solutions
  - What can we do?
- Two worm detection and monitoring techniques
  - Fundamental basis for understanding of and defense against large-scale Internet attacks
  - EarlyBird: High-speed network-based content sifting
  - Potemkin: Large-scale high-fidelity honeyfarm
- Summarize
9. Network Telescopes
- Idea: Unsolicited packets are evidence of global phenomena
  - Backscatter: response packets sent by victims provide insight into global prevalence of DoS attacks (and who is getting attacked)
  - Scans: request packets can indicate an infection attempt from a worm (and who is currently infected, growth rate, etc.)
- Very scalable: CCIED Telescope monitors 17M IP addrs (> 1% of all routable addresses on the Internet)
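The reason a telescope can measure global activity is that a host scanning addresses at random will hit the monitored range in proportion to its size. A minimal sketch of that scaling, assuming targets are chosen uniformly from the full 32-bit IPv4 space (real worms often scan non-uniformly, and the function names here are illustrative):

```python
ADDRESS_SPACE = 2 ** 32  # assumption: targets drawn uniformly from all IPv4 addresses


def telescope_fraction(monitored_addrs: int, address_space: int = ADDRESS_SPACE) -> float:
    """Fraction of uniformly random probes that land inside the telescope."""
    return monitored_addrs / address_space


def estimate_global_scan_rate(observed_per_sec: float, fraction: float) -> float:
    """Scale the probe rate seen at the telescope up to an Internet-wide rate."""
    return observed_per_sec / fraction


def expected_seconds_until_seen(host_scans_per_sec: float, fraction: float) -> float:
    """Mean wait before one scanning host first probes the telescope.

    Each probe hits with probability `fraction`, so the number of probes
    needed is geometric with mean 1 / fraction.
    """
    return 1.0 / (host_scans_per_sec * fraction)


fraction = telescope_fraction(17_000_000)
print(f"telescope sees about {fraction:.2%} of uniformly random probes")
wait = expected_seconds_until_seen(10, fraction)
print(f"a host scanning 10 addresses/sec is first seen after about {wait:.0f} s on average")
```

Note that this fraction is taken over all 2^32 addresses, so it is smaller than the share of routable addresses quoted on the slide.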
10. Worm Outbreaks
- CodeRed worm released in July 2001
  - Exploited buffer overflow in Microsoft IIS
  - Infects 360,000 hosts in 14 hours (CRv2)
  - Propagation is limited by latency of TCP handshake
Moore et al., CodeRed: a Case Study on the Spread of an Internet Worm, IMW 2002, and Staniford et al., How to 0wn the Internet in Your Spare Time, USENIX Security 2002
11. Fast Worms
- Slammer/Sapphire released in January 2003
- First 1 min behaves like classic scanning worm
  - Doubling time of 8.5 seconds
- >1 min: worm saturates access bandwidth
  - Some hosts issue > 20,000 scans/sec
  - Self-interfering
- Peaks at 3 min
  - >55 million IP scans/sec
- 90% of Internet scanned in <10 mins
Moore et al., The Spread of the Sapphire/Slammer Worm, IEEE Security & Privacy, 1(4), 2003
12. Was Slammer really fast?
- Yes, it was orders of magnitude faster than CodeRed
- No, it was poorly written and unsophisticated
- Who cares? It is literally an academic point
  - The current debate is whether one can get < 500 ms
  - Bottom line: way faster than people!
Staniford et al., ACM WORM, 2004
13. Understanding Worms
- Worms are well modeled as infectious epidemics
  - Homogeneous random contacts
- Classic SI model
  - N: population size
  - S(t): susceptible hosts at time t
  - I(t): infected hosts at time t
  - β: contact rate
  - i(t) = I(t)/N, s(t) = S(t)/N
Staniford, Paxson, Weaver, How to 0wn the Internet in Your Spare Time, USENIX Security 2002
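With the definitions above and s(t) = 1 - i(t), the classic SI model is di/dt = β·i·(1 - i), whose solution is the logistic curve i(t) = i0·e^(βt) / (1 - i0 + i0·e^(βt)). The sketch below evaluates that curve and cross-checks it with a simple Euler integration. Early in an outbreak i grows like e^(βt), so β ≈ ln 2 / doubling time; applying that to Slammer's 8.5-second doubling time from the earlier slide is an illustration, not a fitted parameter, and the function names are hypothetical.

```python
import math


def infected_fraction(t: float, beta: float, i0: float) -> float:
    """Closed-form logistic solution of di/dt = beta * i * (1 - i) with i(0) = i0."""
    growth = math.exp(beta * t)
    return i0 * growth / (1.0 - i0 + i0 * growth)


def simulate(beta: float, i0: float, dt: float, steps: int) -> list[float]:
    """Forward-Euler integration of the same equation, as a cross-check."""
    i = i0
    trace = [i]
    for _ in range(steps):
        i += beta * i * (1.0 - i) * dt
        trace.append(i)
    return trace


def beta_from_doubling_time(seconds: float) -> float:
    """Early-phase approximation: i(t) ~ i0 * exp(beta * t), so beta = ln 2 / T."""
    return math.log(2) / seconds


beta = beta_from_doubling_time(8.5)  # illustrative: Slammer's early doubling time
for t in (0, 60, 120, 180):
    print(f"t={t:3d}s  infected fraction = {infected_fraction(t, beta, 1e-5):.4f}")
```

The S-shaped curve is why the prevention, treatment, and containment levers matter most while i(t) is still small.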
14. What Can We Do?
- 1) Reduce number of susceptible hosts S(t)
  - Prevention
- 2) Reduce number of infected hosts I(t)
  - Treatment
- 3) Reduce the contact rate β
  - Containment
- 4) Prepare for the inevitable
  - Survival
15. Prevention
- Reduce number of susceptible hosts S(t)
- Software quality: eliminate vulnerability
  - Static/dynamic testing (e.g., Cowan, Wagner, Engler)
  - Active research community, taken seriously in industry
    - Security code review alone for Windows Server 2003: $200M
  - Traditional problems: soundness, completeness, usability
- Software updating: reduce window of vulnerability
  - Most worms exploit known vulnerability (10 days → 6 months)
    - Sapphire: vulnerability and patch July 2002, worm January 2003
  - Some activity (Shield [Wang04]), yet critical problem
  - Is finding security holes a good idea? [Rescorla04]
- Software heterogeneity: reduce impact of vulnerability
  - Artificial heterogeneity [Forrest02]
  - Exploit existing heterogeneity [Junqueira05]
16. Treatment
- Reduce number of infected hosts I(t)
- Disinfection: remove worm from infected hosts
  - Develop specialized "vaccine" in real time
  - Distribute at competitive rate
- Counter-worm, anti-worm
  - Code Green, CRclean, Worm vs. Worm [Castaneda04]
  - Exploit vulnerability, patch host, propagate
  - Seems tough
    - Legal issues of using exploits, even if well-intentioned
    - Propagation race problem
- Automatically patch vulnerability [Keromytis03, Sidiroglou05]
  - Auto-generate and test patches in sandbox
  - Apply within administration domain
  - Requires source, targets known exploits (e.g., overflows)
17. Reactive Containment
- Reduce contact rate β
- Slow worm down
  - Throttle connection rate to slow spread [Twycross03]
  - Important capability, but worm still spreads
- Quarantine
  - Detect and block worm
  - How feasible is it?
18. Defense Requirements
- Any reactive defense is defined by
  - Reaction time: how long to detect worm, propagate information, and activate response
  - Containment strategy: how malicious behavior is identified
  - Deployment scenario: who participates in the system
- Given these, what are the engineering requirements for any effective defense?
- Moore et al., Internet Quarantine: Requirements for Containing Self-Propagating Code, Infocom 2003
19. Containment Requirements
- Universal deployment for Code Red
  - Address filtering (blacklists): must respond < 25 mins
  - Content filtering (signatures): must respond < 3 hours
- For faster worms (Slammer): seconds
- Worse for non-universal deployment
- Bottom line: very challenging
[Figure: reaction time vs. propagation rate (probes/sec)]
20. Survival
- Prepare for inevitable
  - For the ones that surprise us
- Approach: Informed replication
  - Worms represent large-scale dependent failures
  - Model software configurations → model dependent failures
  - Replicate data on hosts with disjoint configurations
    - Exploit existing software heterogeneity
    - Even with software skew, only need 3 replicas
- Phoenix
  - Cooperative backup system using informed replication
  - Junqueira et al., Surviving Internet Catastrophes, USENIX 2005
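The idea of replicating on hosts with disjoint configurations can be sketched as a greedy selection over sets of software attributes. This is a simplified illustration, not Phoenix's actual algorithm; the host names, attribute labels, and the `pick_replicas` helper are all hypothetical.

```python
def pick_replicas(owner: str, hosts: dict[str, set[str]], k: int) -> list[str]:
    """Greedily choose up to k hosts sharing no software attribute with the
    owner or with each other, so one exploit cannot take out every copy."""
    used = set(hosts[owner])  # attributes already "exposed" by chosen hosts
    chosen: list[str] = []
    for name, config in hosts.items():
        if len(chosen) == k:
            break
        if name != owner and used.isdisjoint(config):
            chosen.append(name)
            used |= config
    return chosen


# Hypothetical inventory: each host is described by its OS and web server.
hosts = {
    "a": {"windows", "iis"},
    "b": {"windows", "apache"},
    "c": {"linux", "apache"},
    "d": {"bsd", "other-httpd"},
}
print(pick_replicas("a", hosts, 2))  # hosts that share nothing with "a" or each other
```

In this toy inventory, host "b" is rejected because it shares an operating system with the owner, which is exactly the dependent-failure case the slide describes.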
21. Scalable Detection and Monitoring
- Detection and monitoring are fundamental for understanding of and defense against worms
- Lessons from containment
  - Need to detect worms in less than a second
  - How can we do this?
- Know thy enemy
  - What does the worm/virus/bot do?
  - Who is controlling it?
22. Signature Inference
- Challenge: In less than a second
  - Detect worm probes
  - Characterize worm packets with a byte signature
- Approach
  - Monitor network, identify packets with common strings spreading like a worm
  - Use signature for content filtering
Packet Header
  SRC 11.12.13.14.3920  DST 132.239.13.24.5000  PROT TCP
Packet Payload (Content)
  0110  90 90 90 90 FF 63 64 90 90 90 90 90 90 90 90 90  .....cd.........
  0120  90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90  ................
  0130  90 90 90 90 90 90 90 90 EB 10 5A 4A 33 C9 66 B9  ..........ZJ3.f.
  0140  66 01 80 34 0A 99 E2 FA EB 05 E8 EB FF FF FF 70  f..4...........p
23. Content Sifting
- Assume unique, invariant string W for all worm probes
  - Works today, but not forever
- Consequences
  - Content prevalence: W more common in worm traffic
  - Address dispersion: traffic with W has many distinct src/dests
- Content Sifting
  - Identify W with high prevalence and high dispersion
  - Use W as filter signature in network
- Singh et al., Automated Worm Fingerprinting, OSDI 2004
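The two consequences above translate directly into two counters per substring: how many packets carry it, and how many distinct sources and destinations those packets involve. The sketch below is a deliberately naive version that keeps exact tables in memory; the window length and thresholds are illustrative assumptions, and it makes no attempt at the time and space efficiency a real deployment needs.

```python
from collections import defaultdict


def sift(packets, length=4, min_prevalence=3, min_dispersion=3):
    """Return substrings of `length` bytes that are both prevalent and dispersed.

    packets: iterable of (src, dst, payload_bytes) tuples.
    """
    count = defaultdict(int)  # substring -> number of packets containing it
    srcs = defaultdict(set)   # substring -> distinct sources seen
    dsts = defaultdict(set)   # substring -> distinct destinations seen
    for src, dst, payload in packets:
        # Count each substring at most once per packet.
        windows = {payload[i:i + length] for i in range(len(payload) - length + 1)}
        for w in windows:
            count[w] += 1
            srcs[w].add(src)
            dsts[w].add(dst)
    return sorted(
        w for w in count
        if count[w] >= min_prevalence
        and len(srcs[w]) >= min_dispersion
        and len(dsts[w]) >= min_dispersion
    )


W = b"\xeb\x10ZJ"  # invariant bytes shared by every (made-up) worm probe
packets = [
    ("1.1.1.1", "2.2.2.1", b"aa" + W),
    ("1.1.1.2", "2.2.2.2", b"bb" + W),
    ("1.1.1.3", "2.2.2.3", b"cc" + W),
    # Repeated benign traffic: prevalent, but between a single pair of hosts.
    ("10.0.0.1", "10.0.0.2", b"GET /index"),
    ("10.0.0.1", "10.0.0.2", b"GET /index"),
    ("10.0.0.1", "10.0.0.2", b"GET /index"),
]
print(sift(packets))
```

The benign request is just as prevalent as W in this example but fails the dispersion test, which is what keeps popular content from being flagged.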
24. Content Sifting in EarlyBird
- Challenges: Time and space
  - Must touch every byte in all packets (1 Gbps → 12 us/packet)
  - Simple algorithm consumes 100 MB/s of memory
- Approach: Careful algorithms and data structures
  - Incremental hash functions
  - Value-based sampling
  - Multi-stage filters and multi-resolution and counting bitmaps
  - Combined: 60 us/packet in software
- Works well in practice
  - Deployed at UCSD CSE for 8 months
  - Detected every worm outbreak reported on security lists
  - Identified unknown worms (Kibvu, Sasser)
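Two of the techniques listed above can be illustrated together. An incremental (rolling) hash updates the fingerprint of each sliding window in constant time instead of rehashing every substring, and value-based sampling keeps only fingerprints with a particular bit pattern, so the same content is sampled wherever it appears. The sketch below uses a simple polynomial rolling hash as a stand-in for whatever fingerprint function a production system would use; the base, modulus, and mask are illustrative assumptions. The per-packet time budget assumes 1500-byte packets, which the slide does not state.

```python
BASE = 257
MOD = (1 << 61) - 1  # large prime modulus (illustrative choice)


def packet_time_us(packet_bytes: int, link_bps: float) -> float:
    """Time one packet occupies the link, i.e. the per-packet processing budget."""
    return packet_bytes * 8 * 1e6 / link_bps


def direct_hash(chunk: bytes) -> int:
    """Hash a window from scratch (used here only to check the rolling version)."""
    h = 0
    for b in chunk:
        h = (h * BASE + b) % MOD
    return h


def rolling_hashes(data: bytes, window: int):
    """Yield (offset, hash) for every window-length substring in O(len(data))."""
    if len(data) < window:
        return
    top = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    h = direct_hash(data[:window])
    yield 0, h
    for i in range(window, len(data)):
        # Drop the oldest byte, shift, and append the newest byte.
        h = ((h - data[i - window] * top) * BASE + data[i]) % MOD
        yield i - window + 1, h


def sampled(data: bytes, window: int, mask: int = 0x3F):
    """Value-based sampling: keep windows whose hash has its low bits all zero."""
    return [(off, h) for off, h in rolling_hashes(data, window) if h & mask == 0]


print(f"{packet_time_us(1500, 1e9):.0f} us per 1500-byte packet at 1 Gbps")
```

Because the sampling decision depends only on the hash value, every copy of a worm's invariant substring is either always tracked or never tracked, regardless of its offset in the packet.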
25. Tech Transfer
- Content sifting technologies patented by UC and licensed to startup, Netsift Inc.
- Netsift significantly improved performance, features
  - Hardware implementation, new capabilities
- In June 2005, Netsift was acquired by Cisco
26. Going Further
- Network telescopes, content sifting have limitations
  - Passive observation, no interaction with malware
  - Lexical domain is limited
  - Evasion through polymorphism, protocol framing, encryption
- Want to answer deeper questions
  - What does a worm/virus/bot do?
  - What vulnerabilities are exploited, and how?
  - Who is controlling it, how is it controlled?
- Alternative: Endpoint monitoring
27. Active Responders
- Problem: Telescopes are passive, cannot respond to TCP handshake
  - Is a SYN from a host infected by CodeRed or Welchia? Dunno.
  - What does the worm payload look like? Dunno.
- Solution: Proxy responder
  - Stateless: TCP SYN/ACK (Internet Motion Sensor), per-protocol responders (iSink)
  - Stateful: Honeyd
  - Can differentiate and fingerprint payload
  - False positives generally low since no regular traffic
28. Honeypots
- Problem: Poor fidelity
  - Do not know what worm/virus would do; no code downloaded
  - What bot code would be downloaded? Where from? What control channels?
- Direct traffic to real infectable hosts (honeypots)
  - Individual hosts or VM-based: Collapsar, HoneyStat, Symantec
  - Can reduce false positives/negatives with host analysis (e.g., Vigilante, TaintCheck, Minos) and behavioral/procedural signatures
- Challenges
  - Scalability
  - Liability (grey legal areas)
  - Isolation (control for inter-malware competition)
  - Detection (VMware detection code in the wild)
29. Scalability/Fidelity Tradeoff
[Figure: spectrum from most scalable to highest fidelity: network telescopes (passive); telescopes + responders (iSink, honeyd, Internet Motion Sensor); VM-based honeynet (e.g., Collapsar); live honeypot]
30. Can We Achieve Both?
- Naïve approach: one machine per IP address
  - 1M addresses = 1M hosts = $2B investment
  - Overkill: most resources will be wasted
- In truth, only necessary to maintain the illusion of continuously live honeypot systems
- Maintain illusion on the cheap using
  - Network multiplexing
  - Host multiplexing
31. Network Multiplexing
- Most addresses are idle at any given time
  - Late bind honeypots to IP addresses
- Most traffic does not cause an infection
  - Recycle honeypots if they do not detect anything interesting
  - Only maintain honeypots of interest for extended periods
- Scalability is related to both the workload and the recycling rate
32. Net Multiplexing Efficiency
Only need one honeypot for every 100-1000 IP
addresses
33. Host-level multiplexing
- CPU utilization in each honeypot is quite low (<<1%)
  - Use VMM to multiplex honeypots on single machine
  - Done in practice, but limited by memory bottleneck
- Exploit memory coherence property
  - Few memory pages are actually modified on input
  - Share unmodified pages among VMs
  - Scalability is a function of unique memory per VM
34. Host multiplexing efficiency
Only need one physical machine for every
100-1000 honeypots
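Combining the two multiplexing factors shows why the approach is affordable: the address space is first divided by the number of addresses one honeypot can cover, then by the number of honeypots one machine can host. A back-of-the-envelope sketch using the 100-1000 ranges from the slides (the function name is illustrative):

```python
import math


def machines_needed(addresses: int, addrs_per_honeypot: int, honeypots_per_machine: int) -> int:
    """Physical machines required after network-level and host-level multiplexing."""
    honeypots = math.ceil(addresses / addrs_per_honeypot)
    return math.ceil(honeypots / honeypots_per_machine)


worst = machines_needed(1_000_000, 100, 100)    # both factors at the low end
best = machines_needed(1_000_000, 1000, 1000)   # both factors at the high end
print(f"1M addresses need between {best} and {worst} machines")
```

Under these figures, monitoring one million addresses takes on the order of 1 to 100 physical machines rather than one million.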
35. Potemkin: A High-Fidelity, Large-Scale Honeyfarm
- Gateway: Multiplexes traffic onto VM honeypots
- Potemkin VMM: Multiplexes VMs on servers
Vrable et al., Scalability, Fidelity, and Containment in the Potemkin Virtual Honeyfarm, SOSP 2005.
36. Potemkin Gateway
- Gateway terminates inbound GRE tunnels
- Maintains external IP address → type mapping
  - 132.239.4.8 should be a Windows XP box w/IIS version 5, etc.
- Mapping made concrete when packet arrives
  - Flow entry created and packet dispatched to type-compatible physical host
  - VMM on host creates new VM with target IP address
  - VM and flow mapping GC'd after system determines that an interaction is uninteresting (detectors)
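The late-binding behavior described above can be sketched as a small flow table: an address is mapped to a VM type by configuration, bound to a physical host only when the first packet arrives, and unbound again once the interaction is judged uninteresting. This is a toy model under stated assumptions, not the Potemkin gateway's implementation; the class, the type labels, and the server names are hypothetical, and GRE handling and VM creation are omitted.

```python
import ipaddress


class Gateway:
    """Toy late-binding dispatcher: IP -> VM type -> type-compatible host."""

    def __init__(self, type_map, hosts):
        # type_map: list of (CIDR string, vm_type), most specific entries first.
        self.type_map = [(ipaddress.ip_network(net), t) for net, t in type_map]
        self.hosts = hosts  # host name -> set of VM types it can instantiate
        self.flows = {}     # destination IP -> host currently serving it

    def vm_type_for(self, dst_ip):
        addr = ipaddress.ip_address(dst_ip)
        for net, vm_type in self.type_map:
            if addr in net:
                return vm_type
        return None

    def dispatch(self, dst_ip):
        """Bind the address on first contact; reuse the binding afterwards."""
        if dst_ip in self.flows:
            return self.flows[dst_ip]
        vm_type = self.vm_type_for(dst_ip)
        if vm_type is None:
            return None
        for host, types in self.hosts.items():
            if vm_type in types:
                self.flows[dst_ip] = host  # the host's VMM would now clone a VM
                return host
        return None

    def collect(self, dst_ip):
        """Drop the binding once detectors find the interaction uninteresting."""
        self.flows.pop(dst_ip, None)


gw = Gateway(
    [("132.239.4.8/32", "winxp-iis5"), ("132.239.0.0/16", "linux")],
    {"server1": {"linux"}, "server2": {"winxp-iis5"}},
)
print(gw.dispatch("132.239.4.8"), gw.dispatch("132.239.9.9"))
```

Only addresses that have actually received traffic occupy a flow entry, which is what lets a small pool of servers stand in for a large address range.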
37. Potemkin VMM
- Modified Xen using shadow translate mode
  - Integrated into VT for Windows support
- Clone manager instantiates frozen VM image and keeps it resident in physical memory
  - Flash cloning: memory instantiated via eager copy of PTE pages and lazy faulting of data pages (no software startup)
  - Delta virtualization: copy implemented as copy-on-write (no memory overhead for shared code/data)
  - Supports hundreds of simultaneous VMs per host
- Overhead: currently takes 200-500 ms to create new VM
  - Imperceptible to human user and under TCP handshake timeout
  - Wildly unoptimized (e.g., includes multiple Python invocations)
  - Pre-allocated VMs can be invoked in 5 ms
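Delta virtualization can be pictured as copy-on-write over a shared reference image: every clone reads pages from the frozen image until it writes one, at which point it gets a private copy of that page only. The sketch below models pages as byte strings in ordinary Python objects; it is a conceptual illustration with hypothetical class names and has nothing to do with how Xen actually manages page tables.

```python
class ReferenceImage:
    """Frozen VM memory image, shared read-only by all clones."""

    def __init__(self, pages):
        self.pages = list(pages)


class CloneVM:
    """A clone that stores only the pages it has modified."""

    def __init__(self, image: ReferenceImage):
        self.image = image
        self.private = {}  # page number -> this clone's private copy

    def read(self, page_no: int) -> bytes:
        # Private copy if the page was written, otherwise the shared original.
        return self.private.get(page_no, self.image.pages[page_no])

    def write(self, page_no: int, data: bytes) -> None:
        # The first write to a page "faults" and creates the private copy.
        self.private[page_no] = data

    def unique_pages(self) -> int:
        """Memory cost of this clone beyond the shared image, in pages."""
        return len(self.private)


image = ReferenceImage([b"kernel", b"libs", b"heap", b"stack"])
a, b = CloneVM(image), CloneVM(image)
a.write(2, b"heap modified by exploit")
print(a.read(2), b.read(2), a.unique_pages(), b.unique_pages())
```

The per-clone cost is the number of pages it has dirtied, which is why scalability depends on unique memory per VM rather than on total image size.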
38. Containment
- Key issue: 3rd party liability and contributory damages
  - Honeyfarm = worm accelerator
  - Worse: I knowingly allowed my hosts to be infected (premeditated negligence and outside "best-practice" safe harbor)
- Export policy: tradeoffs between risk and fidelity
  - Block all outbound packets: no TCP connections
  - Only allow outbound packets to hosts that previously sent a packet: no outbound DNS, no botnet updates
  - Allow outbound, but "scrub": is this a best practice?
- In the end, need fairly flexible policy capabilities
  - Could do whole talk on interaction between technical and legal drivers
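The first two export policies above are simple enough to sketch as packet filters. The block-all policy never lets a honeypot transmit, while the reply-only policy remembers which remote hosts have contacted each honeypot and permits outbound packets only to those hosts. The class and method names are hypothetical and the addresses are documentation examples; a real gateway would also track ports, protocols, and timeouts.

```python
from collections import defaultdict


class BlockAllPolicy:
    """Safest and lowest fidelity: nothing leaves, so no TCP handshake completes."""

    def on_inbound(self, remote_ip: str, honeypot_ip: str) -> None:
        pass

    def allow_outbound(self, honeypot_ip: str, remote_ip: str) -> bool:
        return False


class ReplyOnlyPolicy:
    """Allow outbound packets only to hosts that previously sent us a packet."""

    def __init__(self):
        self._peers = defaultdict(set)  # honeypot IP -> remote IPs that contacted it

    def on_inbound(self, remote_ip: str, honeypot_ip: str) -> None:
        self._peers[honeypot_ip].add(remote_ip)

    def allow_outbound(self, honeypot_ip: str, remote_ip: str) -> bool:
        return remote_ip in self._peers.get(honeypot_ip, ())


policy = ReplyOnlyPolicy()
policy.on_inbound("198.51.100.7", "132.239.4.8")          # attacker probes honeypot
print(policy.allow_outbound("132.239.4.8", "198.51.100.7"))  # reply to attacker
print(policy.allow_outbound("132.239.4.8", "192.0.2.53"))    # e.g. a DNS lookup
```

The second lookup being refused is the fidelity cost noted on the slide: malware that needs DNS or a separate update server will not run to completion under this policy.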
39. Summary
- Internet hosts are highly vulnerable to worm outbreaks
  - Millions of hosts can be taken before anyone realizes
  - Supports vibrant ecosystem of criminal activity
- Containment (Quarantine) requires automated response
- Prevention is a critical element, but outbreaks inevitable
  - Need scalable detection, can also plan to survive (Phoenix)
- Different detection strategies, monitoring approaches
  - High-speed network-based content sifting (EarlyBird)
  - Large-scale high-fidelity honeyfarm (Potemkin)
- Smart bad guys still have a huge advantage
- Escalation: Rapid innovation in both problems and solutions
40. Many Ongoing CCIED Projects
- Worm forensics with honeyfarm
- Network dynamics
- Network and host support
- Automated botnet identification and analysis
- Self-moderating outbreaks (get 80% and stop)
- Automated outbreak dynamics estimation
  - How many hosts are susceptible, when will outbreak peak, etc.?
- Automated vulnerability signature generation
  - At state-machine level (w/MSR)
- Automated exploit behavioral categorization
- Prevalence of polymorphism in exploits
- Honeypot camouflage
- Panel testing
  - Normalize traffic and replay against range of reference images
41. For More Info
43. Why Internet Epidemics?
- Why is this a big issue now?
  - Didn't we have worms and viruses back in the 80s?
  - Isn't [spyware / phishing / botnets / something new] the real problem now?
- Isn't all this Internet threat stuff overhyped?
  - My computer continues to work after all
44. What Service-Oriented Computing Really Means
45. CCIED (Seaside)
- Joint NSF CyberTrust Center Project
  - Collaboration between UCSD and ICSI in Berkeley
  - 5-year effort: 10/2004-10/2009
  - Industrial support: Microsoft, Intel, HP, VMware, AT&T
- Goal: Develop understanding and technology to address the threats of large-scale host compromise
  - DDoS, worms, viruses, botnets, ...
- Folks
  - Stefan Savage, George Varghese, Geoff Voelker (UCSD)
  - Vern Paxson, Nick Weaver (ICSI)
  - 25 students and staff
46. Major Research Efforts
- Internet Epidemiology: Understanding
  - What kinds of new attacks are going on?
  - How are they controlled? How do they behave? What are their side-effects? What are their limits?
- Automated Network Defenses: Reacting
  - Stop new attacks without humans in the loop
- Forensic, Legal and Economic issues: Deploying
  - What investigatory and evidentiary value can we provide with technology?
  - What is the legal framework for safely using the technologies we developed?
  - How do we create the proper incentives for defense deployment?
47. Internet Epidemiology
- Inferring Internet Denial-of-Service Activity
  - 4,000 DoS attacks/week, everyone a victim, intense, periodic
Moore et al., Inferring Internet Denial of Service Activity, USENIX Security, 2001
48. From Model to Defense
- Prevention: Reduce the number of susceptible hosts
  - Reduce S(t) while I(t) is still small (ideally reduce S(0))
  - Software quality, wrappers, artificial heterogeneity, patching, known exploit blocking, hygiene enforcement
- Treatment: Reduce the number of infected hosts
  - Reduce I(t) after the fact (clean up)
  - Counter worms, anti-worms, automatic patching
- Containment: Reduce the contact rate
  - Reduce β while I(t) is still small
  - Proactive: Slow worm down (e.g., tarpits)
  - Reactive: Detect and block
49. Automated Network Defense
- Internet Quarantine
  - Automated response, content filtering, global deployment
[Figure: Content Filtering, Top 100 ISPs. Infected (95th perc.) by reaction time (hours) and propagation rate (probes/sec)]
Moore, Shannon, Voelker, and Savage. Internet Quarantine: Requirements for Containing Self-Propagating Code. IEEE Infocom 2003
50. Worm Signature Inference
- Challenge: Need to automatically learn a content signature for each new worm (in less than a second!)
- Approach: Monitor network and look for strings common to traffic with worm-like behavior
  - Signatures can then be used for content filtering
51. Content Sifting
- Assume there exists some (relatively) unique invariant bitstring W across all instances of a particular worm
- Two consequences
  - Content Prevalence: W will be more common in traffic than other bitstrings of the same length
  - Address Dispersion: the set of packets containing W will address a disproportionate number of distinct sources and destinations
- Content sifting: find Ws with high content prevalence and high address dispersion and drop that traffic
Singh, Estan, Varghese, and Savage, Automated Worm Fingerprinting, OSDI 2004
52. EarlyBird
- Detected and automatically generated signatures for every known worm outbreak over eight months
- Low latency (us), high bandwidth (s/w at 200 Mbps)
- Transferred to startup NetSift, Inc. → Purchased by Cisco
53. Honeyfarms
- Honeypots serve as live sandboxes for exploits
  - Examine active exploit in controlled environment
  - Multiple honeypots form a honeyfarm
- Challenge: Want scalability and fidelity
  - Scalability: monitor 1 million IP addresses
  - Fidelity: new machine for each connection
- Observation
  - In truth, only necessary to maintain the illusion of continuously live honeypot systems
54. Approach
- Network-level multiplexing (10^2 to 10^3 scalability)
  - Most addresses are idle at any given time
    - Late bind honeypots to IP addresses
  - Most traffic does not cause an infection
    - Recycle honeypots if they can't detect anything interesting
    - Only maintain honeypots of interest for extended periods
- Host-level multiplexing (another 10^2 to 10^3)
  - CPU utilization in each honeypot is quite low (<<1%)
    - Use VMM to multiplex honeypots on single machine
    - Done in practice, but limited by memory bottleneck
  - Memory coherence property
    - Few memory pages are actually modified on input
    - Share unmodified pages between VMs: copy-on-write
55. Yes, we can do Windows...
56. Overall challenges for honeyfarms
- Depends on asynchronous input
  - What if they don't scan that range? (smart bias)
  - What if they propagate via e-mail, IM? (doable, but privacy issues)
- Inherent tradeoff between liability exposure and detectability
  - Honeypot detection software exists; perfect virtualization is tough (although we're working hard on it)
- Resource exhaustion (from outbreak or DoS)
- It doesn't necessarily reflect what's happening on your network (can't count on it for local protection)
- Hence, there is a need for both honeyfarm and in-situ approaches