Title: Cyber-TA: Massive and Distributed Data Correlation
Outline: Introduction; Approaches to Privacy-Preserving Correlation; A Cyber-TA Distributed Correlation Example: botHunter
Phillip Porras - porras_at_csl.sri.com
Computer Science Laboratory, SRI International
www.cyber-ta.org
28 September 2006
Section outline: Massive Data Correlation; Data Analysis Approaches; Stealth Threats; Massive PPDM; Shifts and Spikes; Highly Predictive Blacklists; Distributed Correlation Techniques
Massive Data Correlation Group
- Massive Data Correlation Group: examining strategies to collect and analyze local network events in search of large-scale attack phenomena, emerging malware threats, and stealth activity across large-scale networks
- Contributors: SRI, Yale, SANS Institute, NCSU, UC Davis, GA-Tech, and others
- Perspectives
- Massive/Passive Analysis Methods: examining large-scale data correlation strategies to apply to incoming security log data from the repository
- Data utility requirements for data privacy services
- Optimal data sources
- New (and current) correlation strategies must address data anonymization
- Distributed Analysis Methods: distribute attack detection logic to producers, collect result abstractions, and conduct group consensus analyses
Data Analysis Approaches
- Massive/Passive Analysis Methods
- low-rate (stealth) pattern/sequence detection in massive data stores
- massive privacy-preserving data mining strategies (Massive PPDM)
- fast entropy-shift detection in high-volume data streams
- Highly-Predictive Blacklist (HPB) production
- Distributed Analysis Methods
- producer-side behavior-based malware correlation (botHunter v0.9)
- summary statistics, consensus attack detection, and trend analyses
Isolating Stealthy Actions in Massive Data Volumes
- Objective (definition of stealth in this context): seek long-duration or short-sequence deterministic behavior patterns in massive data streams
- Current detection methods lack the computational and memory efficiency needed to process massive data stores
- Current coordinated-attack discovery methods (e.g., attack collaboration) have not been applied in repository-scale applications
- We seek data pruning techniques and optimal data attribute selections that will facilitate various deterministic behavior pattern analyses
- Low-speed scanning, common malware communication patterns, long-duration propagation analyses, and regularities in IDS log production patterns that indicate detection redundancies
- Employ massive-data analysis techniques from areas such as streaming algorithmics, very large databases, and distributed data mining
Example: Low-density pattern analyzer, port N-Grams
- Provides a basis for the following
- Automated discovery of emerging malware scan patterns
- Comparison of local systems to global N-Gram patterns
Data set: 300M connections over 56K unused IP addresses
FOUND: On days 1-3, 160-200 sources per day probed the following 10-port combination (all MS B.O. targets):
80, 135, 139, 445, 1025, 1433, 2745, 3127, 5000, 6129
Sources per day: day 1: 195, day 2: 200, day 3: 160
Port legend:
  80    Web Server
  135   MS DCE Locator Service (DHCP, DNS, WINS)
  139   MS NetBios
  445   MS Win2K SMB
  1025  CAN-2003-0533 MS LSASRV.DLL B.O.
  1433  MS SQL-Server B.O.
  2745  MS Bagle Virus Backdoor
  3127  MS MyDoom Backdoor
  5000  BioNet, Bubble, Blazer, ICKiller Backdoors
  6129  MS Dameware Remote Admin
[Figure: tree of Dst_Port N-Grams annotated with common SRC_IP counts. Node labels: 1433; 135; 80; {80, 135, 139, 445, 1025, 2745, 3127, 5000, 6129}; {80, 139, 445, 1025, 1433, 2745, 3127, 5000, 6129}; {139, 445, 1025, 1433, 2745, 3127, 5000, 6129}. Count labels: 2 1; 4 12; 4 22.]
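The port N-Gram idea above can be sketched in a few lines. This is a minimal illustration, not the analyzer used in the study: it assumes connection records are available as (src_ip, dst_port) pairs, and the function names `port_ngrams` and `common_ngrams` are hypothetical. It groups the distinct ports each source probed into one combination, then counts how many sources share each combination.

```python
from collections import Counter, defaultdict

def port_ngrams(connections):
    """Map each source IP to the sorted tuple of distinct ports it probed."""
    ports_by_src = defaultdict(set)
    for src_ip, dst_port in connections:
        ports_by_src[src_ip].add(dst_port)
    return {src: tuple(sorted(ports)) for src, ports in ports_by_src.items()}

def common_ngrams(connections, min_sources=2):
    """Count sources per port combination; keep combinations shared by
    at least min_sources sources (min_sources is an illustrative default)."""
    counts = Counter(port_ngrams(connections).values())
    return {ngram: n for ngram, n in counts.items() if n >= min_sources}

# Toy data: two sources probe the same three ports, one probes only port 80.
connections = [
    ("10.0.0.1", 135), ("10.0.0.1", 445), ("10.0.0.1", 1433),
    ("10.0.0.2", 445), ("10.0.0.2", 135), ("10.0.0.2", 1433),
    ("10.0.0.3", 80),
]
shared = common_ngrams(connections)
```

A local site could run the same grouping on its own logs and compare the resulting combinations against the globally common ones.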
Massive PPDM Strategies
- Current PPDM Methods
- Peer-based shared encryption schemes (e.g., homomorphic encryption)
- Example Capabilities
- Privacy-Preserving Set Intersection: all parties want the intersection of their private datasets revealed, without gaining or revealing non-intersecting data
- Privacy-Preserving Set Matching: each member Pi wants to know which values in its set intersect with values in the other members' sets, without gaining or revealing non-matchers
- Solutions are traced to the 2-party case of private equality testing, among other techniques
- Massive PPDM
- PPDM in non-peer-based environments (e.g., large-scale sensor grids)
- PPDM computational scalability and lightweight key coordination schemes
- Usage Concept: N coalition partners wish to compare netflow/intrusion/FW logs to find common attack sources, but have insufficient trust to openly share unrelated connection histories
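To make the usage concept concrete, here is a toy two-party set intersection. It uses Diffie-Hellman-style commutative blinding, a technique swapped in for illustration because it fits in a few lines; it is not the homomorphic scheme mentioned above and not a Cyber-TA protocol. The modulus, the helper names, and the single-process simulation of both parties are all assumptions, and the parameters are far too weak for real use.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime; toy modulus only, not a secure choice

def make_key():
    """Pick a secret exponent coprime to P-1 so x -> x^k mod P is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k

def hash_to_group(item):
    """Hash an item (e.g., an attacker IP string) to a nonzero residue mod P."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

def blind(items, key):
    """First blinding pass: H(item)^key mod P."""
    return [pow(hash_to_group(x), key, P) for x in items]

def reblind(values, key):
    """Second blinding pass, applied to values the other party already blinded."""
    return [pow(v, key, P) for v in values]

def private_intersection(a_items, b_items):
    """Simulate both parties in one process; returns A's view of the result."""
    a_key, b_key = make_key(), make_key()
    a_blinded = blind(a_items, a_key)            # A sends these to B
    b_blinded = blind(b_items, b_key)            # B sends these to A
    a_double = reblind(a_blinded, b_key)         # B returns these in A's order
    b_double = set(reblind(b_blinded, a_key))    # A computes these locally
    # H(x)^(a*b) matches on both sides exactly when the items are equal.
    return {x for x, v in zip(a_items, a_double) if v in b_double}
```

For example, `private_intersection(["1.2.3.4", "5.6.7.8"], ["1.2.3.4", "8.8.8.8"])` yields `{"1.2.3.4"}`, while neither side sees the other's unmatched entries in the clear.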
Massive Data: Efficient Change/Shift Detection
Entropy: let's talk
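As a starting point for that discussion, the sketch below computes Shannon entropy over fixed-size windows of a stream (for example, destination ports) and flags windows whose entropy jumps relative to the previous window. The window size and threshold are arbitrary illustrative values, and this batch version is not the fast streaming method the project is after; it only shows what an entropy shift looks like.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def entropy_shifts(stream, window=4, threshold=1.0):
    """Return indices of windows whose entropy differs from the previous
    window's entropy by more than threshold bits."""
    entropies = [shannon_entropy(stream[i:i + window])
                 for i in range(0, len(stream) - window + 1, window)]
    return [i for i in range(1, len(entropies))
            if abs(entropies[i] - entropies[i - 1]) > threshold]

# Two quiet windows of web traffic, then a window spread over four ports.
stream = [80, 80, 80, 80, 80, 80, 80, 80, 21, 22, 23, 25]
shifted = entropy_shifts(stream)
```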
Highly-Predictive Blacklisting (HPB): Concept
Sensor Repository
- S. Katti, B. Krishnamurthy, D. Katabi, "Collaborating Against Common Enemies," ACM SIGCOMM '05 Internet Measurement Conference.
- Surveyed data from 1700 DShield sensors
- Introduced Highly Collaborative Groups
- Relatively small membership sizes
- Correlated attacks appear at corr_group networks within small time frames
- Group relations are long lasting
- Cross-group relations have small intersections
- Implications
- Blacklist sharing among groups may yield higher relevance rates and more manageable sizes
Contributor Pool Cluster Details
- Clustering Logic
- Each node corresponds to a /24 subnet.
- Different colors represent different prefixes.
- Two nodes are connected if more than 10% of the attacks that target one node also go to the other.
- The nodes within a cluster are highly connected, while there is little or no connection between nodes in different clusters.
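The clustering logic can be sketched as a graph construction followed by a connected-components pass. This is a minimal illustration under stated assumptions: each /24 subnet is represented by the set of attack identifiers it observed, the connection threshold is read as 10%, an edge is added if the overlap exceeds the threshold in either direction, and clusters are taken to be connected components. The function names are hypothetical.

```python
def overlap_fraction(a, b):
    """Fraction of the attacks in a that also appear in b."""
    return len(a & b) / len(a) if a else 0.0

def build_graph(attacks_by_subnet, threshold=0.10):
    """Connect two subnets when the overlap exceeds threshold in either direction."""
    nodes = list(attacks_by_subnet)
    graph = {n: set() for n in nodes}
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            a, b = attacks_by_subnet[u], attacks_by_subnet[v]
            if overlap_fraction(a, b) > threshold or overlap_fraction(b, a) > threshold:
                graph[u].add(v)
                graph[v].add(u)
    return graph

def clusters(graph):
    """Return the connected components of the graph as a list of sets."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        result.append(component)
    return result

# Toy data: the first two subnets share attackers b and c; the third is isolated.
attacks = {
    "192.0.2.0/24": {"a", "b", "c"},
    "198.51.100.0/24": {"b", "c", "d"},
    "203.0.113.0/24": {"x", "y"},
}
found = clusters(build_graph(attacks))
```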
HPB Example: Data Assessment
- Clusters are constructed using day one's alert reports
- On day one
- attackers observed by the repository: 976,997
- attackers observed by the cluster: 10,106
- On day two
- over 50% of the attackers seen by any node in the cluster can be predicted from day one's observations from the cluster
[Figure: diagram relating three sets: day two attack, day one repository observation, day one attack to the cluster]
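The assessment above amounts to a set-overlap measurement. A minimal sketch, assuming attackers are represented as sets of source identifiers (the function name and toy values are hypothetical, not the study's data):

```python
def prediction_rate(day_one_cluster_attackers, day_two_attackers):
    """Fraction of day-two attackers that the cluster already saw on day one."""
    if not day_two_attackers:
        return 0.0
    hits = day_two_attackers & day_one_cluster_attackers
    return len(hits) / len(day_two_attackers)

# Toy data: two of the three day-two attackers were seen by the cluster on day one.
day_one = {"a", "b", "c", "d"}
day_two = {"b", "c", "e"}
rate = prediction_rate(day_one, day_two)
```

The point of the comparison is that the cluster's day-one list is roughly two orders of magnitude smaller than the repository-wide list, so a blacklist built from it is far more compact.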
bØtHunt3r: a behavior-based correlation framework for botnet detection
botHunter outline: What is botHunter?; A Real Case Study; Behavior-based Correlation; Architectural Overview; botHunter Sensors; Correlation Framework; Example botHunter Output; Cyber-TA Integration
What is botHunter?
botHunter is a passive bot detection system, consisting of
- Snort-based sensor suite specialized in malware-specific event detection
- malware-specific inbound scan detection using a TRW variant
- comprehensive remote-to-local exploit detection, emphasizing the most common methods
- PAYL-based session anomaly detection system, detecting payload exploits over key TCP protocols
- botnet-specific egg download banners and bot registration acknowledgements
- victim-to-C&C communication exchanges, particularly for IRC bot protocols
- inbound-to-outbound scan monitoring system
- Cyber-TA-based plugin correlator
- combines information from sensors to recognize bots that infect and coordinate with your internal network assets
- submits bot-detection profiles to the Cyber-TA repository infrastructure
Bot infection case study: Phatbot
- An example infection lifecycle of the Phatbot infection, captured in a controlled VMWare environment
- A = Attacker, V = Victim, C = C&C Server
- E1: A.* → V.{2745, 135, 1025, 445, 3127, 6129, 139, 5000} (Bagle, DCOM2, DCOM, NETBIOS, DOOM, DW, NETBIOS, UPNP; TCP connections without content transfers)
- E2: A.* → V.135 (Windows DCE RPC exploit in payload)
- E3: V.* → A.31373 (transfer a relatively large file via a random A port specified by the exploit)
- E4: V.* → C.6668 (connect to an IRC server)
- E5: V.* → V'.{2745, 135, 1025, 445, 3127, 6129, 139, 5000} (V begins its search for new infection targets and listens on 11759 for future egg downloads)
A Behavioral-based Approach
botHunter abstracts the infection lifecycle into 5 possible stages
- Search for duplex communication sequences that are indicative of the infection-coordination-infection lifecycle
- Under a weighted correlation scheme, external stimulus alone is not enough to declare a bot
- Stimulus does not require strict ordering, but does require temporal locality
[Figure: infection lifecycle diagram with transitions labeled A-2-V, V-2-A, V-2-C, and V-2-, and paths marked Type I and Type II]
Botnets Architecture Overview
System Requirements: Snort 2.6.0; OS: Linux, MacOS, Win, FreeBSD, Solaris; Java 1.4.2
[Figure: botHunter architecture. Component labels: Span Port to Ethernet Device; Snort 2.6.0; SCADE (spp_scade.ch; e1 Inbound Malware Scans, e5 Outbound Scans); SLADE (e2 Payload Anomalies); Signature Engine (e2 Exploits, e3 Egg Downloads, e4 C&C Traffic); botHunter Correlator (Java 1.4.2); CTA Anonymizer Plugin]
- bot Infection Profile
- Confidence Score
- Victim IP
- Attacker IP List (by confidence)
- Coordination Center IP (by confidence)
- Full Evidence Trail: Sigs, Scores, Ports
- Infection Time Range
botHunter Sensor Suite: SCADE
SCADE: ./snort-2.6.0/src/preprocessors/spp_scade.c
- Custom malware-specific weighted scan detection system for inbound and outbound sources
- Inbound (E1: Initial Scan Phase)
- suspicious port scan: weighted TRW score
- failed connection to vulnerable port: high weight
- failed connection to other port: medium weight
- successful connection to vulnerable port: low weight
- Outbound (E5: Victim Outbound Scan)
- S1: scan rate of V over time t
- S2: scan failed-connection rate of V over t
- S3: scan target entropy over t (a low revisit rate implies bot search)
- A majority voting scheme combines the model assessments
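The outbound side of SCADE can be illustrated with a small majority vote over the three models S1 to S3. This is a sketch, not the spp_scade.c implementation: the thresholds are made-up illustrative values, S2 is interpreted as the fraction of failed attempts, and the function names are hypothetical.

```python
import math
from collections import Counter

def target_entropy(targets):
    """Shannon entropy (bits) of the scan targets; high when few are revisited."""
    counts = Counter(targets)
    total = len(targets)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def outbound_scan_vote(attempts, window_seconds,
                       rate_limit=1.0, fail_limit=0.5, entropy_limit=3.0):
    """attempts: list of (dst_ip, succeeded) pairs for one host V in the window.
    All three limits are hypothetical values chosen for this example."""
    if not attempts:
        return False
    targets = [dst for dst, _ in attempts]
    failures = sum(1 for _, ok in attempts if not ok)
    s1 = len(attempts) / window_seconds > rate_limit   # S1: scan rate
    s2 = failures / len(attempts) > fail_limit         # S2: failed-connection rate
    s3 = target_entropy(targets) > entropy_limit       # S3: target entropy
    return (s1 + s2 + s3) >= 2                         # majority of three models

# A host that hits 16 distinct addresses in 10 seconds, all failing.
scanner = [("10.1.0.%d" % i, False) for i in range(16)]
# A host that talks to one server four times, all succeeding.
client = [("10.1.0.1", True)] * 4
```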
botHunter Sensor Suite: SLADE
SLADE: ./snort-2.6.0/src/preprocessors/spp_slade.c
- Suspicious payload detection: modified PAYL 3-gram byte distribution analyzer over a limited set of network services
- Implements a lossy data structure to capture the 3-gram hash space; default vector size 2048 (versus the full space for n = 3: 256^3 = 2^24 = 16M)
- Current SLADE port set: 21, 53, 80, 135, 1025, 445 TCP
- Auto-transition from train to detect mode enabled
- Current status: in development to enable per-port auto-threshold selection
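The lossy 3-gram structure can be sketched as a hashed frequency vector. This is an illustration only: the hash function (CRC32), the normalization, and the L1 distance used to compare a payload against a trained profile are assumptions made for the example, and PAYL itself uses a different distance measure. Only the vector size of 2048 comes from the slide.

```python
import zlib

VECTOR_SIZE = 2048  # default vector size stated on the slide

def ngram_vector(payload, n=3, size=VECTOR_SIZE):
    """Hash each n-gram of payload (bytes) into one of `size` buckets and
    return the normalized bucket frequencies. Lossy: n-grams can collide."""
    vec = [0.0] * size
    total = max(len(payload) - n + 1, 0)
    for i in range(total):
        vec[zlib.crc32(payload[i:i + n]) % size] += 1.0
    if total:
        vec = [v / total for v in vec]
    return vec

def distance(profile, sample):
    """L1 distance between two frequency vectors (0 = identical, 2 = disjoint)."""
    return sum(abs(p - s) for p, s in zip(profile, sample))

# "Train" on a normal-looking request, then score two unseen payloads.
profile = ngram_vector(b"GET /index.html HTTP/1.0\r\n" * 4)
similar = ngram_vector(b"GET /about.html HTTP/1.0\r\n")
nop_sled = ngram_vector(b"\x90" * 64)
```

A payload such as the NOP sled lands much farther from the profile than a request that shares most of its 3-grams with the training data, which is the signal a per-port threshold would act on.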
botHunter Sensor Suite: Signature Engine
- botHunter Signature Set: replaces all standard Snort rules with five custom rulesets, e1-5.rules
- Scope: known worm/bot exploit general traffic signatures, shell/code/script exploits, update/download/registered rules, C&C command exchanges, outbound scans, and malware exploits
- Rule sources
- Bleeding Edge malware rulesets
- Snort Community Rules
- Snort Registered Free Set
- Cyber-TA custom bot-specific rules
- Current set: 237 rules, operating on SRI/CSL and GA-Tech networks, with a relatively low false positive rate
botHunter - Correlation Framework
- Characteristics of Bot Declarations
- states are triggered in any order, but the pruning timer reinitializes row state once an InitTime Trigger is activated
- external stimulus alone cannot trigger a bot alert
- 2 x internal bot behavior triggers a bot alert
- When a bot alert is declared, IP addresses are assigned responsibility based on raw contribution
Bot-State Correlation Data Structure
VictimIP | E1 | E2 | E3 | E4 | E5 | Score
- Rows: valid internal Home_Net IPs
- Columns: bot infection stages
- Entry: IP addresses that contributed alerts to the E-column
- Score column: cumulative score per row
- Threshold: (row_score > threshold) → declare bot
- InitTime Triggers: an event that initiates the pruning timer
- Pruning Timer: seconds remaining until a row is reinitialized
Defaults
  E1  Inbound scan detected     weight .25
  E2  Inbound exploit detected  weight .25
  E3  Egg download detected     weight .50
  E4  C&C channel detected      weight .50
  E5  Outbound scan detected    weight .50
  Threshold                     1.0
  Pruning Interval              120 seconds
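The correlation table can be sketched as one row per victim IP, using the default weights, threshold, and pruning interval listed above. This is a minimal sketch, not the botHunter correlator: the class and method names are hypothetical, and it assumes the first event recorded in a row acts as the InitTime trigger. The slide writes the test as row_score greater than the threshold but also states that two internal behaviors suffice; with the default weights two internal events sum to exactly 1.0, so this sketch compares with `>=` so that both statements hold. That choice is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional

WEIGHTS = {"E1": 0.25, "E2": 0.25, "E3": 0.50, "E4": 0.50, "E5": 0.50}
THRESHOLD = 1.0
PRUNING_INTERVAL = 120.0  # seconds

@dataclass
class Row:
    """One row of the bot-state table: stage -> IPs that contributed alerts."""
    stages: dict = field(default_factory=dict)
    expires_at: Optional[float] = None

    def score(self):
        # Each stage counts once, however many alerts it received.
        return sum(WEIGHTS[stage] for stage in self.stages)

class Correlator:
    def __init__(self):
        self.rows = {}  # victim IP -> Row

    def add_event(self, victim_ip, stage, source_ip, now):
        """Record an alert; return True when the row's score declares a bot."""
        row = self.rows.get(victim_ip)
        if row is None or (row.expires_at is not None and now >= row.expires_at):
            row = self.rows[victim_ip] = Row()  # pruning timer expired: reinitialize
        if row.expires_at is None:
            # Assumption: the first event in a row starts the pruning timer.
            row.expires_at = now + PRUNING_INTERVAL
        row.stages.setdefault(stage, set()).add(source_ip)
        return row.score() >= THRESHOLD  # assumption: >= rather than strict >
```

Because the states may arrive in any order, the sketch only tracks which stages have fired within the pruning window, not their sequence.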
Implementation Status and Example Output
./Run_botHunter.csh c ./config/phatbot.config
Starting program...
Score: 1.5 (> 1.0)
Infect Target: 192.168.166.40
Infector List: 192.168.166.20
C&C List: 192.168.166.10 (25), 192.168.166.20 (3)
Start: 06/22/2006 16:42:23.33 PDT
Report End: 06/22/2006 16:44:38.54 PDT
INBOUND SCAN
  192.168.166.20 (16:42:23 PDT)
    E1 scade detected host 192.168.166.40 scanned by 192.168.166.20 at ports 2745 3127 6129
EXPLOIT
  192.168.166.20 (2) (16:42:24.67 PDT)
    E2 SHELLCODE x86 NOOP 135<-4819 (16:42:24.67 PDT)
    E2 SHELLCODE x86 0x90 unicode NOOP 135<-4819
EGG DOWNLOAD
C and C TRAFFIC
  192.168.166.10 (25) (16:42:41.34 PDT-16:43:31.20 PDT)
    E4 COMMUNITY BOT Internal IRC server detected
    E4 BLEEDING-EDGE TROJAN BOT - potential scan/exploit command 1037<-6668
    E4 COMMUNITY BOT GTBot scan command 1037<-6668
OUTBOUND SCAN
  192.168.166.20 (16:43:46.85 PDT)
    E5 scade detected suspicious scanner 192.168.166.40 scanning 30 IPs at ports 0 2745

Example: VMWare Phatbot Experiment
  Coordination Center: 192.168.166.10
  Initial Bot Infector: 192.168.166.20
  Victim System: 192.168.166.40
botHunter - born a Cyber-TA plugin
[Figure: botHunter as a Cyber-TA plugin. Component labels: Snort Alerts; botHunter Correlator (Java 1.4.2); CTA Anonymizer Plugin; Anonymization Service; MIXNET Deliver Daemon; CTA Anonymizer; Cyber-TA RDBMS Manager; Bot Profile Repository; Cyber-TA Threat Ops Center. Link labels, each shown on both endpoints: Delivery Ack, TLS Session, TOR Circuit, TCP/IP.]