Title: On the Session Structure Of Network Applications
1. On the Session Structure Of Network Applications
- Jayanthkumar Kannan, UC Berkeley
- Jaeyeon Jung, MIT
- Vern Paxson, ICSI, LBL
- Can Emre Koksal, EPFL
2. Motivation
- "Internet traffic is far weirder than any network researcher can ever imagine" - Paxson, '99, "Why understanding anything about the Internet is painfully hard"
- Many characteristics of traffic have heavy-tailed distributions
- Leads to operational difficulty
- Several hundred Internet applications in use (especially in enterprises and universities)
- Hard for administrators to identify the applications in use
- They may use this information for specifying policies
- Makes measurement and analysis harder
- Need complex statistical models to capture network behavior
3. Our work
- Goal
  - Characterize the connection-level behavior of applications
- Session
  - Set of connections initiated/received by an application in response to a user event
  - Example: an FTP session consists of an FTP control connection followed by multiple data connections
- Problem definition
  - Input: connection-level traces at a firewall
  - Infer the session structure of common applications with minimal human input
4. The Big Picture
- Two main pieces
  - Identifying application sessions
    - E.g., (ftp, ftp-data), (ftp, ftp-data, ftp-data, ftp-data)
  - Inferring typical session structure
    - E.g., FTP: (ftp) (ftp-data)*
- Application: identify anomalous activity
  - E.g., new apps, misconfigurations, malicious attacks
[Figure: pipeline diagram with the stages Connections, Identifying Sessions, Sessions, Inferring Session Structure, Session Description for Common Apps, Host-Specific Training, Identify Anomalous Activity]
5. Outline
- Session Identification
- Inferring Session Structure
- Preliminary Results
- Application: Identifying Anomalous Activity
- Conclusion
6. Session Identification
- Purpose
  - Given a stream of connections, parse it into sessions
- Observations
  - The connections in a session are causally related
  - Such connections tend to occur close to each other
- Devised a statistical test
  - Identifies pairs of causally linked connections
  - Collate causally linked connections into sessions
  - Builds a base model of what is normal, and flags deviations
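The slides do not say how linked pairs are collated into sessions. One plausible reading, sketched here as an assumption, is to treat each causal link as an edge and take connected components. The function name `collate_sessions` and the use of integer connection indices are illustrative choices, not from the talk.

```python
from collections import defaultdict


def collate_sessions(num_connections, linked_pairs):
    """Group connection indices into sessions.

    Assumption: a session is a connected component of the graph whose
    edges are the causally linked pairs reported by the statistical test.
    """
    parent = list(range(num_connections))

    def find(i):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in linked_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller index as root so output is deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups = defaultdict(list)
    for i in range(num_connections):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Connections 0, 1 and 3 are linked; 2 and 4 stand alone.
sessions = collate_sessions(5, [(0, 1), (1, 3)])
```

Unlinked connections come out as single-connection sessions, which matches the idea that every connection belongs to some session.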
7. Session Identification: Base Model
- Maintain a model for the arrival times of connections
  - Categorize connections by type and maintain rate estimates for the arrival of each type
  - E.g., rate of (<ftp): 10 per hr
- Empirically known fact
  - This arrival model is roughly stationary Poisson over the duration of an hour
  - Arrival process of unrelated types of connections is a union of independent Poisson processes
- Avoids the usual stumbling block in modeling studies
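A minimal sketch of the per-type rate estimate, assuming a trace of (timestamp in seconds, connection type) pairs and a simple count over a one-hour window. The function name `estimate_rates`, the record layout, and the type labels are hypothetical; the talk does not specify the estimator.

```python
from collections import Counter


def estimate_rates(connections, start, window_hours=1.0):
    """Return arrivals per hour for each connection type.

    connections: iterable of (timestamp_seconds, conn_type).
    Only arrivals in [start, start + window) are counted, on the
    assumption that rates are roughly stationary within the window.
    """
    end = start + window_hours * 3600.0
    counts = Counter(
        conn_type for ts, conn_type in connections if start <= ts < end
    )
    return {conn_type: n / window_hours for conn_type, n in counts.items()}


# Ten inbound FTP connections and two outbound HTTP connections in the
# first hour; the HTTP arrival at t=4000s falls outside the window.
trace = [(i * 360.0, "<ftp") for i in range(10)]
trace += [(100.0, ">http"), (2000.0, ">http"), (4000.0, ">http")]
rates = estimate_rates(trace, start=0.0)
```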
8. Session Identification: Deviations
- Consider connections C1, C2 of two types T1, T2, with arrival rates R1, R2
- Define P(R1, R2, x) as the probability that two independent processes of rates R1, R2 have an arrival within time x of each other
- Test: if P(R1, R2, x) < T, declare C1, C2 to be in the same session
- For Poisson arrivals, P(R1, R2, x) ≈ R1 R2 x
- Tunable parameters: threshold T, timeout values
- False positives per unit time: at most T
- False negatives: experimentally evaluated
- Note that we will see several instances of the same application
[Figure: connection C1 of type (<ftp), arrival rate R1, and connection C2 of type (>http), arrival rate R2]
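The test can be sketched as follows, using the product R1 R2 x as the score. One way to read that product: a Poisson process of rate R2 has at least one arrival in a window of length x with probability 1 - exp(-R2 x), which is close to R2 x when R2 x is small, and type-1 arrivals occur at rate R1. The units (rates per second, gap in seconds) and the function names are assumptions; the slide does not state them.

```python
def coincidence_score(rate1, rate2, gap):
    """Score R1 * R2 * x for two connection types seen `gap` apart.

    Assumed units: rates in arrivals per second, gap in seconds.
    A small score means a chance co-occurrence is unlikely.
    """
    return rate1 * rate2 * gap


def same_session(rate1, rate2, gap, threshold=0.01):
    """Declare two connections causally linked if a chance
    co-occurrence this close would be rarer than the threshold T."""
    return coincidence_score(rate1, rate2, gap) < threshold


# Two rare types (10/hr and 20/hr) seen 2 seconds apart: linked.
rare_pair = same_session(10 / 3600, 20 / 3600, 2.0)

# Two busy types (one arrival every 2 seconds each) seen 2 seconds
# apart: easily a coincidence, so not linked.
busy_pair = same_session(0.5, 0.5, 2.0)
```

The same gap gives opposite answers for rare and busy types, which is the point of conditioning the test on the per-type rates rather than using a fixed time window.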
10. Inferring Session Structure: Representation
- What language do you use to represent an application session?
- Our decision: regular expressions
  - Divined based on experience with applications
  - E.g., FTP: (<ftp) ((>ftp-data) | (<ftp-data))*
- Problem
  - Given various types of observed app sessions, output a regular expression that captures the typical behavior of that app
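To make the representation concrete, here is a sketch that checks sessions against the FTP example. It assumes the example reads as one inbound control connection followed by any number of data connections in either direction, with `<` and `>` as the direction markers. Encoding a session as a space-joined token string is an illustrative choice, not the authors' format.

```python
import re

# Assumed reading of the FTP example: (<ftp) ((>ftp-data) | (<ftp-data))*
FTP_SESSION = re.compile(r"<ftp(?: (?:>ftp-data|<ftp-data))*")


def matches(session, pattern=FTP_SESSION):
    """Return True if the whole token sequence fits the pattern."""
    return pattern.fullmatch(" ".join(session)) is not None


control_only = matches(["<ftp"])
with_data = matches(["<ftp", ">ftp-data", "<ftp-data"])
data_first = matches([">ftp-data", "<ftp"])
```

`fullmatch` is used so that a session with extra, unexplained connections does not pass merely because a prefix of it looks like FTP.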
11. Inferring Session Structure: Framework
- Naive approach
  - Build a regexp that exactly matches the list of observed sessions
  - E.g., (<ftp, >ftp-data), (<ftp, <ftp-data, <ftp-data)
- Challenges
  - Such regular expressions are extremely complicated
  - Need to generalize beyond the trace, since not all types of sessions may be observed
  - Need to deal with false positives of session identification
- Our approach
  - Build an exact DFA E based on the trace
  - Design generalization rules and provide a set of generalized DFAs with associated false positive (FP) and false negative (FN) rates
  - Human makes the final decision based on simplicity of the DFA, FP, FN
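The exact DFA can be sketched as a prefix tree over the observed sessions: one path per session, with the final state of each path accepting. This construction is an assumption about what "exact DFA" means here; the names `build_exact_dfa` and `accepts` are hypothetical.

```python
def build_exact_dfa(sessions):
    """Build a trie-shaped DFA accepting exactly the observed sessions.

    Returns (transitions, accepting), where transitions[state] maps a
    connection token to the next state and state 0 is the start state.
    """
    transitions = [{}]
    accepting = set()
    for session in sessions:
        state = 0
        for token in session:
            if token not in transitions[state]:
                transitions.append({})
                transitions[state][token] = len(transitions) - 1
            state = transitions[state][token]
        accepting.add(state)
    return transitions, accepting


def accepts(dfa, session):
    """Run the DFA over a session; reject on any missing transition."""
    transitions, accepting = dfa
    state = 0
    for token in session:
        state = transitions[state].get(token)
        if state is None:
            return False
    return state in accepting


observed = [("<ftp", ">ftp-data"), ("<ftp", "<ftp-data", "<ftp-data")]
dfa = build_exact_dfa(observed)
```

Because the automaton is exact, a bare control connection `("<ftp",)` is rejected even though it is a plausible FTP session, which illustrates why generalization beyond the trace is needed.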
12. Inferring Session Structure: Generalization
- Prefix Rule
  - If session S is observed in the trace, assume all prefixes of S are legal app sessions
- Counting Rule
  - If (a b^m), (a b^n) occur in the trace, assume (a b*) are legal application sessions
- Pruning Rule
  - Given a DFA, retain the states and edges that are required to match k% of the trace (k = coverage)
- 2 other rules
  - Invert Direction, Dynamic Port
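A simplified sketch of the three named rules, operating on lists of observed sessions rather than on DFA states and edges as the slide describes. That simplification, the function names, and the restriction of the counting rule to trailing runs of one token are all assumptions made to keep the example short.

```python
from collections import Counter, defaultdict


def prefix_rule(sessions):
    """All non-empty prefixes of observed sessions are treated as legal."""
    return {tuple(s[:i]) for s in sessions for i in range(1, len(s) + 1)}


def trailing_run(session):
    """Split a non-empty session into (head, token, count), where the
    session ends with `count` repetitions of `token`."""
    token = session[-1]
    count = 0
    while count < len(session) and session[-1 - count] == token:
        count += 1
    return tuple(session[: len(session) - count]), token, count


def counting_rule(sessions):
    """Return (head, token) pairs seen with at least two different
    repeat counts; each stands for the generalization head token*."""
    counts = defaultdict(set)
    for s in sessions:
        head, token, n = trailing_run(s)
        counts[(head, token)].add(n)
    return {key for key, ns in counts.items() if len(ns) >= 2}


def prune_to_coverage(sessions, k=99.0):
    """Keep the most frequent distinct sessions until they account for
    at least k percent of the trace."""
    freq = Counter(tuple(s) for s in sessions)
    needed = len(sessions) * k / 100.0
    kept, covered = [], 0
    for shape, n in freq.most_common():
        if covered >= needed:
            break
        kept.append(shape)
        covered += n
    return kept


trace = [("a", "b")] * 97 + [("a", "b", "b", "b")] + [("a", "c")] + [("x",)]
stars = counting_rule(trace)
common = prune_to_coverage(trace, k=97.0)
```

Here `stars` records that `a` followed by runs of `b` was seen with two different lengths, and `common` keeps only the dominant session shape, dropping the rare ones that may be false positives of session identification.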
14. Results: Setup
- Test data
  - Connection-level traces from Feb 05
  - LBL (7879 hosts, three million connections a day)
  - ICSI (272 hosts, a hundred thousand connections a day)
- Parameter choice
  - Threshold for statistical test: T = 0.01
  - Timeout values
  - Coverage for pruning: k = 99%
- Identified over 28 applications at LBL
- Heuristics designed by verifying that output over the trace matches published protocol information
- Ongoing: validate heuristics over fresh traces
15. Results
17. Application: Identifying Anomalies
- Approach
  - Run the causality detection test to find causally related connections
  - Identify causal links that do not correspond to the list of inferred session structures
  - Use some training to avoid reporting unusual sessions seen frequently at a single host
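A rough sketch of this approach, working on whole sessions rather than individual causal links for brevity. The pattern table, the `min_count` training threshold, and all function names are hypothetical; the talk does not describe how host-specific training is done.

```python
import re
from collections import Counter

# Hypothetical session descriptions for two common applications.
KNOWN_PATTERNS = {
    "ftp": re.compile(r"<ftp(?: [<>]ftp-data)*"),
    "http": re.compile(r">http(?: >http)*"),
}


def is_known(session):
    """True if the session fits any inferred session structure."""
    text = " ".join(session)
    return any(p.fullmatch(text) for p in KNOWN_PATTERNS.values())


def train_host_whitelist(training, min_count=3):
    """Collect (host, session) pairs that are unusual but recur at
    least min_count times at the same host in the training data."""
    seen = Counter(
        (host, tuple(s)) for host, s in training if not is_known(s)
    )
    return {key for key, n in seen.items() if n >= min_count}


def alarms(observed, whitelist):
    """Report sessions that match no known structure and are not
    routine for the host that produced them."""
    return [
        (host, tuple(s))
        for host, s in observed
        if not is_known(s) and (host, tuple(s)) not in whitelist
    ]


odd = (">smtp", ">ident")
whitelist = train_host_whitelist([("h1", odd)] * 3)
observed = [("h1", odd), ("h2", odd), ("h2", ("<ftp", ">ftp-data"))]
reported = alarms(observed, whitelist)
```

The unusual session is suppressed at host `h1`, where it is routine, but still reported at `h2`, where it has not been seen before.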
18. Identifying Anomalies: Results
- Identified several activities of interest to admins
  - About 10 alarms a day
- Uncommon applications
  - Unauthorized peer-to-peer apps (Ares P2P application on non-standard ports)
  - Spam relays (compromised machines being used to send spam)
  - Unauthorized web proxies (some hosts being used as proxies to get Yahoo pages)
- Identified several port-scanners
- Weird connection patterns: 3 confirmed attacks, others possibly attacks
19. Conclusion
- There is value to understanding the connection-level behavior of applications
  - Of interest to network operators
  - Of interest to the measurement community
- Can characterize this behavior
  - From light-weight information available at a firewall
  - Minimal human intervention
20. Results (2)