Title: Revealing Skype Traffic: When Randomness Plays with You
1 Revealing Skype Traffic: When Randomness Plays with You
- D. Bonfiglio¹, M. Mellia¹, M. Meo¹, D. Rossi², P. Tofanelli³
- ¹Dipartimento di Elettronica, Politecnico di Torino; ²ENST Télécom Paris; ³Motorola Inc.
- ACM Sigcomm 2007
Presented by Te-Yuan Huang
2 Outline
- Goal
- Contribution
- Know More about Skype
- Classifiers
- Experiments
- Conclusions
4 Goal
- Identify Skype traffic within aggregated traffic
- Direct sessions
- Over either UDP or TCP
- The algorithm should
- Work in real time
- Be reliable
- Detect short flows (lasting only a few seconds)
6 Importance of Skype Traffic Identification
- Of interest to network operators for
- Network design and provisioning
- Traffic and Performance Monitoring
- Tariff Policies
- Traffic Differentiation
7 Difference from Related Work
- K.T. Chen et al., "Quantifying Skype User Satisfaction"
- Identifies only UDP traffic
- Needs the Skype login phase to be monitored
- Fails on backbone links
- Fails if the Skype login procedure is modified
- K. Suh et al., "Characterizing and Detecting Relayed Traffic: A Case Study Using Skype"
- Identifies only relayed Skype traffic
9 Let's Get Our Hands Dirty: Know More about Skype Traffic Sources
[Figure: a Skype message]
10 Skype Parameters
- Rate
- Codec rate
- ΔT
- Skype message framing time
- The time between two consecutive Skype messages
- RF (Redundancy Factor)
- The number of past blocks that Skype retransmits
11 Parameter Changes with Network Conditions
12 Skype Communication Modes
- End-to-End (E2E)
- A Skype user calls another Skype user
- End-to-Out (E2O)
- Skype-in/Skype-out
- PSTN involved
- Only voice data
- No video / file transfer / IM
13 Skype Codec
- Codecs
- Automatically selected
- ISAC
- The preferred codec for E2E
- G.729
- The preferred codec for E2O
14 More on Skype Message
- Skype encrypts its messages
- TCP
- Reliable transport
- Packets are received in the correct sequence (from the application layer's point of view)
- The whole content of the message is encrypted
- UDP
- Unreliable
- Packets may arrive out of order
- An application-layer header is needed to restore the correct order
- The header can only be obfuscated
- Only part of the message is encrypted
15 TCP E2E Message
[Figure: TCP E2E message layout by byte (1, 2, 3, ...): Frame]
16 UDP E2E Message
[Figure: UDP E2E message layout by byte: ID (bytes 1-2), Fun (byte 3), Frame (byte 4 onward)]
- Identified fields
- ID: 16-bit identifier
- Randomly selected
- Fun: 5-bit field, extracted with the mask 0x8f
- States the payload type
- 0x02, 0x03, 0x07, 0x0f: signaling messages
- 0x0d: data message (all four data types)
- Not random, only obfuscated (mixed)
- Frame: ciphered information
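The field layout above can be sketched as a small parser. This is a minimal illustration, not Skype's actual format handling: it assumes the ID sits in the first two bytes and the Fun field in the third byte, and the names `UdpE2EHeader` and `parse_udp_e2e`, as well as the big-endian byte order of the ID, are hypothetical choices made for the example.

```python
from dataclasses import dataclass

FUN_MASK = 0x8F                          # mask quoted on the slide
SIGNALING_FUN = {0x02, 0x03, 0x07, 0x0F}  # signaling message types
DATA_FUN = 0x0D                          # data message type


@dataclass
class UdpE2EHeader:
    msg_id: int  # 16-bit identifier taken from bytes 1-2
    fun: int     # third byte after masking with FUN_MASK
    kind: str    # "signaling", "data" or "unknown"


def parse_udp_e2e(payload: bytes) -> UdpE2EHeader:
    """Split a UDP payload into the ID and Fun fields (assumed layout)."""
    if len(payload) < 4:
        raise ValueError("payload too short for ID + Fun + Frame")
    # Byte order of the ID is an assumption; the ID is random anyway.
    msg_id = int.from_bytes(payload[0:2], "big")
    fun = payload[2] & FUN_MASK
    if fun in SIGNALING_FUN:
        kind = "signaling"
    elif fun == DATA_FUN:
        kind = "data"
    else:
        kind = "unknown"
    return UdpE2EHeader(msg_id, fun, kind)
```

Because the bits outside the mask are obfuscated, two messages of the same type can carry different third bytes while still yielding the same masked Fun value.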
17 E2O Message
[Figure: E2O message layout by byte: CID (bytes 1-4), then Frame]
- Identified field
- CID: 4 bytes
- Connection Identifier (CID) of the PSTN gateway
- Deterministic once the initial signaling is over
19 How to Identify Skype Traffic?
- Chi-Square Classifier (CSC)
- Uses knowledge of the ciphering mechanism
- Naïve Bayes Classifier (NBC)
- Uses the general characteristics of VoIP traffic
- Payload-Based Classifier (PBC)
- Looks into the non-ciphered start of message (SoM)
- Only used for UDP traffic
20 Chi-Square Classifier (CSC)
- Purpose
- To determine whether a portion of the message is encrypted
- Rationale
- Given a message,
- If only the third byte is not random
- Probably an E2E Skype flow over UDP
- If the first four bytes are deterministic and the others are ciphered
- Probably an E2O Skype flow over UDP
- If the whole message is ciphered
- Probably a Skype flow transported over TCP
21 Chi-Square Classifier (CSC) Cont.
- Chi-square distribution
- Observe an object's output nTOT times
- There are n possible outputs
- The i-th output is expected to occur $E_i$ times among nTOT and is observed to occur $O_i$ times
- Then $\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$ follows a chi-square distribution with n-1 degrees of freedom
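The statistic described on this slide is Pearson's chi-square statistic. A minimal sketch of it follows; the function name `chi_square_statistic` and the die-rolling numbers are illustrative assumptions, not taken from the talk.

```python
from typing import Sequence


def chi_square_statistic(observed: Sequence[float],
                         expected: Sequence[float]) -> float:
    """Sum over all outputs of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Illustrative data: a six-sided die rolled 60 times (n = 6, nTOT = 60).
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6          # a fair die gives E_i = 60 / 6
stat = chi_square_statistic(observed, expected)
# stat is then compared against a chi-square distribution
# with n - 1 = 5 degrees of freedom.
```

A small value of the statistic means the observed counts are close to what a uniform source would produce; a large value signals a deviation from randomness.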
22 Chi-Square Classifier (CSC) Cont.
- For each flow, take the first G groups of b bits
- For each group g, there are $2^b$ possible outputs
- If the content of the flow is random, then $E_i$ for each group is $n_{TOT} / 2^b$
[Figure: the start of a message divided into consecutive groups of b bits, numbered 1, 2, 3, ..., G]
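The grouping step can be sketched as follows. This is a rough illustration under stated assumptions: the groups are taken from the start of each message in the flow, bits are read most-significant first, and the defaults G = 16 and b = 4 are the values quoted later in the talk. The names `split_groups` and `group_chi_squares` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def split_groups(payload: bytes, G: int = 16, b: int = 4) -> List[int]:
    """Return the first G groups of b bits of a payload as integers."""
    needed = (G * b + 7) // 8            # bytes required to cover G*b bits
    if len(payload) < needed:
        raise ValueError("payload shorter than G*b bits")
    bits = int.from_bytes(payload[:needed], "big")
    total_bits = needed * 8
    mask = (1 << b) - 1
    return [(bits >> (total_bits - (g + 1) * b)) & mask for g in range(G)]


def group_chi_squares(messages: Iterable[bytes],
                      G: int = 16, b: int = 4) -> List[float]:
    """Chi-square statistic of each group against a uniform source."""
    counters = [Counter() for _ in range(G)]
    n_tot = 0
    for payload in messages:
        for g, value in enumerate(split_groups(payload, G, b)):
            counters[g][value] += 1
        n_tot += 1
    if n_tot == 0:
        raise ValueError("no messages given")
    expected = n_tot / 2 ** b            # E_i when the content is random
    return [
        sum((counters[g][v] - expected) ** 2 / expected
            for v in range(2 ** b))
        for g in range(G)
    ]
```

Each entry of the returned list corresponds to one group position, so a classifier can look at which positions look random and which do not.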
23 Chi-Square Classifier (CSC) Cont.
- Evaluate the test statistic for each group g as $\chi^2_g = \sum_{i=1}^{2^b} \frac{(O^g_i - E_i)^2}{E_i}$
- Define the thresholds by
24 Chi-Square Classifier (CSC) Cont.
- G = 16 groups of b = 4 bits are used
- E2E over UDP
- Block g = 5 or 6 is mixed
- The others are random
- Classification criteria
-
25 Chi-Square Classifier (CSC) Cont.
- E2O over UDP
- E2E or E2O over TCP
- Not Skype
- Otherwise
26 Chi-Square Classifier (CSC) Cont.
- Deterministic block
- The test statistic grows linearly with nTOT
27 Chi-Square Classifier (CSC) Cont.
- Mixed block
- If one bit is fixed and the others are random
- The test statistic still increases linearly with nTOT
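The linear growth for deterministic and mixed blocks can be checked with a short derivation. It assumes an idealized case: a deterministic block always shows the same value, and a mixed block with one fixed bit spreads exactly evenly over the remaining $2^{b-1}$ values. Real counts fluctuate around these figures.

```latex
% E = n_{TOT} / 2^b is the expected count per value for a random block.
\begin{align*}
\text{Deterministic block:}\quad
\chi^2 &= \frac{(n_{TOT} - E)^2}{E} + (2^b - 1)\,\frac{(0 - E)^2}{E},
        \qquad E = \frac{n_{TOT}}{2^b} \\
       &= n_{TOT}\,(2^b - 1) \\[4pt]
\text{Mixed block (one fixed bit):}\quad
\chi^2 &= 2^{b-1}\,\frac{(2E - E)^2}{E} + 2^{b-1}\,\frac{(0 - E)^2}{E} \\
       &= 2^{b}\,E = n_{TOT}
\end{align*}
```

For example, with b = 4 and nTOT = 100 the deterministic block gives 1500 and the mixed block gives about 100, both far above what a random block would typically produce.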
28 Chi-Square Classifier (CSC) Cont.
29 Chi-Square Classifier (CSC) Cont.
- The chi-square test works only if the number of observations is large enough, that is
- $E_i = n_{TOT}/2^b > 5$
- Namely, with b = 4, $n_{TOT} > 80$
- Choose $n_{TOT} = 100$
- Also, set
30 Naïve Bayes Classifier
- Feature vector $x = [x_i]$
- $P(C|x)$: the probability that the object belongs to class C, given that the feature x is observed
- $P(x|C)$: the probability that the feature x is observed, given that the object belongs to class C
- Bayes' rule
- $P(C|x) = P(x|C)\,P(C) / P(x)$
31 Naïve Bayes Classifier Cont.
- Naïve: the features are assumed to be independent
- $P(x|C)$ is called the belief
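Bayes' rule combined with the independence assumption can be sketched in a few lines. The function name `naive_bayes_posterior`, the class labels and the numbers in the example are all hypothetical; only the rule itself comes from the slides.

```python
import math
from typing import Dict, Sequence


def naive_bayes_posterior(
    per_feature_likelihoods: Dict[str, Sequence[float]],
    priors: Dict[str, float],
) -> Dict[str, float]:
    """Return P(C|x) for each class C.

    per_feature_likelihoods[C][i] is P(x_i|C) for the observed x_i.
    """
    joint = {}
    for cls, likelihoods in per_feature_likelihoods.items():
        # Naive assumption: P(x|C) is the product of the P(x_i|C).
        belief = math.prod(likelihoods)
        joint[cls] = belief * priors[cls]
    evidence = sum(joint.values())       # P(x), the normalizing term
    if evidence == 0:
        raise ValueError("all classes have zero probability")
    return {cls: value / evidence for cls, value in joint.items()}


# Illustrative numbers only: two features, two classes, equal priors.
result = naive_bayes_posterior(
    {"skype": [0.5, 0.4], "other": [0.1, 0.2]},
    {"skype": 0.5, "other": 0.5},
)
```

With many features the product of likelihoods underflows quickly, so practical implementations usually sum logarithms instead of multiplying.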
32 NBC Feature Selection
- VoIP
- Small message size
- Less bursty than data traffic
- Features
- Message size
- Observe a window of w messages at a time: $x = [s_1, s_2, \ldots, s_w]$
- Average inter-packet gap (average-IPG)
33 NBC Feature Selection
- Belief
- How to determine
- $P(s_i|C)$
34 NBC Feature Characterization
- For each codec, the message size is determined by
- Rate
- Header length
- Redundancy factor (RF)
- Message framing time (ΔT)
- The message size can be modeled by a Gaussian distribution
35 NBC Feature Characterization
- Map each codec to a Gaussian distribution
- Model the average-IPG as a Gaussian distribution with
For constant bit rate codecs
For variable bit rate codecs
36 NBC: Derive Beliefs
37 NBC: Make Decision
- Let
- Define a threshold $B_{min}$
- If $B > B_{min}$
- Valid Skype flow
- Otherwise
- Not a Skype flow
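The decision step can be sketched as below. The exact definition of B is not preserved in these slides, so this example makes its own assumption: B is the best average Gaussian log-belief of a window of message sizes over all codec models. The codec names, their (mean, standard deviation) parameters and the threshold value are hypothetical, and the average-IPG feature is left out.

```python
import math
from typing import Dict, Sequence, Tuple


def gaussian_log_pdf(x: float, mu: float, sigma: float) -> float:
    """Log density of a Gaussian with mean mu and std deviation sigma."""
    return (-0.5 * math.log(2 * math.pi * sigma ** 2)
            - (x - mu) ** 2 / (2 * sigma ** 2))


def window_belief(sizes: Sequence[float],
                  codec_models: Dict[str, Tuple[float, float]]) -> float:
    """Assumed B: best average log-belief of the window over all codecs."""
    if not sizes:
        raise ValueError("empty window")
    best = -math.inf
    for mu, sigma in codec_models.values():
        avg = sum(gaussian_log_pdf(s, mu, sigma) for s in sizes) / len(sizes)
        best = max(best, avg)
    return best


def is_skype(sizes: Sequence[float],
             codec_models: Dict[str, Tuple[float, float]],
             b_min: float) -> bool:
    """Accept the flow when B exceeds the threshold B_min."""
    return window_belief(sizes, codec_models) > b_min


# Hypothetical codec models: name -> (mean size, std deviation) in bytes.
models = {"codec_a": (100.0, 10.0), "codec_b": (40.0, 5.0)}
voice_like = is_skype([98, 102, 100, 101], models, b_min=-5.0)
bulk_like = is_skype([1400, 1400, 1400, 1400], models, b_min=-5.0)
```

Working with log-beliefs keeps the values in a numerically comfortable range, which is why the threshold in this sketch is a negative number.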
38 Payload-Based Classifier (PBC)
- Used as a cross-check for the previous two classifiers
- Only useful for UDP traffic
- Two parts
- Per-flow identification
- Per-host identification
39 PBC - Per-flow Identification
- Uses the knowledge about the UDP E2E message
- Fun: 5-bit field, extracted with the mask 0x8f
- States the payload type
- 0x02, 0x03, 0x07, 0x0f: signaling messages
- 0x0d: data message (all four data types)
[Figure: UDP E2E message layout by byte: ID (bytes 1-2), Fun (byte 3), Frame (byte 4 onward)]
40 PBC - Per-flow Identification
- Terminology
- nTOT: the total number of packets in the flow
- nsig: the number of Skype signaling messages
- nE2E: the number of Skype E2E data/video/chat/voice messages
- nE2O: the number of Skype E2O voice messages
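The counters defined above can be tallied per flow roughly as follows. This sketch assumes the Fun field is the third payload byte masked with 0x8f; it leaves out nE2O, since recognizing E2O messages requires the gateway's CID, which is not modeled here. The function name `count_flow_messages` and the dictionary keys are hypothetical, and the thresholds applied to these counts are not reproduced.

```python
from collections import Counter
from typing import Iterable

FUN_MASK = 0x8F
SIGNALING_FUN = {0x02, 0x03, 0x07, 0x0F}
DATA_FUN = 0x0D


def count_flow_messages(payloads: Iterable[bytes]) -> Counter:
    """Tally nTOT, nsig and nE2E over the UDP payloads of one flow."""
    counts = Counter(n_tot=0, n_sig=0, n_e2e=0)
    for payload in payloads:
        counts["n_tot"] += 1
        if len(payload) < 3:
            continue                     # too short to carry a Fun field
        fun = payload[2] & FUN_MASK
        if fun in SIGNALING_FUN:
            counts["n_sig"] += 1
        elif fun == DATA_FUN:
            counts["n_e2e"] += 1
    return counts
```

Comparing n_sig and n_e2e with n_tot then indicates how much of the flow looks like Skype signaling or E2E data.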
41 PBC - Per-flow Identification
42 PBC - Per-host Identification
- Known: a Skype client always uses the same UDP port to send and receive traffic
- Before a conversation starts,
- Signaling messages are exchanged between the two clients
- This makes it possible to identify a Skype client running at a specific IP address and port
43 PBC - Per-host Identification
- Criteria to identify the Skype client IP/port
44 Experiment
- Two data sets
- Campus: 95-hour trace, taken on 2006/5/29
- No P2P traffic is allowed
- Most traffic consists of TCP data flows
- ISP: one-day trace, taken on 2006/5/15
- All traffic is allowed
- More heterogeneous
- Little Skype traffic is expected
45 Measurement Result
46 Measurement Result: UDP, Campus
47 Measurement Result: UDP, ISP
48 Measurement Result: TCP
49 Parameter Tuning: Bmin
50 Parameter Tuning: χ²(Thr)
51 Parameter Tuning: Bmin and χ²(Thr)
52 Parameter Tuning: Bmin and χ²(Thr)
53 Conclusion
- Reveal Skype traffic from aggregate streams of packets
- Two approaches
- Statistical properties of randomness
- Stochastic characteristics of voice traffic
- Negligible false positives
- Few false negatives are left out