On the Validation of Traffic Classification Algorithms

Provided by: gzaszabdni

Learn more at: http://pam2008.cs.wpi.edu

1
On the Validation of Traffic Classification
Algorithms

Géza Szabó, Dániel Orincsay, Szabolcs Malomsoky,
István Szabó
Traffic Lab, Ericsson Research Hungary

2
Aim and Contents

Aim
Introduce our novel validation method which makes
it possible to measure the accuracy of traffic
classification methods
Contents
Requirements: How should validation be done?
Related work: How is it currently done?
Our proposal: What have we proposed?
Working mechanism: How does our proposal work?
Validation of a state-of-the-art traffic
classification method: What have we learnt from
the validation?
Future work: What else can be done with the
proposed method?

3
Requirements: How should validation be done?

Objective of traffic classification
Identify applications in passively observed
traffic
Validation of the classification method by an
active test

4
Related work: How is it currently done?

CURRENTLY
Weak and ad hoc validation
No reliable and widely accepted validation
technique
No reference packet trace with well-defined
content is available

Dynamically allocated ports

Non-realistic environment

Proprietary protocols
Encryption
Must be kept up to date

S. Sen and J. Wang: Analyzing Peer-to-peer
Traffic Across Large Networks

Header traces => port-based method

Lot of flows
Simultaneous applications

Previously well-classified traces

J. Erman, M. Arlitt and A. Mahanti: Traffic
Classification Using Clustering Algorithms

Impossible to validate by others

Just hint

Impossible to repeat with same conditions

T. Karagiannis, K. Papagiannaki and M. Faloutsos:
BLINC: Multilevel Traffic Classification in the
Dark
L. Bernaille et al.: Traffic Classification On The
Fly
5
Our proposal
6
The proposed method for validation

Principle
Packets are collected into flows at the traffic
generating terminal
Flows are marked with the identifier of the
application that generated the packets of the
flow
The main requirements on the realization of the
method
It should not deteriorate the performance of the
terminal
The byte overhead of marking should be negligible
The preferred realization is a driver that can be
easily installed on terminals

The position of the proposed driver within the
terminal
7
Working mechanism

The packet is examined to determine whether it is
incoming or outgoing
In case of an outgoing packet, the size of the
packet is examined
The process continues only with packets that are
smaller than the MTU minus the size of the
marking
The process continues only with TCP or UDP
packets
Using the five-tuple identifier of the
packet, it is checked whether information is
already available about which application the
flow belongs to
Query the operating system
Need marking
Randomly
Only first
Leave the first
No mark

The working mechanism of the introduced driver
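
The filtering and lookup steps above can be sketched in a few lines. This is only an illustration under assumptions, not the driver's actual code: the names `Packet`, `application_for` and `query_os`, the default MTU of 1500 bytes, and the use of a plain dictionary as flow table are all hypothetical, and the marking policy choices (randomly, only first, leave the first, no mark) are not modelled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

MARK_SIZE = 4  # bytes added by the marking, as stated on the slides

# (source IP, source port, destination IP, destination port, protocol)
FiveTuple = Tuple[str, int, str, int, str]


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified view of a packet seen by the driver."""
    outgoing: bool
    size: int
    five_tuple: FiveTuple


def application_for(
    pkt: Packet,
    flow_table: Dict[FiveTuple, str],
    query_os: Callable[[FiveTuple], str],
    mtu: int = 1500,  # assumed default, not taken from the slides
) -> Optional[str]:
    """Return the application name to mark with, or None if the packet is skipped."""
    if not pkt.outgoing:
        return None  # only outgoing packets are candidates
    if pkt.size >= mtu - MARK_SIZE:
        return None  # no room left for the marking
    if pkt.five_tuple[4] not in ("TCP", "UDP"):
        return None  # only TCP and UDP packets are processed
    app = flow_table.get(pkt.five_tuple)
    if app is None:
        # Flow not seen before: ask the operating system (stubbed by the caller)
        app = query_os(pkt.five_tuple)
        flow_table[pkt.five_tuple] = app
    return app
```

Caching the result per five-tuple means the operating system is queried once per flow rather than once per packet, which matches the stated requirement that the driver should not deteriorate the performance of the terminal.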
8
Place of marking

Extending the original IP packet with one option
field
Router Alert option field
Transparent to the routers on the path and
also to the receiver host (according to RFC 2113
[3])
The first two characters of the corresponding
executable file name are added
The size of the packet increases by 4 bytes
The packet size field in the IP header is also
increased by 4 bytes
The header checksum is recalculated

A marked packet of the BitTorrent protocol
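
A minimal sketch of this marking on a raw IPv4 packet is given below. It is written under assumptions and is not the authors' driver: the function names are hypothetical, the two characters are assumed to go into the 2-byte value field of the Router Alert option (type 0x94, length 4), and the header length field (IHL) is assumed to be incremented as well, since the option makes the header one 32-bit word longer.

```python
import struct

ROUTER_ALERT_TYPE = 0x94  # Router Alert option type defined in RFC 2113
OPTION_LEN = 4            # type (1) + length (1) + value (2) bytes


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words; yields 0 over a valid header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def mark_packet(packet: bytes, exe_name: str) -> bytes:
    """Append a 4-byte option carrying the first two characters of exe_name."""
    ihl = (packet[0] & 0x0F) * 4
    if ihl + OPTION_LEN > 60:
        raise ValueError("no room left for another IP option")
    header, payload = bytearray(packet[:ihl]), packet[ihl:]
    tag = exe_name[:2].encode("ascii").ljust(2, b"\x00")
    header += bytes([ROUTER_ALERT_TYPE, OPTION_LEN]) + tag
    # Assumption: IHL grows by one 32-bit word along with the option
    header[0] = (header[0] & 0xF0) | (len(header) // 4)
    # The packet size (total length) field is increased by 4 bytes
    total_len = struct.unpack("!H", header[2:4])[0] + OPTION_LEN
    header[2:4] = struct.pack("!H", total_len)
    # The header checksum is recalculated over the extended header
    header[10:12] = b"\x00\x00"
    header[10:12] = struct.pack("!H", ipv4_checksum(bytes(header)))
    return bytes(header) + payload


def read_mark(packet: bytes) -> str:
    """Read the tag back, assuming the marking is the last option in the header."""
    ihl = (packet[0] & 0x0F) * 4
    return packet[ihl - 2:ihl].decode("ascii")
```

With a BitTorrent client whose executable is named, for example, `bittorrent.exe`, the tag read back at the measurement point would be `bi`.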
9
Proof-of-concept
10
Reference measurement

Available at http://pics.etl.hu/szabog/measurement.tar
In a separate access network
Our driver has been installed onto all computers
on this network
Duration of the measurement: 43 hours
Captured data volume: 6 Gbytes, containing 12
million packets
The measurement contains the traffic of the most
popular
P2P protocols
BitTorrent
eDonkey
Gnutella
DirectConnect
VoIP and chat applications
Skype
MSN Live
FTP sessions
Download manager
E-mail sending, receiving sessions
Web based e-mail (e.g., Gmail)
SSH sessions

The traffic mix of the measurement
11
Validation results (1): Success

Combined traffic classification method (described
in [1]) with the addition that the classification
of VoIP applications has been extended with ideas
from [2]
Accurately identified
E-mail
File transfer
Streaming
Secure channel
Gaming traffic
Success due to
Well-documented protocols
Open standards
Do not constantly change
Difficulties in case of
Encryption
But the session initiation phase is critical, as
this phase can be identified accurately
Success: SSH or SCP

The results of the classification [1] compared to
the reference measurement
[1] G. Szabo, I. Szabo and D. Orincsay: Accurate
Traffic Classification
[2] M. Perenyi and S. Molnar: Enhanced Skype Traffic Identification
12
Validation results (2): P2P

Difficulties
Many TCP flows containing 1-2 SYN packets,
probably to disconnected peers
No payload in these packets => the signature-based
methods cannot work
Dynamically allocated source ports towards not
well-known destination ports => the port-based
methods fail
Server search and P2P communication heuristic [1]
methods also fail => there are no other
successful flows to such IPs
Also some small non-P2P flows were misclassified
into the P2P class
The content of the port-application database is
not fully accurate
Creating too many port-application associations
easily raises the misclassification ratio
The constant change of P2P protocols
New features are added to P2P clients day by day
A working mechanism can be typical of a particular
client, not of the whole protocol

Traffic that is derived from other traffic
E.g., DNS traffic
MSN: HTTP protocol for transmitting chat messages
The MSN client transmits advertisements over HTTP,
but this cannot be recognized as deliberate web
browsing
Hit: the classification outcome and the
generating application type (the validation
outcome) agreed
E.g., the chat on the DirectConnect hubs, which
was classified as chat, could have been
considered actually correct, but in this
comparison it was considered a misclassification
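
The hit definition above can be turned into a per-class hit ratio as in the following sketch. It assumes that hits are counted per flow and that each flow is available as a pair of class labels (the class derived from the marking, and the class assigned by the classifier); the slides do not say whether flows, packets or bytes were counted, and the function name `hit_ratios` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def hit_ratios(flows: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Per-class share of flows where classifier and marking agree.

    Each item is (validation_class, classified_class) for one flow.
    """
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for validation_class, classified_class in flows:
        totals[validation_class] += 1
        if validation_class == classified_class:
            hits[validation_class] += 1
    return {cls: hits[cls] / totals[cls] for cls in totals}


# A DirectConnect hub chat flow marked as P2P but classified as chat
# counts as a miss under this strict comparison.
example = [("P2P", "P2P"), ("P2P", "chat"), ("VoIP", "VoIP")]
print(hit_ratios(example))  # {'P2P': 0.5, 'VoIP': 1.0}
```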

The results of the classification [1] compared to
the reference measurement
14
Validation results (4) VoIP: MSN, Skype

The high VoIP hit ratio is due to the successful
identification of
MSN Messenger
Skype
Skype is difficult to identify
Same problem as in the case of P2P
Proprietary protocol designed to ensure secure
communication
[2]: a characteristic feature is that the
application sends packets at an exact 20-second
interval even when there is no ongoing call
[1] contains a P2P identification heuristic
designed to track any message that has a
periodicity in packet sending
The extension of [1] was straightforward
The validation showed
The deficiency of the classification of Skype
Simple extension of the algorithm
The idea of [1] has been validated, as it proved to
be robust to extension with new application
recognition
The validation mechanism also proved to be useful
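
A periodicity check of the kind referred to above can be sketched as follows. This is not the heuristic of [1] or [2], whose details are not given on the slides; it is a simple stand-in that tests whether the gaps between consecutive packets of a flow stay close to 20 seconds. The function name, the tolerance and the minimum number of intervals are assumptions.

```python
from typing import Sequence


def looks_periodic(
    timestamps: Sequence[float],
    period: float = 20.0,     # interval mentioned on the slide, in seconds
    tolerance: float = 0.5,   # assumed allowed deviation, in seconds
    min_intervals: int = 3,   # assumed minimum evidence before deciding
) -> bool:
    """Return True if consecutive packet gaps all lie within tolerance of period."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < min_intervals:
        return False  # too few packets to call the flow periodic
    return all(abs(gap - period) <= tolerance for gap in gaps)


print(looks_periodic([0.0, 20.0, 40.1, 60.0]))  # True
print(looks_periodic([0.0, 5.0, 9.0, 30.0]))    # False
```

A real classifier would likely need to tolerate lost or interleaved packets; requiring every gap to match keeps the sketch short at the cost of robustness.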

We introduced a new active measurement method
which can help in the validation of traffic
classification methods.
The introduced method is a network driver
It marks the outgoing packets from the clients with
an application-specific marking
With the introduced method we created a
measurement and used this to validate the method
presented in [1]
The method proved to work accurately
Some deficiencies in the classification
P2P applications
Skype

Benefits
16
Further work

Use the marking method at the measurement side
for online traffic classification
Assumptions
All terminals accessing an operator's network
have the proposed driver installed
The driver is made tamper-proof to prevent users
from forging the marking
Online clustering of the traffic into QoS classes
based on the resource requirements of the
generating application
Can be used by operators to charge users on the
basis of the application used
Extension of the marking with other information
about the traffic generating application
E.g., version number
The operator could track the security risks of an
old application