Title: On the Validation of Traffic Classification Algorithms
1 On the Validation of Traffic Classification Algorithms
- Géza Szabó, Dániel Orincsay, Szabolcs Malomsoky, István Szabó
- Traffic Lab, Ericsson Research Hungary
2 Aim, Contents
- Aim
- Introduce our novel validation method, which makes it possible to measure the accuracy of traffic classification methods
- Contents
- Requirements: How should validation be done?
- Related work: How is it currently done?
- Our proposal: What have we proposed?
- Working mechanism: How does our proposal work?
- Validation of a state-of-the-art traffic classification method: What have we learnt from the validation?
- Future work: What else can be done with the proposed method?
3 Requirements: How should validation be done?
- Objective of traffic classification
- Identify applications in passively observed traffic
- Validation of the classification method by active test
4 Related work: How is it currently done?
- CURRENTLY
- Weak and ad hoc validation
- No reliable and widely accepted validation technique
- No reference packet trace with well-defined content is available
- Dynamically allocated ports
- Non-realistic environment
- Proprietary protocols
- Encryption
- Keeping up to date
S. Sen and J. Wang: Analyzing Peer-to-peer Traffic Across Large Networks
- Header traces -> port-based method
- Lots of flows
- Simultaneous applications
- Previously well-classified traces
J. Erman, M. Arlitt and A. Mahanti: Traffic Classification Using Clustering Algorithms
- Impossible for others to validate
- Impossible to repeat under the same conditions
T. Karagiannis, K. Papagiannaki and M. Faloutsos: BLINC: Multilevel Traffic Classification in the Dark
L. Bernaille et al.: Traffic Classification On The Fly
5 Our proposal
6 The proposed method for validation
- Principle
- Packets are collected into flows at the traffic-generating terminal
- Flows are marked with the identifier of the application that generated the packets of the flow
- The main requirements on the realization of the method
- It should not deteriorate the performance of the terminal
- The byte overhead of marking should be negligible
- The preferred realization is a driver that can be easily installed on terminals
The position of the proposed driver within the terminal
7 Working mechanism
- Each packet is examined to determine whether it is incoming or outgoing
- In the case of an outgoing packet, the size of the packet is examined
- Processing continues only with packets that are smaller than the MTU decreased by the size of the marking
- The process continues only with TCP or UDP packets
- Based on the five-tuple identifier of the packet, it is checked whether information is already available about which application the flow belongs to
- Query the operating system
- Need marking
- Randomly
- Only first
- Leave the first
- No mark
The working mechanism of the introduced driver
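The decision steps listed on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the driver itself: the names Packet, Marker and lookup_app are hypothetical, lookup_app stands in for the operating system query, the policy names are one reading of the flowchart labels (Randomly, Only first, Leave the first, No mark), and the 50% marking probability of the random policy is an arbitrary choice.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# (source IP, source port, destination IP, destination port, protocol)
FiveTuple = Tuple[str, int, str, int, str]

MARK_SIZE = 4  # bytes added to a marked packet (one 4-byte IP option)


@dataclass
class Packet:
    outgoing: bool
    size: int            # total IP packet size in bytes
    protocol: str        # "TCP", "UDP", "ICMP", ...
    five_tuple: FiveTuple


class Marker:
    """Hypothetical model of the per-packet marking decision."""

    def __init__(self, mtu: int,
                 lookup_app: Callable[[FiveTuple], Optional[str]],
                 policy: str = "only_first", seed: int = 0):
        self.mtu = mtu
        self.lookup_app = lookup_app            # stand-in for the OS query
        self.policy = policy
        self.rng = random.Random(seed)
        self.flows: Dict[FiveTuple, str] = {}   # flow -> application name
        self.seen: Dict[FiveTuple, int] = {}    # flow -> eligible packets so far

    def decide(self, pkt: Packet) -> Optional[str]:
        """Return the application name to mark with, or None for no mark."""
        if not pkt.outgoing:
            return None
        # As worded on the slide: smaller than the MTU minus the marking size.
        if pkt.size >= self.mtu - MARK_SIZE:
            return None
        if pkt.protocol not in ("TCP", "UDP"):
            return None
        app = self.flows.get(pkt.five_tuple)
        if app is None:
            app = self.lookup_app(pkt.five_tuple)
            if app is None:
                return None
            self.flows[pkt.five_tuple] = app
        index = self.seen.get(pkt.five_tuple, 0)
        self.seen[pkt.five_tuple] = index + 1
        if self.policy == "only_first":
            return app if index == 0 else None
        if self.policy == "leave_first":
            return app if index > 0 else None
        if self.policy == "random":
            return app if self.rng.random() < 0.5 else None
        return None  # "no_mark" or any unknown policy
```

The flow table is keyed by the five-tuple so that the operating system is queried at most once per flow; later packets of the same flow reuse the cached application name.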
8 Place of marking
- Extending the original IP packet with one option field
- Router Alert option field
- Transparent both for the routers on the path and for the receiver host (according to RFC 2113 [3])
- The first two characters of the corresponding executable file name are added
- The size of the packet increases by 4 bytes
- The packet size field in the IP header is also increased by 4 bytes
- Header checksum is recalculated
A marked packet of the BitTorrent protocol
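The header changes listed above can be illustrated with a short Python sketch operating on a raw IPv4 packet. It is a sketch under assumptions rather than the driver code: the function names are hypothetical, and placing the two file-name characters in the 2-byte value field of the Router Alert option (type 148, length 4 in RFC 2113) is an inference from the 4-byte overhead stated on the slide.

```python
import struct

ROUTER_ALERT_TYPE = 0x94  # Router Alert option type (148) from RFC 2113
OPTION_LEN = 4            # type (1) + length (1) + value (2) bytes


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the 16-bit words of an IPv4 header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def mark_packet(packet: bytes, exe_name: str) -> bytes:
    """Append a 4-byte option carrying the first two characters of exe_name."""
    ihl = packet[0] & 0x0F                 # header length in 32-bit words
    if ihl >= 15:
        raise ValueError("no room left for another IP option")
    header = bytearray(packet[:ihl * 4])
    payload = packet[ihl * 4:]
    # Assumption: the mark is stored in the option's 2-byte value field.
    mark = exe_name[:2].encode("ascii").ljust(2, b"\x00")
    header += bytes([ROUTER_ALERT_TYPE, OPTION_LEN]) + mark
    header[0] = (header[0] & 0xF0) | (ihl + 1)   # header grows by one word
    total_len = struct.unpack("!H", header[2:4])[0] + OPTION_LEN
    header[2:4] = struct.pack("!H", total_len)   # packet size field + 4
    header[10:12] = b"\x00\x00"                  # zero checksum before summing
    header[10:12] = struct.pack("!H", ipv4_checksum(bytes(header)))
    return bytes(header) + payload
```

A receiver can verify the result by running ipv4_checksum over the new header: with a correct checksum in place the function returns 0.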
9 Proof-of-concept
10 Reference measurement
- Available at http://pics.etl.hu/szabog/measurement.tar
- In a separated access network
- Our driver has been installed on all computers on this network
- Duration of the measurement: 43 hours
- Captured data volume: 6 Gbytes, containing 12 million packets
- The measurement contains the traffic of the most popular
- P2P protocols
- BitTorrent
- eDonkey
- Gnutella
- DirectConnect
- VoIP and chat applications
- Skype
- MSN Live
- FTP sessions
- Download manager
- E-mail sending, receiving sessions
- Web based e-mail (e.g., Gmail)
- SSH sessions
The traffic mix of the measurement
11 Validation results (1): Success
- Combined traffic classification method (described in [1]) with the addition that the classification of VoIP applications has been extended with ideas from [2]
- Accurately identified
- E-mail
- File transfer
- Streaming
- Secure channel
- Gaming traffic
- Success due to
- Well-documented protocols
- Open standards
- Do not constantly change
- Difficulties in case of?
- Encryption
- But the session initiation phase is critical, as this phase can be identified accurately
- Success: SSH or SCP
The results of the classification [1] compared to the reference measurement
[1] G. Szabo, I. Szabo and D. Orincsay: Accurate Traffic Classification
[2] M. Perenyi and S. Molnar: Enhanced Skype Traffic Identification
12 Validation results (2): P2P
- Difficulties
- Many TCP flows containing 1-2 SYN packets, probably to disconnected peers
- No payload in these packets -> the signature-based methods cannot work
- Dynamically allocated source ports towards not well-known destination ports -> the port-based methods fail
- Server search and P2P communication heuristic [1] methods also fail -> there are no other successful flows to such IPs
- Also some small non-P2P flows were misclassified into the P2P class
- The content of the port-application database is not fully accurate
- Creating too many port-application associations easily raises the misclassification ratio
- The constant change of P2P protocols
- New features are added to P2P clients day by day
- A working mechanism can be typical of a particular client, not of the whole protocol itself
The results of the classification [1] compared to the reference measurement
13 Validation results (3): Philosophy
- Traffic that is derived from other traffic
- E.g., DNS traffic
- MSN: HTTP protocol for transmitting chat messages
- MSN client transmits advertisements over HTTP, but this cannot be recognized as deliberate web browsing
- Hit: the classification outcome and the generating application type (the validation outcome) agreed
- E.g., the chat on the DirectConnect hubs, which was classified as chat, could have been considered correct, but in this comparison it was counted as a misclassification
The results of the classification [1] compared to the reference measurement
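The hit definition on this slide can be made concrete with a small Python sketch that computes a per-class hit ratio from pairs of reference class and classified class. The function name and the per-flow counting are assumptions for illustration; the slides do not say whether the reported ratios were computed per flow, per packet or per byte.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def hit_ratios(flows: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Per-class hit ratio from (reference_class, classified_class) pairs.

    A flow is a hit only when the classifier's outcome equals the class of
    the application that generated it (taken from the marking).
    """
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for reference, classified in flows:
        totals[reference] += 1
        if reference == classified:
            hits[reference] += 1
    return {cls: hits[cls] / totals[cls] for cls in totals}


# Hub chat generated by a DirectConnect client but classified as chat
# counts as a miss for the P2P class under this strict definition.
example = [("P2P", "P2P"), ("P2P", "chat"), ("VoIP", "VoIP")]
print(hit_ratios(example))
```

Strict equality keeps the comparison simple, at the cost of penalizing outcomes such as the DirectConnect chat case that could arguably be counted as correct.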
14 Validation results (4): VoIP (MSN, Skype)
- High VoIP hit ratio is due to the successful identification of
- MSN Messenger
- Skype
- Skype is difficult to identify
- Same problem as in the case of P2P
- Proprietary protocol designed to ensure secure communication
- Characteristic feature [2]: the application sends packets even when there is no ongoing call, at an exact 20-second interval
- [1] contains a P2P identification heuristic designed to track any message that has a periodicity in packet sending
- Extension of [1] was straightforward
- The validation showed
- The deficiency of the classification of Skype
- Simple extension of the algorithm
- The idea of [1] has been validated, as it proved robust to extension with new application recognition
- The validation mechanism also proved to be useful
The results of the classification [1] compared to the reference measurement
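The periodicity feature mentioned on this slide can be illustrated with a simple check on packet timestamps. This is not the heuristic of [1] or [2], whose details are not given here; the function name, the tolerance, the minimum number of intervals and the 80% threshold are all assumptions chosen for the example.

```python
from typing import Sequence


def looks_periodic(timestamps: Sequence[float], period: float = 20.0,
                   tolerance: float = 0.5, min_intervals: int = 3) -> bool:
    """Return True if most gaps between packets are close to `period` seconds."""
    if len(timestamps) < min_intervals + 1:
        return False
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    matching = sum(1 for gap in gaps if abs(gap - period) <= tolerance)
    # Require enough matching gaps and that they dominate the sequence.
    return matching >= min_intervals and matching / len(gaps) >= 0.8
```

For example, timestamps of 0, 20, 40.1, 60 and 80 seconds towards the same peer would pass the check, while a burst of packets followed by a long silence would not.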
15 Summary
- We introduced a new active measurement method that can help in the validation of traffic classification methods
- The introduced method is a network driver
- It marks the outgoing packets from the clients with an application-specific marking
- With the introduced method we created a measurement and used it to validate the method presented in [1]
- The method has been proved to work accurately
- Some deficiencies in the classification
- P2P applications
- Skype
Benefits
16 Further work
- Use the marking method at the measurement side for online traffic classification
- Assumptions
- The terminals accessing an operator's network all have the proposed driver installed
- The driver is made tamper-proof to prevent users from forging the marking
- Online clustering of the traffic into QoS classes based on the resource requirements of the generating application
- Used by operators to charge on the basis of the application used by the user
- Extension of the marking with other information about the traffic-generating application
- E.g., version number
- Operator could track the security risks of an old application
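Using the marking at the measurement side requires reading it back from captured packets. The following Python sketch walks the IPv4 options of a raw packet and returns the two-character mark if a 4-byte Router Alert option is present. The function name, the assumption that the mark sits in the option's value field, and the QOS_BY_MARK table with its class names are hypothetical and serve only to illustrate the idea.

```python
from typing import Optional

ROUTER_ALERT_TYPE = 0x94  # Router Alert option type (148) from RFC 2113

# Hypothetical mapping from marks to QoS classes, for illustration only.
QOS_BY_MARK = {"sk": "real-time", "bi": "background"}


def read_mark(packet: bytes) -> Optional[str]:
    """Return the 2-character mark from a raw IPv4 packet, or None."""
    ihl = packet[0] & 0x0F
    options = packet[20:ihl * 4]       # empty when the header has no options
    i = 0
    while i < len(options):
        opt_type = options[i]
        if opt_type == 0:              # End of Option List
            break
        if opt_type == 1:              # No Operation, a single byte
            i += 1
            continue
        if i + 1 >= len(options):
            break
        opt_len = options[i + 1]
        if opt_len < 2 or i + opt_len > len(options):
            break                      # malformed option, stop parsing
        if opt_type == ROUTER_ALERT_TYPE and opt_len == 4:
            return options[i + 2:i + 4].decode("ascii", errors="replace")
        i += opt_len
    return None
```

A measurement point could then look up the returned mark, for example with QOS_BY_MARK.get(mark, "best-effort"), to assign the flow to a class.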
17 Questions, discussion
- Thank you very much for your kind attention!
- Contact
- E-mail: geza.szabo_at_ericsson.com