On the Validation of Traffic Classification Algorithms

1
On the Validation of Traffic Classification
Algorithms
  • Géza Szabó, Dániel Orincsay, Szabolcs Malomsoky,
    István Szabó
  • Traffic Lab, Ericsson Research Hungary

2
Aim and Contents
  • Aim
  • Introduce our novel validation method, which makes
    it possible to measure the accuracy of traffic
    classification methods
  • Contents
  • Requirements: How should validation be done?
  • Related work: How is it currently done?
  • Our proposal: What have we proposed?
  • Working mechanism: How does our proposal work?
  • Validation of a state-of-the-art traffic
    classification method: What have we learnt from
    the validation?
  • Future work: What else can be done with the
    proposed method?

3
Requirements: How should validation be done?
  • Objective of traffic classification
  • Identify applications in passively observed
    traffic
  • Validation of classification method by active
    test

4
Related work: How is it currently done?
  • Currently
  • Weak and ad hoc validation
  • No reliable and widely accepted validation
    technique
  • No reference packet trace with well-defined
    content is available
  • Dynamically allocated ports
  • Non-realistic environment
  • Proprietary protocols
  • Encryption
  • Need to stay up to date

S. Sen and J. Wang: Analyzing Peer-to-peer
Traffic Across Large Networks
  • Header traces → port-based method
  • Many flows
  • Simultaneous applications
  • Previously well-classified traces

J. Erman, M. Arlitt and A. Mahanti: Traffic
Classification Using Clustering Algorithms
  • Impossible to validate by others
  • Just hint
  • Impossible to repeat with same conditions

T. Karagiannis, K. Papagiannaki and M. Faloutsos:
BLINC: Multilevel Traffic Classification in the
Dark

L. Bernaille et al.: Traffic Classification On The
Fly
5
Our proposal
6
The proposed method for validation
  • Principle
  • Packets are collected into flows at the traffic
    generating terminal
  • Flows are marked with the identifier of the
    application that generated the packets of the
    flow
  • The main requirements on the realization of the
    method
  • It should not deteriorate the performance of the
    terminal
  • The byte overhead of marking should be negligible
  • The preferred realization is a driver that can be
    easily installed on terminals

The position of the proposed driver within the
terminal
7
Working mechanism
  • Each packet is checked to determine whether it is
    incoming or outgoing
  • For an outgoing packet, the size of the
    packet is examined
  • Processing continues only for packets that are
    smaller than the MTU minus the size of the
    marking
  • Processing continues only for TCP or UDP
    packets
  • Based on the five-tuple identifier of the
    packet, the driver checks whether it already
    has information about which application the
    flow belongs to
  • If not, query the operating system
  • Decide whether the packet needs marking
  • Randomly
  • Only first
  • Leave the first
  • No mark
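
The decision steps above can be sketched roughly as follows. This is only an illustrative Python model, not the driver itself: the `Packet` record, the `MARK_SIZE` constant, the `flow_table` dictionary and the `query_os` callback are hypothetical names introduced here, and the marking policy (randomly, only first, and so on) is deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# (source IP, source port, destination IP, destination port, protocol)
FiveTuple = Tuple[str, int, str, int, str]

MARK_SIZE = 4  # bytes added by the marking (assumption: the 4-byte option)


@dataclass
class Packet:  # hypothetical record, for illustration only
    outgoing: bool
    size: int
    five_tuple: FiveTuple


def application_for(
    pkt: Packet,
    mtu: int,
    flow_table: Dict[FiveTuple, str],
    query_os: Callable[[FiveTuple], str],
) -> Optional[str]:
    """Return the application to mark with, or None to pass the packet on."""
    if not pkt.outgoing:
        return None  # incoming packets are not marked
    if pkt.size >= mtu - MARK_SIZE:
        return None  # no room for the marking below the MTU
    if pkt.five_tuple[4] not in ("TCP", "UDP"):
        return None  # only TCP and UDP packets are considered
    app = flow_table.get(pkt.five_tuple)
    if app is None:
        # Flow not seen before: ask the operating system and remember it.
        app = query_os(pkt.five_tuple)
        flow_table[pkt.five_tuple] = app
    return app
```

Caching the answer per five-tuple means the operating system is queried once per flow rather than once per packet, which fits the requirement that the terminal's performance should not deteriorate.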

The working mechanism of the introduced driver
8
Place of marking
  • The original IP packet is extended with one option
    field
  • Router Alert option field
  • Transparent to both the routers on the path and
    the receiver host (according to RFC 2113
    [3])
  • The first two characters of the corresponding
    executable file name are added
  • The size of the packet increases by 4 bytes
  • The packet size field in the IP header is also
    increased by 4 bytes
  • The header checksum is recalculated
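
As a rough illustration of the steps listed above, the following Python sketch appends a 4-byte Router Alert option to a raw IPv4 packet. It is a minimal model under stated assumptions, not the driver's code: the function names are hypothetical, the two characters are assumed to be carried in the option's two-byte value field, and the header length field (IHL) is increased by one 32-bit word because the header grows by 4 bytes.

```python
import struct

ROUTER_ALERT_TYPE = 0x94  # option type 148 as defined in RFC 2113
ROUTER_ALERT_LEN = 4      # type (1) + length (1) + value (2)


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the 16-bit words of the header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def mark_packet(packet: bytes, exe_name: str) -> bytes:
    """Return a copy of the packet carrying a two-character marking."""
    ihl = packet[0] & 0x0F
    if ihl >= 15:
        raise ValueError("no room left for another IP option")
    header = bytearray(packet[: ihl * 4])
    payload = packet[ihl * 4:]

    # First two characters of the executable file name (assumed ASCII).
    tag = exe_name[:2].encode("ascii").ljust(2, b"\x00")
    option = bytes([ROUTER_ALERT_TYPE, ROUTER_ALERT_LEN]) + tag

    header[0] = (header[0] & 0xF0) | (ihl + 1)          # header is one word longer
    total_len = struct.unpack("!H", header[2:4])[0] + 4  # packet is 4 bytes longer
    header[2:4] = struct.pack("!H", total_len)
    header += option

    header[10:12] = b"\x00\x00"                          # zero before recalculating
    header[10:12] = struct.pack("!H", ipv4_checksum(bytes(header)))
    return bytes(header) + payload
```

A receiver can check the result the usual way: the checksum computed over the whole marked header, including the stored checksum field, comes out as zero.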

A marked packet of the BitTorrent protocol
9
Proof-of-concept
10
Reference measurement
  • Available at http://pics.etl.hu/szabog/measurement.tar
  • In a separated access network
  • Our driver has been installed onto all computers
    on this network
  • Duration of the measurement: 43 hours
  • Captured data volume: 6 Gbytes, containing 12
    million packets
  • The measurement contains the traffic of the most
    popular
  • P2P protocols
  • BitTorrent
  • eDonkey
  • Gnutella
  • DirectConnect
  • VoIP and chat applications
  • Skype
  • MSN Live
  • FTP sessions
  • Download manager
  • E-mail sending, receiving sessions
  • Web based e-mail (e.g., Gmail)
  • SSH sessions

The traffic mix of the measurement
11
Validation results (1): Success
  • Combined traffic classification method (described
    in [1]), with the addition that the classification
    of VoIP applications has been extended with ideas
    from [2]
  • Accurately identified
  • E-mail
  • File transfer
  • Streaming
  • Secure channel
  • Gaming traffic
  • Success due to
  • Well-documented protocols
  • Open standards
  • Protocols that do not constantly change
  • Difficulties in case of?
  • Encryption
  • But the session initiation phase is critical, as this
    phase can be identified accurately
  • Success: SSH or SCP
The results of the classification [1] compared to
the reference measurement
[1] G. Szabo, I. Szabo and D. Orincsay: Accurate
Traffic Classification
[2] M. Perenyi and S. Molnar: Enhanced Skype
Traffic Identification
12
Validation results (2): P2P
  • Difficulties
  • Many TCP flows containing 1-2 SYN packets,
    probably to disconnected peers
  • No payload in these packets → the signature-based
    methods cannot work
  • Dynamically allocated source ports towards not
    well-known destination ports → the port-based
    methods fail
  • Server search and P2P communication heuristic [1]
    methods also fail → there are no other
    successful flows to such IPs
  • Some small non-P2P flows were also misclassified
    into the P2P class
  • The content of the port-application
    database is not fully correct
  • Creating too many port-application associations
    easily raises the
    misclassification ratio
  • The constant change of P2P protocols
  • New features are added to P2P clients day by day
  • A working mechanism can be typical of a selected
    client, not of the whole protocol itself
13
Validation results (3): Philosophy
  • Traffic that is derived from other traffic
  • E.g., DNS traffic
  • MSN: HTTP protocol for transmitting chat messages
  • The MSN client transmits advertisements over HTTP,
    but this cannot be regarded as deliberate web
    browsing
  • Hit: the classification outcome and the
    generating application type (the validation
    outcome) agreed
  • E.g., the chat on the DirectConnect hubs, which
    was classified as chat, could have been
    considered correct, but in this
    comparison it was counted as a misclassification
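
The notion of a hit used above can be written down as a small calculation. The following Python sketch assumes each flow is available as a pair of (class derived from the marking, class assigned by the classifier); the function name and the sample labels are hypothetical and serve only as an illustration, not as measurement results.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def hit_ratio_per_class(flows: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of flows per reference class where both outcomes agree."""
    totals: Counter = Counter()
    hits: Counter = Counter()
    for reference, classified in flows:
        totals[reference] += 1
        if reference == classified:
            hits[reference] += 1  # a hit: classifier and marking agree
    return {cls: hits[cls] / totals[cls] for cls in totals}


# Made-up example: DirectConnect hub chat marked as P2P but classified as chat
# counts as a miss under this strict comparison.
sample = [("P2P", "P2P"), ("P2P", "chat"), ("VoIP", "VoIP"), ("VoIP", "VoIP")]
ratios = hit_ratio_per_class(sample)
```

Under this strict definition, derived traffic such as DNS or hub chat lowers the hit ratio of the generating application's class even when the classifier's answer is defensible.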

14
Validation results (4): VoIP (MSN, Skype)
  • The high VoIP hit ratio is due to the successful
    identification of
  • MSN Messenger
  • Skype
  • Skype is difficult to identify
  • Same problem as in the case of P2P
  • Proprietary protocol designed to ensure secure
    communication
  • [2]: a characteristic feature is that the application sends
    packets even when there is no ongoing call, at
    an exact 20-second interval
  • [1] contains a P2P identification heuristic that was
    designed to track any message that has a
    periodicity in packet sending
  • Extension of [1] was straightforward
  • The validation showed
  • The deficiency of the classification of Skype
  • Simple extension of the algorithm
  • The idea of [1] has been validated, as it proved to be
    robust to extension with new application
    recognition
  • The validation mechanism also proved to be useful
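
To make the periodicity idea concrete, here is a minimal Python sketch that checks whether packets towards one peer recur at a roughly fixed interval. It is not the heuristic from the cited works, only an illustration of the principle: the function name, the tolerance and the minimum number of matching gaps are assumptions chosen for the example.

```python
from typing import Sequence


def has_periodic_sending(
    timestamps: Sequence[float],
    period: float = 20.0,    # seconds, the interval mentioned for Skype
    tolerance: float = 0.5,  # assumed allowed deviation in seconds
    min_hits: int = 3,       # assumed number of matching gaps required
) -> bool:
    """True if enough consecutive packet gaps are close to the period."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    hits = sum(1 for gap in gaps if abs(gap - period) <= tolerance)
    return hits >= min_hits
```

For example, packets seen at 0, 20, 40.1, 60 and 80 seconds would be flagged as periodic, while a short burst followed by one late packet would not.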

15
Summary
  • We introduced a new active measurement method
    that can help in the validation of traffic
    classification methods
  • The introduced method is a network driver
  • It marks the outgoing packets from the clients with
    an application-specific marking
  • With the introduced method we created a
    measurement and used it to validate the method
    presented in [1]
  • The method has been shown to work
    accurately
  • Some deficiencies in the classification
  • P2P applications
  • Skype

Benefits
16
Further work
  • Use the marking method at the measurement side
    for online traffic classification
  • Assumptions
  • The terminals accessing an operator's network
    all have the proposed driver installed
  • The driver is made tamper-proof to avoid users
    forging the marking
  • Online clustering of the traffic into QoS classes
    based on the resource requirements of the
    generating application
  • Could be used by operators to charge users on the
    basis of the application used
  • Extension of the marking by other information
    about the traffic generating application
  • E.g., version number
  • Operator could track the security risks of an old
    application
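
For the measurement side, reading the marking back out of a captured packet could look roughly like the Python sketch below. It assumes the marking is carried as the two-byte value of a Router Alert option (type 148, length 4) in the IPv4 header; the function name is hypothetical, and no tamper-proofing is modeled.

```python
from typing import Optional

ROUTER_ALERT_TYPE = 0x94  # option type 148 as defined in RFC 2113


def read_mark(packet: bytes) -> Optional[str]:
    """Return the two-character marking, or None if the packet is unmarked."""
    if len(packet) < 20:
        return None
    header_len = (packet[0] & 0x0F) * 4
    if header_len > len(packet):
        return None
    i = 20  # IP options start right after the fixed 20-byte header
    while i < header_len:
        opt_type = packet[i]
        if opt_type == 0:   # End of Option List
            break
        if opt_type == 1:   # No Operation, a single byte
            i += 1
            continue
        if i + 1 >= header_len:
            break
        opt_len = packet[i + 1]
        if opt_len < 2 or i + opt_len > header_len:
            break           # malformed option, stop parsing
        if opt_type == ROUTER_ALERT_TYPE and opt_len == 4:
            return packet[i + 2:i + 4].decode("ascii", errors="replace")
        i += opt_len
    return None
```

A classifier at the measurement point could then map the two characters back to an application or QoS class, under the stated assumption that every terminal runs the driver.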

17
Questions, discussion
  • Thank you very much for your kind attention!
  • Contact
  • E-mail: geza.szabo_at_ericsson.com
