Title: Networking Research UofC
1 Networking Research (UofC)
- Carey Williamson
- iCORE Chair and NSERC/iCORE/TELUS Mobility Industrial Research Chair
- Department of Computer Science
- University of Calgary
2 Research Team
- Faculty: Majid Ghaderi, Zongpeng Li, Mea Wang, Carey Williamson
- Research Staff: Martin Arlitt, Jingxiang Luo, Terence Robinson, Hongxia Sun, Qian Wu
- Students: Jean Cao, Marian Doerk, Phillipa Gill, Mingwei Gong, Ajay Gopinathan, Emir Halepovic, Andreas Hirt, Rohit Joshi, Ahmed Obied, Nadim Parvez, Partha Ramanujam, Tuan Vu, ...
3 Research Overview
- Research area?
- Wireless networks, Internet protocols, computer systems performance evaluation
- Mission: Make the Internet go faster
- Approach?
- Experimental, simulation, analytical
- Key challenges?
- Citius, Altius, Fortius!
- Performance, scalability, robustness
4 Experimental Facilities
- Wireless Internet Performance Lab (UofC)
- IEEE 802.11b wireless LAN
- SnifferPro, Airopeek wireless network analyzers
- PCs, laptops, PDAs, wireless NICs, Web proxy
- Experimental Laboratory for Internet Systems and Applications (UofC/UofS, CFI)
- Geographically distributed Internet testbed between Calgary and Saskatoon
- Clients, servers, notebooks, routers, switches, Web proxies, network analyzers, 802.11a/b
- Fully operational since Spring 2004
5 Research Highlights
- Network Traffic Measurements
- Martin Arlitt, et al.
- Internet Traffic Classification
- Jeff Erman, Anirban Mahanti, et al.
- Wireless LAN Traffic Measurements
- Aniket Mahanti, Martin Arlitt, et al.
- Cellular Network Capacity Planning
- Yujing Wu, Jingxiang Luo, Hongxia Sun
6 Network Traffic Measurement
- Collect and analyze packet-level traces from a live network, using special equipment
- Process traces, statistical analysis
- Diagnose performance problems (network, protocol, application)
7 Network Traffic Measurement
- Continuous monitoring of U of C traffic on commercial Internet link (100 Mbps), recording TCP SYN/FIN/RST packet headers
- 36 months of data and counting
- Specific measurement studies to date
- TCP reset behaviour (Arlitt)
- P2P traffic evolution (Madhukar)
- Internet traffic classification (Erman)
- Malicious network attacks (Obied)
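As a rough illustration of the SYN/FIN/RST header recording mentioned above, the sketch below tests the flags byte of a raw TCP header. The helper names `is_connection_event` and `make_header` are hypothetical and not part of the monitoring setup described on the slide; the flag bit values and the position of the flags byte (offset 13) follow the standard TCP header layout.

```python
import struct

# Standard TCP flag bits; the flags byte sits at offset 13 of the TCP header.
FIN, SYN, RST = 0x01, 0x02, 0x04


def is_connection_event(tcp_header: bytes) -> bool:
    """Return True if the segment has SYN, FIN, or RST set (hypothetical helper)."""
    if len(tcp_header) < 20:  # shorter than the minimum TCP header
        return False
    flags = tcp_header[13]
    return bool(flags & (SYN | FIN | RST))


def make_header(flags: int) -> bytes:
    """Build a synthetic 20-byte TCP header, for illustration only."""
    # src port, dst port, seq, ack, data offset (5 words), flags,
    # window, checksum, urgent pointer
    return struct.pack("!HHIIBBHHH", 1234, 80, 0, 0, 5 << 4, flags, 65535, 0, 0)
```

A monitor built this way would keep only the headers for which `is_connection_event` returns True, which is far less data than a full packet trace.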
8 TCP and HTTP Results
9 Semi-Supervised Network Traffic Classification
Jeffrey Erman, Anirban Mahanti, Martin Arlitt†, Ira Cohen†, Carey Williamson
Department of Computer Science, University of Calgary
Department of Computer Science and Engineering, Indian Institute of Technology (Delhi)
†Enterprise Systems Software Labs, HP Labs
Introduction
Identifying and categorizing network traffic by application type is challenging because of the continued evolution of applications, especially of those with a desire to be undetectable. The diminished effectiveness of port-based identification and the overheads of deep packet inspection approaches motivate us to propose a traffic classification methodology that relies on using only flow statistics to classify traffic.
Our proposed technique is a flexible mathematical framework that leverages both labelled and unlabelled flows. This semi-supervised approach to learning a network traffic classifier is a key contribution of this work.
Semi-Supervised Results
Labelling of training feature vectors is one of the most time-consuming steps of the classification process.
Figure 1: Selective Labelling of Flows
In Figure 1 we test the hypothesis that if a few flows are labelled in each cluster then we have a reasonable basis for creating the cluster-to-application mapping. With as few as two labels per cluster, we attain 94% flow accuracy.
Figure 2: Training with (Un)labelled Flows
The results in Figure 2 show the effect on the classifier's precision when we used a fixed number of labelled flows and a varying number of unlabelled flows in the training data set. Our results show that for a fixed number of labelled training flows, increasing the number of unlabelled flows increases the classifier's precision.
Retraining Detection
Although we found that our classifiers remained robust for extended periods of time, a mechanism for determining when the classifier needs updating is still required.
Figure 5: Correlation Between Average Distance and Flow Accuracy
We propose using the average distance of new flows to the centroid of the nearest cluster: a significant increase in the average distance indicates the need for an update.
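The retraining-detection idea (track the average distance of new flows to their nearest cluster centroid and flag a significant increase) can be sketched as follows. This is a minimal illustration, not the poster's implementation: the function names and the `factor` threshold are assumptions, since the poster does not say how large an increase counts as significant.

```python
import math


def nearest_centroid_distance(flow, centroids):
    """Euclidean distance from one flow's feature vector to its nearest centroid."""
    return min(math.dist(flow, c) for c in centroids)


def average_distance(flows, centroids):
    """Average nearest-centroid distance over a batch of flows."""
    return sum(nearest_centroid_distance(f, centroids) for f in flows) / len(flows)


def needs_retraining(new_flows, centroids, baseline, factor=1.5):
    """Flag an update when the average distance exceeds factor * baseline.

    `baseline` is the average distance measured on the training flows;
    `factor` is a hypothetical threshold chosen for illustration.
    """
    return average_distance(new_flows, centroids) > factor * baseline
```

In use, `baseline` would be computed once with `average_distance` on the flows used to build the clusters, and `needs_retraining` would then be evaluated on each new batch of flows.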
Conclusions
- Fast and accurate classifiers can be obtained by training with a small number of labelled flows mixed with a large number of unlabelled flows.
- High flow and byte accuracy can be achieved for offline and real-time classification.
- Robust classifiers can be built that are immune to transient changes in network conditions.
- Our approach can be integrated with solutions that collect flow statistics. We developed a prototype real-time classifier using Bro [4].
Real-Time Classification
Classification Framework
- A fundamental challenge in the design of the real-time classification system is the need to classify a flow as soon as possible. Unlike offline classification, where all discriminating flow statistics are available a priori, in the real-time context we only have partial information on the flow statistics.
- Our solution uses a layered classification system based on the idea of packet milestones.
- A packet milestone is reached when the count of the total number of packets a flow has sent or received reaches a specific value.
- Each layer has an independent classifier.
- Flow statistics are monitored in real time.
- As a flow reaches a packet milestone, it is classified/reclassified by the appropriate layer.
- This layered approach allows us to revise and potentially improve the classification of flows.
- Figures 3 and 4 present example results using the April 13, 9 am trace we collected from the UofC. We see that the classifier performs well, with byte accuracies typically in the 70% to 90% range.
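The packet-milestone layering described in the bullets above can be sketched as a small dispatcher: each milestone owns an independent classifier, and a flow is classified or reclassified whenever its packet count reaches a milestone. The class name `LayeredClassifier`, its interface, and the milestone values used in any example are hypothetical; the poster does not state which packet counts were used.

```python
class LayeredClassifier:
    """Dispatch flows to per-milestone classifiers (illustrative sketch)."""

    def __init__(self, layers):
        # layers: dict mapping a packet milestone (int) to an independent
        # classifier, modelled here as a callable stats -> label.
        self.layers = layers
        self.packet_counts = {}  # flow_id -> packets seen so far
        self.labels = {}         # flow_id -> most recent classification

    def on_packet(self, flow_id, stats):
        """Update the flow's packet count and (re)classify at milestones.

        `stats` holds the flow statistics gathered so far; how they are
        computed is outside the scope of this sketch.
        """
        count = self.packet_counts.get(flow_id, 0) + 1
        self.packet_counts[flow_id] = count
        classify = self.layers.get(count)
        if classify is not None:
            # A milestone was reached: the layer for this milestone
            # replaces any earlier classification of the flow.
            self.labels[flow_id] = classify(stats)
        return self.labels.get(flow_id)
```

Between milestones the flow keeps its most recent label, so later layers can revise an early decision once more of the flow has been observed.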
[Classification framework diagram. Labels: Step 1, Model Building; Step 2, Classification; Unlabelled Training Data; Labelled Training Data; Labelled Clusters; Unclassified Flows; Classified Flows]
Figure 3: Performance of Real-time Classifier
References
[1] O. Chapelle, B. Schölkopf, and A. Zien, editors. Semi-Supervised Learning. MIT Press, Cambridge, MA, 2006.
[2] J. Erman, A. Mahanti, M. Arlitt, I. Cohen, and C. Williamson. Offline/Online Traffic Classification Using Semi-Supervised Learning. To appear in Proc. of IFIP Performance 2007.
[3] J. Erman, A. Mahanti, M. Arlitt, and C. Williamson. Identifying and Discriminating Between Web and Peer-to-Peer Traffic in the Network Core. In WWW'07, Banff, Canada, May 2007.
[4] V. Paxson. Bro: A System for Detecting Network Intruders in Real-time. Computer Networks, 31(23-24):2435-2463, 1999.
- A clustering algorithm partitions the training flows into disjoint groups called clusters based on similarity. The advantages are:
- Builds natural clusters.
- The number of training flows needed is small (e.g., 8000).
- Classifier assigns each new unclassified flow to the nearest cluster using Euclidean distance. This is the maximum likelihood cluster assignment.
- Label of the assigned cluster becomes the classification of the flow.
- A cluster label is obtained using the labelled flows available in each cluster.
- These can be obtained through a variety of means: (automated) payload analysis, port numbers, expert knowledge.
- Clusters with no labels can be left as unknown.
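The model-building and classification steps described above can be sketched end to end. Two choices here are assumptions rather than statements from the poster: plain k-means stands in for "a clustering algorithm", and a cluster takes the majority label of its labelled flows. All function names are hypothetical.

```python
import math
from collections import Counter


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (arbitrary choice)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group; keep the old
        # centroid if a group ended up empty.
        centroids = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def label_clusters(centroids, labelled_flows):
    """Map each cluster to the majority label of the labelled flows it holds."""
    votes = {}
    for features, label in labelled_flows:
        votes.setdefault(nearest(features, centroids), Counter())[label] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}


def classify(flow, centroids, cluster_labels):
    """Assign the flow to its nearest cluster and return that cluster's label."""
    return cluster_labels.get(nearest(flow, centroids), "unknown")
```

Training flows without labels still shape the centroids in `kmeans`; only the few labelled flows are needed by `label_clusters`, and clusters that received no label fall through to "unknown".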
Typical byte accuracies in the 70% to 90% range.
Acknowledgements
This work was supported by the Natural Sciences
and Engineering Research Council (NSERC) of
Canada and Informatics Circle of Research
Excellence (iCORE) of the province of Alberta,
Canada.
Training Data: Training data can be a mix of labelled and unlabelled flows. Features include Average Packet Size, Number of Packets, Payload Bytes, Header Bytes, etc.
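As an illustration of the flow features listed above, the sketch below derives them from per-packet header and payload lengths. The `FlowFeatures` type, the `extract_features` function, and the input format are hypothetical, and the definition of packet size as header plus payload bytes is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FlowFeatures:
    num_packets: int
    header_bytes: int
    payload_bytes: int
    avg_packet_size: float


def extract_features(packets):
    """Compute flow features from (header_len, payload_len) pairs of one flow."""
    packets = list(packets)
    header = sum(h for h, _ in packets)
    payload = sum(p for _, p in packets)
    n = len(packets)
    # Packet size is taken as header + payload bytes (an assumption).
    avg = (header + payload) / n if n else 0.0
    return FlowFeatures(n, header, payload, avg)
```

The resulting fields can be turned into the numeric feature vector that a clustering step consumes, for example `(f.avg_packet_size, f.num_packets, f.payload_bytes, f.header_bytes)`.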
Figure 4: Byte Accuracy of Real-time Classifier
Full paper available at http://pages.cpsc.ucalgary.ca/erman/
10 Wireless-side Trace Collection
- RFGrabbers were configured to scan channels 1, 6, and 11 to capture AirUC WLAN traffic in the b/g mode.
- Over 6 weeks, RFGrabbers captured packets from 97 APs at 9 locations, representing 20% of the UofC WLAN.
11 CDMA2000 EV-DO Downlink
- Flow arrivals
- Propagation loss, shadowing, fast fading
- Index of the scheduled queue at slot t (schedule queue i at slot t)
- Maximum feasible rate of queue j at slot t
- Realized throughput of queue j up to slot t
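The scheduling formula itself did not survive on this slide, only its annotations. Those annotations (feasible rate per queue, realized throughput per queue, index of the scheduled queue) are consistent with a proportional-fair rule, so the sketch below assumes that rule; it is an assumption, not something the slide confirms. The function names, the exponentially weighted throughput update, and the `window` value are likewise hypothetical.

```python
def pf_schedule(rates, throughputs, eps=1e-9):
    """Pick the queue j that maximizes rates[j] / throughputs[j].

    rates[j]: maximum feasible rate of queue j in the current slot.
    throughputs[j]: realized throughput of queue j up to the current slot.
    `eps` avoids division by zero for queues that have not been served yet.
    """
    return max(range(len(rates)), key=lambda j: rates[j] / max(throughputs[j], eps))


def update_throughputs(throughputs, rates, scheduled, window=1000.0):
    """Exponentially weighted throughput update (assumed form).

    Only the scheduled queue receives its feasible rate in this slot;
    every other queue receives nothing.
    """
    return [
        (1 - 1 / window) * t + (1 / window) * (rates[j] if j == scheduled else 0.0)
        for j, t in enumerate(throughputs)
    ]
```

Under this assumed rule, a queue with a modest rate but low past throughput can win the slot over a queue with a higher rate that has already been served heavily.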
12 Future Plans
- More of the same!
- P2P systems modeling and analysis
- Wireless Internet measurement/modeling
- WiMax (IEEE 802.16)
- QoS in CDMA2000 EV-DO
- Wireless mesh networks?
- Sensor networks?
- Grid computing?
- Network security?