Title: State-of-the-Art of Internet Traffic Measurement and Analysis
1State-of-the-Art of Internet Traffic Measurement and Analysis
- The 31st APEC TEL WG Meeting
- April 5th, 2005
- Bangkok, Thailand
- Sue B. Moon
- Division of Computer Science
- KAIST
- South Korea
2Overview
- Brief Historical Overview
- Evolution of Measurement Techniques
- Status Quo of Measurement Techniques
- Future Work
3Brief Historical Overview
- 1970s and 1980s
- Performance was not an issue
- Very few papers about performance
- ping and traceroute were the only tools
- 1990s
- Internet exploded
- Lack of measurement/analysis/visualization tools sorely felt
- Measurement became important in research
- 2000s
- Competition between ISPs became intense
- Service Level Agreements (SLAs) became critical
- Security became a sales point
4Evolution of Measurement Techniques
- Internet Design Philosophy
- Basic ping and traceroute
- Vern Paxson's Work
- My Personal Perspective
5Internet Design Philosophy
- Packet switching
- Continued communication around failures
- Support for diverse services and protocols
- Distributed management of resources
- No access control
- Simplicity at the core, complexity at the edge
6The Internet Hourglass (Deering_at_IETF)
7What is the Internet today?
[Figure labels: BBN, Tier 2 ISP, UUnet, BT, Sprint (AS), Dial-up ISP, Peering point]
8Internet Users Different from PSTN Users
- ISPs
- Too much diverse traffic to monitor
- Hard to get a complete picture
- Routers barely keep up with core tasks
- End-users
- More options than traditional telco customers
9ping
- ICMP-based tool for host reachability
- Algorithm
- Sends an ICMP echo request with
- Identifier for unique ping process
- Sequence number per echo request
- Receiving host returns an ICMP echo reply
- Prints out RTT, TTL, and sequence number
- Issues
- Many routers filter out ICMP packets
- ICMP goes through the slow path on routers
- RTT includes end system processing time
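As a minimal sketch of the echo request described on this slide, the block below builds an ICMP echo request carrying an identifier and a sequence number, and fills in the standard Internet checksum. It only constructs and parses bytes; sending the packet would need a raw socket and is left out. The function names and the example identifier, sequence number, and payload are illustrative assumptions, not part of the original slides.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type for echo request; code is 0


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, as used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Build an echo request: type, code, checksum, identifier, sequence, payload."""
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload


def parse_echo(packet: bytes):
    """Return (type, identifier, sequence) from an ICMP echo packet."""
    icmp_type, _code, _checksum, identifier, sequence = struct.unpack("!BBHHH", packet[:8])
    return icmp_type, identifier, sequence


# Hypothetical values: one identifier per ping process, one sequence number per request.
packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"ping")
```

The identifier lets a ping process pick out its own replies, and the sequence number pairs each reply with the request it answers so that an RTT can be computed per probe.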
10traceroute
- Used to find out the forward path to a host
- Algorithm
- Send an IP datagram with TTL=1
- First router sends back ICMP time exceeded
- Then send a datagram with TTL=2
- Continue till destination is reached/TTL expired
- Issues
- not suited for performance measurements
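The TTL-stepping idea on this slide can be sketched as a small simulation. The code below does not touch the network: it models a forward path as a list of hop names (a hypothetical input, with the destination last) and shows which node would answer a probe sent with each TTL. A real traceroute needs raw sockets and elevated privileges, which this sketch deliberately avoids.

```python
def probe(path, ttl):
    """Simulate one probe. Returns (responder, reached_destination)."""
    if not path:
        raise ValueError("path must contain at least the destination")
    remaining = ttl
    for index, node in enumerate(path):
        if index == len(path) - 1:
            return node, True   # the destination itself answers
        remaining -= 1          # each router decrements the TTL
        if remaining == 0:
            return node, False  # this router sends ICMP time exceeded
    raise AssertionError("unreachable")


def traceroute(path, max_ttl=30):
    """Send probes with TTL = 1, 2, ... until the destination or max_ttl is reached."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, reached = probe(path, ttl)
        hops.append((ttl, responder))
        if reached:
            break
    return hops


# Hypothetical three-node forward path.
hops = traceroute(["r1", "r2", "dst"])
```

Because each probe is an independent packet, successive probes may follow different paths in a real network, which is one reason the tool is a poor fit for performance measurement.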
11Vern Paxson's PhD Thesis
- Many findings about Internet Performance
- Delay
- Loss
- Unexpected routing behaviors
- route changes, flaps,
- Clock synchronization
- Incomplete logging
12Paxson's Tools
- Instrumented pings
- Send packets between a set of nodes
- In today's Internet
- Active measurements for performance monitoring
- Passive measurements for control-domain monitoring
13Passive Measurement
- No traffic injected for measurement purpose
- Not invasive
- Only data collection increases traffic
- Access limited
- Measurement about total traffic
- Privacy/Security - serious concern
14Passive Measurement Examples
- Packet monitors
- Tcpdump for Unix-based hosts
- Dedicated measurement systems
- DAGMON (up to 10GE)
- Router/switch traffic statistics
- Network internal behavior
- SNMP MIBs
- Flow-level information
- Cisco's NetFlow, Juniper's Accounting, Arbor's Peakflow
15Packet-Level Measurements
- Pros
- very fine granularity
- Challenges
- link speeds are increasing!
- Large volumes of data
- system design issues
- disk/PCI bus speeds
- installation cost
16Challenges in Data Collection
- On 1GE link
- # of flows per sec: 100K to 1 mil
- 1KB per flow => 1GB per sec
- On 10GE link
- # of packets per sec: 10 mil to 200 mil
- 2GHz processor => 10 cycles per packet
- You need a 10GE link to monitor a 10GE link!
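The arithmetic behind this slide can be checked with a short sketch. It takes the slide's upper-end figures as given (1 million flows per second at 1KB per flow, 200 million packets per second on a 2GHz processor) and assumes 1KB = 1000 bytes; the function names are illustrative.

```python
def export_rate_bytes_per_sec(flows_per_sec: int, bytes_per_flow: int) -> int:
    """Volume of flow records produced per second."""
    return flows_per_sec * bytes_per_flow


def cycles_per_packet(clock_hz: int, packets_per_sec: int) -> float:
    """Processor cycles available to handle each packet."""
    return clock_hz / packets_per_sec


# Slide figures, upper end of each range.
record_rate = export_rate_bytes_per_sec(1_000_000, 1_000)   # bytes per second
record_rate_bits = record_rate * 8                           # bits per second
budget = cycles_per_packet(2_000_000_000, 200_000_000)       # cycles per packet
```

Under these assumptions the flow records alone amount to 8 Gbit/s, which is close to the capacity of a 10GE link, and the per-packet budget of 10 cycles leaves almost no room for any processing beyond counting.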
17Why Sampling/Filtering?
- Problems with large volumes of data
- feasibility of collection at high-speeds
- memory/bus/processor requirements
- storage limitations
- complexity of analysis
18State-of-the-Art
- Cisco
- sampled netflow
- capture 1 in N
- aggregate by five-tuple
- Juniper
- filter on any combination of header fields
- sample 1 in N
- recommends 1 in 1000 or less
- How much data do you collect when N = 1000?
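To make the closing question concrete, here is a minimal sketch of 1-in-N packet sampling followed by aggregation by five-tuple. It assumes deterministic selection of every N-th packet, which may differ from how a given router picks its samples, and the packet dictionaries and their field names are hypothetical stand-ins for real header fields.

```python
from collections import Counter


def sample_one_in_n(packets, n):
    """Keep every n-th packet (deterministic 1-in-N sampling, an assumption)."""
    return [pkt for index, pkt in enumerate(packets) if index % n == 0]


def aggregate_by_five_tuple(packets):
    """Sum bytes per (src, dst, sport, dport, proto) flow key."""
    flows = Counter()
    for pkt in packets:
        key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
        flows[key] += pkt["bytes"]
    return flows


def make_packet(index):
    """Synthetic traffic: two flows that differ only in destination port."""
    dport = 80 if index % 3 == 0 else 443
    return {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 40000,
            "dport": dport, "proto": 6, "bytes": 100}


packets = [make_packet(i) for i in range(10_000)]
sampled = sample_one_in_n(packets, 1000)
flows = aggregate_by_five_tuple(sampled)
# Scale the sampled byte count back up by N to estimate the total volume.
estimated_total_bytes = sum(flows.values()) * 1000
```

With N = 1000 only 10 of the 10,000 synthetic packets are kept. The scaled-up total is exact here only because every packet has the same size; per-flow estimates from so few samples can be far off, which is the trade-off the slide's question points at.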
19Personal Experience at Sprint
- When I first arrived, I heard
- No loss on Sprint backbone network
- Almost no delay
- Cadillac brand of IP service
20Min/Avg/Max Single-Hop Delay per Minute
21Single-Hop Delay w/o Cisco Router Idiosyncrasies
22Multi-Hop Delay Distributions
23Three Paths Connectivity
- Fiber propagation delay: 28ms, 32ms, 34ms
24Identification of Constant Factors: Multi-Paths
- Equal Cost Multi Paths (ECMP)
- Src/Dst addresses, Router ID
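One common way to picture ECMP path selection is a hash over the fields listed on this slide. The sketch below is an illustration under that assumption only: actual routers use vendor-specific hash functions and inputs, and the function name, the choice of SHA-256, and the path labels are all hypothetical.

```python
import hashlib


def ecmp_next_hop(src: str, dst: str, router_id: str, next_hops: list) -> str:
    """Pick one of several equal-cost next hops from a hash of the inputs."""
    key = f"{src}|{dst}|{router_id}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


# Hypothetical equal-cost paths between the same pair of routers.
paths = ["path-A", "path-B", "path-C"]
first = ecmp_next_hop("192.0.2.1", "198.51.100.7", "router-1", paths)
second = ecmp_next_hop("192.0.2.1", "198.51.100.7", "router-1", paths)
```

Because the choice depends only on the addresses and the router, every packet of a given source/destination pair takes the same path and therefore sees the same constant propagation delay, while different pairs may be spread over paths of different lengths.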
25Peaks in Variable Delay
26Closer Look
27Issues in "Good" Routing
- Misbehaving routing protocols
- BGP misconfigurations
- Pathological behaviors
- Frequent changes
- Even under normal circumstances
- Transient behaviors
- Inter/intra-domain routing not well understood
28Routing Across Internet
- Protocols
- Interior Gateway Protocols (IS-IS, OSPF, RIP)
- Exterior Gateway Protocols (BGP)
- How they work
- IGPs find the best (shortest) path across a domain
- BGP announces reachability between domains
- policy determines inter-domain paths
29Routing Research Projects
- Routeviews
- 50 peering at route-views.oregon-ix.net
- MRT format RIBs and BGP updates, show ip bgp dumps, route dampening data
- only E-BGP
- RIPE (Réseaux IP Européens)
- routing updates from 9 mostly European IXs
- Looking Glass services for BGP
- Routing information service (RIS)
30Scenario for a Transient Routing Loop in Normal Operation
31When a link fails, R1 is the first to detect.
32R3 is updated before R2.
33Finally R2 is updated, and the loop is resolved.
34CDF of Routing Loop Duration in Time
35VoIP experimental setup [Boutremans2002]
- Traffic injected in the network
- 200 byte UDP packets
- every 5ms.
- Packets captured and timestamped at end-systems.
- Traceroute runs continuously during the experiment.
- Induced link failures on purpose to evaluate convergence time and impact on e2e connections
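A quick calculation shows what the probe stream on this slide amounts to. The sketch assumes the 200 bytes count only the payload (headers are ignored), and the outage duration passed to the second function is an example input chosen for illustration, not a result from the experiment.

```python
def probe_stream(payload_bytes: int, interval_ms: float):
    """Return (packets per second, payload bits per second) for a periodic stream."""
    packets_per_sec = 1000 / interval_ms
    payload_bits_per_sec = packets_per_sec * payload_bytes * 8
    return packets_per_sec, payload_bits_per_sec


def probes_sent_during(duration_s: float, interval_ms: float) -> int:
    """Number of probes sent during an interval of the given length."""
    return round(duration_s * 1000 / interval_ms)


rate = probe_stream(payload_bytes=200, interval_ms=5)
# Hypothetical outage length, used only to show the time resolution of the stream.
affected = probes_sent_during(duration_s=6.6, interval_ms=5)
```

One probe every 5ms gives 200 packets per second, so the start and end of a connectivity gap can be located to within a few milliseconds from the sequence of lost probes.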
36Information Sources
- IS-IS and BGP listener logs
- Router logs from both ends of failing links
- Controlled bi-directional VoIP traffic between Reston and ATL
- SNMP data
37Delays (1 sec timescale)
38When the two interfaces went down
6.6 seconds
39When three links came back up
40Approaches To Fix It
- Fine-tuning parameters
- Timer values [Alattinoglu2002]
- Modify Routing Protocols
- Suppress advertisement and perform local rerouting using a backwarding table [Lee04]
- Centralized path computation [Feamster04, Rexford04]
- Exploit multi-path
- Our approach to provide Value-Added Service
41What I have learned
- No loss, almost no delay
- Almost. I gained insight into causes behind
- Debunking the myths [Odlyzko2005]
- Streaming real-time traffic
- QoS
- Content is king
- Usage-sensitive pricing
42Other Issues Tackled
- Traffic Matrix Estimation
- Inspired by tomography in other fields
- Before arrival of efficient NetFlow
- Network Anomaly Detection
- NIDS, IDS => PCA-based global monitoring
- Optimization
- Cross-layer resource allocation
43Taxonomy of Traffic Matrices
- Point-to-Point
- demand between ingress and egress point
- Ingress/Egress POP, link, router, BGP prefix
44Scalability
- Example: 20 POPs, 500 routers, 3K links
- Granularity/size tradeoff
- POP-to-POP O(100)
- router-to-router O(10^4)
- prefix-to-prefix O(10^10)
- Challenge - Collecting, storing and manipulating
large TMs!
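The growth in traffic matrix size can be sketched by counting entries in a full point-to-point matrix with one entry per (ingress, egress) pair. The counts below are upper bounds for a complete n-by-n matrix; the slide quotes rounded orders of magnitude, so exact values here need not match them, and the function name is illustrative.

```python
def traffic_matrix_entries(points: int) -> int:
    """Entries in a full point-to-point matrix: one per (ingress, egress) pair."""
    return points * points


pop_entries = traffic_matrix_entries(20)      # POP-to-POP, 20 POPs
router_entries = traffic_matrix_entries(500)  # router-to-router, 500 routers
```

Moving from POPs to routers multiplies the number of demands to track by several hundred, and every entry must be re-estimated in each measurement interval, which is where the collection and storage challenge comes from.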
45Usage Based Charging
- Feasible?
- Where to measure?
- At last hop
- Scenario
- A: I want to download B's webpage
- B: That page is 1MB large
- A: OK
- Between ISPs
- What do you do with retransmissions, ACKs, delay?
46Future Work
- In Measurement Technology
- Keep up with increased link speed: 40GE
- Improve sampling techniques
- Infer what we cannot measure
- Pinpoint security holes
- Personal perspective
- More into creating value-added services
- MPLS/VPN performance issues
- Sound Measurement Infrastructure
47Acknowledgements
- Thanks to D. Papagiannaki, B.-Y. Choi, U. Hengartner, C. Boutremans, and G. Iannaccone for help with the slides.
48BACKUP SLIDES