Title: Internet Measurement
Outline
- Measurement overview
- Why measure? Why model measurements?
- What to measure? Where to measure?
- Internet challenges
- Measurement tools
- Active: ping, traceroute, and pathchar
- Passive: logs, SNMP, packet, and flow monitoring
- Operational applications of measurement
- Discussion
Why Measure?
- The Internet is a man-made system, so why do we need to measure it?
- Because we still don't really understand it
- Because sometimes things go wrong
- Measurement for network operations
- Detecting and diagnosing problems
- What-if analysis of future changes
- Measurement for scientific discovery
- Characterizing a complex system as an organism
- Creating accurate models that represent reality
- Identifying new features and phenomena
Why Build Models of Measurements?
- Compact summary of measurements
- Efficient way to represent a large data set
- E.g., exponential distribution with mean 100 sec
- Expose important properties of measurements
- Reveals underlying cause or engineering question
- E.g., mean RTT to help explain TCP throughput
- Generate random but realistic data as input
- Generate new data that agree in key properties
- E.g., topology models to feed into simulators
"All models are wrong, but some models are useful." (George Box)
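The first and third uses of a model (a compact summary, then synthetic data) can be sketched together. The snippet fits an exponential distribution to a few hypothetical durations by maximum likelihood, where the fitted mean is simply the sample mean. It then draws synthetic samples from the fitted model. The data values and function names are illustrative assumptions.

```python
import random
import statistics


def fit_exponential(samples):
    """Maximum-likelihood fit of an exponential distribution.

    For an exponential model the MLE of the rate is 1 / (sample mean),
    so the whole data set is summarized by a single number.
    """
    return 1.0 / statistics.fmean(samples)


def synthesize(rate, n, seed=0):
    """Generate n random but 'realistic' samples from the fitted model."""
    rng = random.Random(seed)
    return [rng.expovariate(rate) for _ in range(n)]


# Hypothetical measured durations in seconds (illustration only).
measured = [80.0, 120.0, 95.0, 105.0, 100.0]
rate = fit_exponential(measured)
synthetic = synthesize(rate, 10_000)
print(f"fitted mean = {1 / rate:.1f} s")
print(f"synthetic mean = {statistics.fmean(synthetic):.1f} s")
```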
What Can Be Measured?
- Traffic
- Load statistics
- Packet or flow traces
- Performance of paths
- Application performance, e.g., Web download time
- Transport performance, e.g., TCP bulk throughput
- Network performance, e.g., packet delay and loss
- Network structure
- Topology, and paths on the topology
- Dynamics of the routing protocol
Where to Measure?
- Short answer
- Anywhere you can!
- End hosts
- Application logs, e.g., Web server logs
- Sending active probes to measure performance
- Individual links/routers
- Load statistics, packet traces, flow traces
- Configuration state
- Routing-protocol messages or table dumps
- Alarms
Internet Challenges Make Measurement an Art
- Stateless routers
- Routers do not routinely store packet/flow state
- Measurement is an afterthought, adds overhead
- IP narrow waist
- IP measurements cannot see below network layer
- E.g., link-layer retransmission, tunnels, etc.
- Violations of end-to-end argument
- E.g., firewalls, address translators, and proxies
- Not directly visible, and may block measurements
- Decentralized control
- Autonomous Systems may block measurements
- No global notion of time
Active Measurement: Ping
- Adding traffic for purposes of measurement
- Trade-offs between accuracy and overhead
- Need careful methods to avoid introducing bias
- Ping
- Host sends an ICMP ECHO packet to a target and captures the ICMP ECHO REPLY
- Useful for checking connectivity and RTT
- Only requires control of one of the two end-points
- Problems with ping
- Round-trip rather than one-way delays
- Some hosts might not respond
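This sketch makes the ECHO exchange concrete by building an ICMP ECHO request (type 8, code 0) with the standard RFC 1071 Internet checksum. It does not send anything. Sending needs a raw socket and usually administrator privileges, so that part is deliberately left out. The identifier, sequence number, and payload are arbitrary example values. A real ping records the send time and subtracts it from the arrival time of the matching ECHO REPLY (type 0) to get the RTT.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type 8 = Echo; the reply uses type 0


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's-complement of the one's-complement sum
    of all 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    """Build an ICMP ECHO request (header + payload) with a valid checksum."""
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, csum, ident, seq)
    return header + payload


pkt = build_echo_request(ident=0x1234, seq=1, payload=b"ping-probe")
print(pkt.hex())
print(internet_checksum(pkt))  # 0: a correctly checksummed packet verifies to zero
```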
Active Measurement: Traceroute
- Time-To-Live (TTL) field in IP packet header
- Source sends a packet with a TTL of n
- Each router along the path decrements the TTL
- The router that decrements the TTL to 0 sends a "time exceeded" message back to the source
- Traceroute tool exploits this TTL behavior
(Figure: path from source to destination) Send packets with TTL = 1, 2, 3, ... and record the source of each "time exceeded" message
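The TTL mechanism can be simulated without a network. In this sketch the path is a hypothetical list of router interface addresses, drawn from documentation address ranges and ending at the destination. Each probe's responder is whichever node drives the TTL to zero. Once the TTL is large enough, the destination itself answers; a real UDP traceroute would receive an ICMP "port unreachable" at that point.

```python
from typing import List, Tuple


def send_probe(path: List[str], ttl: int) -> Tuple[str, str]:
    """Simulate one probe along `path` (routers..., destination).

    Each router decrements the TTL; the router that brings it to 0
    answers with 'time exceeded'. If the probe outlives every router,
    the destination itself answers.
    """
    for router in path[:-1]:
        ttl -= 1
        if ttl == 0:
            return router, "time-exceeded"
    return path[-1], "destination-reply"


def traceroute(path: List[str], max_ttl: int = 30) -> List[str]:
    """Send probes with TTL = 1, 2, 3, ... and record who answers each one."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, kind = send_probe(path, ttl)
        hops.append(responder)
        if kind == "destination-reply":
            break
    return hops


# Hypothetical path: three routers, then the destination host.
path = ["192.0.2.1", "198.51.100.7", "203.0.113.9", "203.0.113.200"]
print(traceroute(path))  # ['192.0.2.1', '198.51.100.7', '203.0.113.9', '203.0.113.200']
```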
Active Measurement: Challenges of Traceroute
- Measuring multiple paths
- Successive probes may traverse different paths
- Non-participating network elements
- Some routers and firewalls don't reply
- Inaccurate delay information
- Includes processing delays on the router CPU
- Round-trip vs. one-way measurements
- Paths may have asymmetric properties
- Interfaces, not routers
- Returns IP address of interfaces, not routers
Active Measurement: Applications of Traceroute
- Network troubleshooting
- Identify forwarding loops and black holes
- Identify long and convoluted paths
- See how far the probe packets get
- Network topology inference
- Launch traceroute probes from many places toward many destinations
- Join the paths together to fill in parts of the topology, though traceroute undersamples the edges
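Both applications reduce to simple operations on hop lists. The sketch below uses hypothetical interface labels, with "*" standing for a hop that did not reply. It flags a forwarding loop when a responding address repeats, and it joins several traceroute outputs into one adjacency map. The nodes are interfaces, not routers. Links hidden behind non-responding hops are simply missing, which is one reason the inferred topology is incomplete.

```python
from typing import Dict, List, Set


def has_forwarding_loop(hops: List[str]) -> bool:
    """A forwarding loop shows up as the same responding address twice."""
    seen = [h for h in hops if h != "*"]
    return len(seen) != len(set(seen))


def merge_paths(paths: List[List[str]]) -> Dict[str, Set[str]]:
    """Join traceroute outputs into one map: interface -> next-hop interfaces.

    Consecutive responding hops become an edge; a '*' (no reply) breaks
    the chain, so the links on either side of it are not recorded.
    """
    graph: Dict[str, Set[str]] = {}
    for hops in paths:
        for a, b in zip(hops, hops[1:]):
            if a == "*" or b == "*":
                continue
            graph.setdefault(a, set()).add(b)
    return graph


# Hypothetical traceroute outputs from two vantage points.
paths = [["A", "B", "C"], ["A", "B", "D"], ["E", "*", "C"]]
print(merge_paths(paths))
```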
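Active Measurement: Pathchar for Links
- Three delay components on each link: propagation delay d, transmission delay L/c for an L-byte packet on a link of capacity c, and queuing delay
- Send probes of varying size L to hops i and i+1, and keep the minimum RTT for each size to filter out queuing delay
- $\min \mathrm{rtt}_{i+1}(L) - \min \mathrm{rtt}_i(L) = d + L/c$
- (Figure: min. RTT difference plotted against packet size L, a line with slope 1/c and intercept d)
- How to infer d and c? Fit a line to the measurements: the slope gives 1/c and the intercept gives d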
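An ordinary least-squares fit answers the inference question. The sketch assumes RTT samples to hops i and i+1 are already available for several probe sizes L. It keeps the minimum RTT per size and fits a line to the differences. The function names and the synthetic numbers are illustrative assumptions, and real pathchar measurements are far noisier than this.

```python
from typing import Dict, List, Tuple


def min_rtt_differences(rtts_i: Dict[int, List[float]],
                        rtts_next: Dict[int, List[float]]) -> List[Tuple[int, float]]:
    """For each probe size L: min RTT to hop i+1 minus min RTT to hop i.

    Taking the minimum over many probes is meant to strip out queuing delay.
    """
    return [(size, min(rtts_next[size]) - min(rtts_i[size]))
            for size in sorted(rtts_i)]


def fit_link(points: List[Tuple[int, float]]) -> Tuple[float, float]:
    """Least-squares line through (L, delta) assuming delta = d + L / c.

    Returns (d in seconds, c in bytes per second).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx                    # estimate of 1/c
    intercept = mean_y - slope * mean_x  # estimate of d
    return intercept, 1.0 / slope


# Synthetic check: d = 2 ms, c = 1.25e6 bytes/s (10 Mbit/s); the second
# sample for each size carries 3 ms of queuing, which the minimum discards.
sizes = range(64, 1501, 64)
to_i = {L: [0.010 + L / 1e7, 0.013 + L / 1e7] for L in sizes}
to_next = {L: [v + 0.002 + L / 1.25e6 for v in to_i[L]] for L in sizes}
d, c = fit_link(min_rtt_differences(to_i, to_next))
print(f"d = {d * 1e3:.2f} ms, c = {c * 8 / 1e6:.2f} Mbit/s")
```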
Passive Measurement: Logs at Hosts
- Web server logs
- Host, time, URL, response code, content length, ...
- E.g., 122.345.131.2 - - [15/Oct/1998:00:00:25 -0400] "GET /images/wwwtlogo.gif HTTP/1.0" 304 - "http://www.aflcio.org/home.htm" "Mozilla/2.0 (compatible; MSIE 3.02; Update a; AK; AOL 4.0; Windows 95)" "-"
- DNS logs
- Request, response, time
- Useful for workload characterization, troubleshooting, etc.
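A regular expression can split entries like the Web server log example into fields. The pattern below is an assumption fitted to that example. It expects the Common Log Format fields, followed by the referrer, the user agent, and one optional extra quoted field. Other servers are configured with different formats.

```python
import re

# Assumed layout: host ident user [time] "request" status size
# "referrer" "user-agent", then an optional extra quoted field.
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"(?: "(?P<extra>[^"]*)")?'
)


def parse_line(line: str):
    """Return a dict of fields, or None if the line does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    parts = rec["request"].split()
    rec["method"] = parts[0] if len(parts) > 0 else None
    rec["url"] = parts[1] if len(parts) > 1 else None
    # "-" means no body was sent (e.g., 304 Not Modified).
    rec["size"] = None if rec["size"] == "-" else int(rec["size"])
    return rec


line = ('122.345.131.2 - - [15/Oct/1998:00:00:25 -0400] '
        '"GET /images/wwwtlogo.gif HTTP/1.0" 304 - '
        '"http://www.aflcio.org/home.htm" '
        '"Mozilla/2.0 (compatible; MSIE 3.02; Update a; AK; AOL 4.0; Windows 95)" "-"')
rec = parse_line(line)
print(rec["status"], rec["url"], rec["size"])  # 304 /images/wwwtlogo.gif None
```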
Passive Measurement: SNMP
- Simple Network Management Protocol
- Coarse-grained counters on the router
- E.g., byte and packet counts
- Polling
- Management system can poll the counters
- E.g., once every five minutes
- Limitations
- Extremely coarse-grained statistics
- Delivered over UDP!
- Advantage: ubiquitous
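The router exposes only cumulative counters, so a poller turns two successive readings into an average rate. The sketch assumes 32-bit octet counters such as IF-MIB's ifInOctets, which wrap modulo 2^32, and at most one wrap between polls. At 1 Gbit/s a 32-bit octet counter wraps in about 34 seconds, far shorter than a five-minute poll. That is why 64-bit counters (ifHCInOctets) exist. The function names are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32  # Counter32 values wrap around at 2^32


def counter_delta(prev: int, curr: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Increase between two polls, assuming at most one wrap in between."""
    return (curr - prev) % modulus


def average_bps(prev_octets: int, curr_octets: int, interval_s: float) -> float:
    """Average bit rate over the polling interval from two octet readings."""
    return counter_delta(prev_octets, curr_octets) * 8 / interval_s


# Two polls five minutes apart (hypothetical readings).
print(average_bps(1_000_000, 38_500_000, 300))  # 1000000.0
```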
Passive Measurement: Packet Monitoring
(Figure: Router A with a line card that does packet sampling)
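A sampling line card reports only a fraction of the packets, so volumes must be scaled back up. This sketch assumes independent random 1-in-n sampling; routers also use deterministic every-nth-packet sampling. The function names are hypothetical. The estimate is unbiased but noisy when few packets are sampled.

```python
import random
from typing import List, Optional


def sample_packets(sizes: List[int], n: int, seed: Optional[int] = None) -> List[int]:
    """Keep each packet independently with probability 1/n."""
    rng = random.Random(seed)
    return [s for s in sizes if rng.random() < 1.0 / n]


def estimate_total_bytes(sampled: List[int], n: int) -> int:
    """Scale the sampled byte count by n to estimate the true total."""
    return n * sum(sampled)


packets = [1500] * 50_000 + [40] * 50_000   # hypothetical traffic
sampled = sample_packets(packets, n=100, seed=1)
print(len(sampled), estimate_total_bytes(sampled, 100), sum(packets))
```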
Packet Monitoring: Selecting the Traffic
- Filter to focus on a subset of the packets
- IP addresses/prefixes (e.g., to/from specific Web sites, client machines, DNS servers, mail servers)
- Protocol (e.g., TCP, UDP, or ICMP)
- Port numbers (e.g., HTTP, DNS, BGP, Napster)
- Collect first n bytes of packet (snap length)
- Medium access control header (if present)
- IP header (typically 20 bytes)
- IP+UDP header (typically 28 bytes)
- IP+TCP header (typically 40 bytes)
- Application-layer message (entire packet)
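The filter and snap-length ideas can be sketched on raw bytes. This assumes IPv4 packets given as bytes that start at the IP header, with no link-layer header. The `capture` function and the example packet are hypothetical. Real tools such as tcpdump compile filter expressions into BPF programs rather than doing this in Python.

```python
import struct
from typing import Optional


def capture(packet: bytes, want_proto: int = 6, want_port: int = 80,
            snaplen: int = 40) -> Optional[bytes]:
    """Filter a raw IPv4 packet by protocol and port, then truncate it.

    want_proto: IP protocol number (6 = TCP, 17 = UDP, 1 = ICMP).
    snaplen: bytes to keep; 40 covers IP + TCP headers without options.
    """
    if len(packet) < 20:
        return None
    ihl = (packet[0] & 0x0F) * 4          # IP header length in bytes
    if packet[9] != want_proto or len(packet) < ihl + 4:
        return None
    src_port, dst_port = struct.unpack("!HH", packet[ihl:ihl + 4])
    if want_port not in (src_port, dst_port):
        return None
    return packet[:snaplen]


# Hypothetical TCP packet: 20-byte IP header, 20-byte TCP header, 100-byte payload.
ip_hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 140, 0, 0, 64, 6, 0,
                     bytes([135, 207, 38, 125]), bytes([192, 0, 2, 80]))
tcp_hdr = struct.pack("!HHIIBBHHH", 1043, 80, 617756405, 0, 0x50, 0x02, 32120, 0, 0)
pkt = ip_hdr + tcp_hdr + b"x" * 100
print(len(capture(pkt)))            # 40: headers kept, payload dropped
print(capture(pkt, want_proto=17))  # None: not UDP
```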
Tcpdump Output (three-way TCP handshake and HTTP request message)
(Figure callouts: timestamp; client address and port; Web server (port 80); SYN flag; sequence number; TCP options)
23:40:21.008043 eth0 > 135.207.38.125.1043 > lovelace.acm.org.www: S 617756405:617756405(0) win 32120 <mss 1460,sackOK,timestamp 46339 0,nop,wscale 0> (DF)
23:40:21.036758 eth0 < lovelace.acm.org.www > 135.207.38.125.1043: S 2598794605:2598794605(0) ack 617756406 win 16384 <mss 512>
23:40:21.036789 eth0 > 135.207.38.125.1043 > lovelace.acm.org.www: . 1:1(0) ack 1 win 32120 (DF)
23:40:21.037372 eth0 > 135.207.38.125.1043 > lovelace.acm.org.www: P 1:513(512) ack 1 win 32256 (DF)
23:40:21.085106 eth0 < lovelace.acm.org.www > 135.207.38.125.1043: . 1:1(0) ack 513 win 16384
23:40:21.085140 eth0 > 135.207.38.125.1043 > lovelace.acm.org.www: P 513:676(163) ack 1 win 32256 (DF)
23:40:21.124835 eth0 < lovelace.acm.org.www > 135.207.38.125.1043: P 1:179(178) ack 676 win 16384
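A trace like the one above yields simple performance numbers directly. This sketch estimates the round-trip time from the gap between the client's SYN and the server's SYN-ACK. It assumes the capture was taken at the client and holds a single connection. The two input lines are taken from the trace and written with the punctuation tcpdump prints; the helper names are hypothetical.

```python
from datetime import datetime
from typing import List


def timestamp(line: str) -> datetime:
    """Parse the leading HH:MM:SS.ffffff field of a tcpdump line."""
    return datetime.strptime(line.split()[0], "%H:%M:%S.%f")


def handshake_rtt(lines: List[str]) -> float:
    """Seconds from the client's SYN to the server's SYN-ACK.

    The SYN is the first 'S' line without 'ack'; the SYN-ACK is the
    first 'S' line with 'ack'.
    """
    syn = next(ln for ln in lines if " S " in ln and " ack " not in ln)
    syn_ack = next(ln for ln in lines if " S " in ln and " ack " in ln)
    return (timestamp(syn_ack) - timestamp(syn)).total_seconds()


trace = [
    "23:40:21.008043 eth0 > 135.207.38.125.1043 > lovelace.acm.org.www: "
    "S 617756405:617756405(0) win 32120 <mss 1460,sackOK,timestamp 46339 0,nop,wscale 0> (DF)",
    "23:40:21.036758 eth0 < lovelace.acm.org.www > 135.207.38.125.1043: "
    "S 2598794605:2598794605(0) ack 617756406 win 16384 <mss 512>",
]
print(f"{handshake_rtt(trace) * 1000:.3f} ms")  # 28.715 ms
```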
Analysis of Packet Traces
- IP header
- Traffic volume by IP addresses or protocol
- Burstiness of the stream of packets
- Packet properties (e.g., sizes, out-of-order, etc.)
- TCP header
- Traffic breakdown by application (e.g., Web)
- TCP congestion and flow control
- Number of bytes and packets per session
- Application header
- URLs, HTTP headers (e.g., cacheable response?)
- DNS queries and responses, user key strokes, ...
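A traffic breakdown by application is often approximated from the port numbers in the TCP/UDP header. The port-to-application table here is a small hypothetical subset. Port-based classification is only a first cut, since applications can use nonstandard or dynamic ports.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical, deliberately tiny port-to-application table.
WELL_KNOWN_PORTS = {80: "Web", 53: "DNS", 25: "SMTP", 179: "BGP"}


def bytes_by_application(packets: Iterable[Tuple[str, int, int, int]]) -> Counter:
    """packets: (protocol, src_port, dst_port, size_in_bytes) tuples."""
    totals: Counter = Counter()
    for proto, sport, dport, size in packets:
        app = (WELL_KNOWN_PORTS.get(dport)
               or WELL_KNOWN_PORTS.get(sport)
               or f"other-{proto}")
        totals[app] += size
    return totals


sample = [("tcp", 1043, 80, 512), ("tcp", 80, 1043, 178),
          ("udp", 33000, 53, 60), ("tcp", 5000, 6000, 100)]
print(bytes_by_application(sample))
```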
Aggregating Packets into IP Flows
(Figure: packets over time grouped into flows 1 to 4)
- Set of packets that belong together
- Source/destination IP addresses and port numbers
- Same protocol, ToS bits, ...
- Same input/output interfaces at a router (if known)
- Packets that are close together in time
- Maximum spacing between packets (e.g., 15 sec, 30 sec)
- Example: flows 2 and 4 are different flows due to time
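The flow definition can be sketched in a few lines. The sketch assumes packets arrive in time order and keys a flow only by the 5-tuple; ToS bits and router interfaces are left out for brevity. The 15-second timeout is one of the example values from the slide, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

# 5-tuple: source IP, destination IP, source port, destination port, protocol
FlowKey = Tuple[str, str, int, int, str]


def aggregate_flows(packets: List[Tuple[float, FlowKey, int]],
                    timeout: float = 15.0) -> List[dict]:
    """Group time-ordered (timestamp, key, size) packets into flows.

    A packet joins the open flow with the same key unless more than
    `timeout` seconds have passed since that flow's last packet; then the
    old flow is closed and a new one starts.
    """
    open_flows: Dict[FlowKey, dict] = {}
    finished: List[dict] = []
    for t, key, size in packets:
        flow = open_flows.get(key)
        if flow is not None and t - flow["last"] > timeout:
            finished.append(flow)      # gap too long: close the old flow
            flow = None
        if flow is None:
            flow = {"key": key, "first": t, "last": t, "packets": 0, "bytes": 0}
            open_flows[key] = flow
        flow["last"] = t
        flow["packets"] += 1
        flow["bytes"] += size
    finished.extend(open_flows.values())   # flush flows still open at the end
    return finished


web = ("10.0.0.1", "10.0.0.2", 1043, 80, "tcp")
dns = ("10.0.0.1", "10.0.0.53", 33000, 53, "udp")
pkts = [(0.0, web, 512), (1.0, web, 163), (2.0, dns, 60), (40.0, web, 300)]
for f in aggregate_flows(pkts):
    print(f["key"][3], f["first"], f["last"], f["packets"], f["bytes"])
```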
Packet vs. Flow Measurement
- Basic statistics (available from both techniques)
- Traffic mix by IP addresses, port numbers, and protocol
- Average packet size
- Traffic over time
- Both: traffic volumes on a medium-to-large time scale
- Packet: burstiness of the traffic on a small time scale
- Statistics per TCP connection
- Both: number of packets and bytes transferred over the link
- Packet: frequency of lost or out-of-order packets, and the number of application-level bytes delivered
- Per-packet info (available only from packet traces)
- TCP seq/ack numbers, receiver window, per-packet flags, ...
- Probability distribution of packet sizes
- Application-level header and body (full packet contents)
Measurement Challenges for Operators
- Network-wide view
- Crucial for evaluating control actions
- Multiple kinds of data from multiple locations
- Large scale
- Large number of high-speed links and routers
- Large volume of measurement data
- Poor state of the art
- Working within existing protocols and products
- Technology not designed with measurement in mind
- The "do no harm" principle
- Don't degrade router performance
- Don't require disabling key router features
- Don't overload the network with measurement data
Network Operations Tasks
- Reporting of network-wide statistics
- Generating basic information about usage and reliability
- Performance/reliability troubleshooting
- Detecting and diagnosing anomalous events
- Security
- Detecting, diagnosing, and blocking security problems
- Traffic engineering
- Adjusting network configuration to the prevailing traffic
- Capacity planning
- Deciding where and when to install new equipment
Basic Reporting
- Producing basic statistics about the network
- For business purposes, network planning, ad hoc studies
- Examples
- Proportion of transit vs. customer-customer traffic
- Total volume of traffic sent to/from each private peer
- Mixture of traffic by application (Web, Napster, etc.)
- Mixture of traffic to/from individual customers
- Usage, loss, and reliability trends for each link
- Requirements
- Network-wide view of basic traffic and reliability statistics
- Ability to slice and dice measurements in different ways (e.g., by application, by customer, by peer, by link type)
Troubleshooting
- Detecting and diagnosing problems
- Recognizing and explaining anomalous events
- Examples
- Why a backbone link is suddenly overloaded
- Why the route to a destination prefix is flapping
- Why DNS queries are failing with high probability
- Why a route processor has high CPU utilization
- Why a customer cannot reach certain Web sites
- Requirements
- Network-wide view of many protocols and systems
- Diverse measurements at different protocol levels
- Thresholds for isolating significant phenomena
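One simple way to set thresholds for isolating significant phenomena is to compare each new measurement with recent history. The rule here flags a value that lies more than k standard deviations above the mean of the last `window` samples. Both that rule and the sample link loads are illustrative assumptions, not an established operational method.

```python
from statistics import fmean, pstdev
from typing import List


def flag_anomalies(series: List[float], window: int = 12, k: float = 3.0) -> List[int]:
    """Indices whose value exceeds mean + k * stddev of the previous `window` points."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = fmean(history), pstdev(history)
        # max() avoids a zero-width band when the history is perfectly flat
        if series[i] > mu + k * max(sigma, 1e-9):
            flagged.append(i)
    return flagged


# Hypothetical 5-minute link-load samples (Mbit/s); window=12 covers one hour.
loads = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 100, 250, 100]
print(flag_anomalies(loads))  # [12]
```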
Security
- Detecting and diagnosing problems
- Recognizing suspicious traffic or disruptions
- Examples
- Denial-of-service attack on a customer or service
- Spread of a worm or virus through the network
- Route hijack of an address block by adversary
- Requirements
- Detailed measurements from multiple places
- Including deep-packet inspection, in some cases
- Online analysis of the data
- Installing filters to block the offending traffic
Traffic Engineering
- Adjusting resource allocation policies
- Path selection, buffer management, and link scheduling
- Examples
- OSPF weights to divert traffic from congested links
- BGP policies to balance load on peering links
- Link-scheduling weights to reduce delay for "gold" traffic
- Requirements
- Network-wide view of the traffic carried in the backbone
- Timely view of the network topology and configuration
- Accurate models to predict impact of control operations (e.g., the impact of RED parameters on TCP throughput)
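OSPF routes along shortest paths with respect to the configured link weights. The effect of a weight change can therefore be predicted by recomputing those paths with Dijkstra's algorithm. The four-router topology and weights below are hypothetical. Real OSPF also splits traffic over equal-cost paths, which this sketch ignores.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, int]]


def shortest_path(graph: Graph, src: str, dst: str) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm over link weights; returns (cost, path)."""
    dist = {src: 0}
    prev: Dict[str, str] = {}
    heap = [(0, src)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    if dst not in dist:
        return float("inf"), []
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]


def set_weight(graph: Graph, a: str, b: str, w: int) -> None:
    """Set the weight of the (bidirectional) link a-b."""
    graph[a][b] = w
    graph[b][a] = w


net: Graph = {"A": {"B": 1, "C": 1}, "B": {"A": 1, "D": 1},
              "C": {"A": 1, "D": 2}, "D": {"B": 1, "C": 2}}
print(shortest_path(net, "A", "D"))   # (2, ['A', 'B', 'D'])
set_weight(net, "A", "B", 5)          # make link A-B less attractive
print(shortest_path(net, "A", "D"))   # (3, ['A', 'C', 'D'])
```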
Capacity Planning
- Deciding whether to buy/install new equipment
- What? Where? When?
- Examples
- Where to put the next backbone router
- When to upgrade a link to higher capacity
- Whether to add/remove a particular peer
- Whether the network can accommodate a new customer
- Whether to install a caching proxy for cable modems
- Requirements
- Projections of future traffic patterns from measurements
- Cost estimates for buying/deploying the new equipment
- Model of the potential impact of the change (e.g., latency reduction and bandwidth savings from a caching proxy)
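A projection of future traffic can be as simple as extrapolating measured growth. The sketch assumes compound monthly growth, which is a simplification because real traffic has seasonal and step changes. It estimates how many months remain before a link crosses an upgrade threshold. The 40%, 5%, and 80% figures are hypothetical.

```python
import math


def months_until(current_util: float, monthly_growth: float, threshold: float) -> int:
    """Whole months until utilization reaches `threshold`.

    Assumes utilization grows by a constant fraction each month:
    util(m) = current_util * (1 + monthly_growth) ** m, with monthly_growth > 0.
    """
    if current_util >= threshold:
        return 0
    return math.ceil(math.log(threshold / current_util) / math.log1p(monthly_growth))


print(months_until(0.40, 0.05, 0.80))  # 15
```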
Examples of Public Data Sets
- Network-wide data
- Abilene and GEANT backbones
- Netflow, IGP, and BGP traces
- CAIDA DatCat
- Data catalogue maintained by CAIDA
- http://imdc.datcat.org/
- Interdomain routing
- RouteViews and RIPE-NCC
- BGP routing tables and update messages
- Traceroute and looking glass servers
- http://www.traceroute.org/
- http://www.nanog.org/lookingglass.html
Discussion
- How important is accuracy of the data?
- How can we validate measurement studies? (If we know the answer already, why are we measuring?)
- How to do controlled experiments with measurement techniques?
- Can we move measurement to a science rather than an art?
- Can we identify incentives for making measurement possible and data available?
- Distributed analysis of measurement data?
- An architecture for router or line-card support for traffic and performance measurement?
- Trade-offs between security and privacy?