Title: Internet Measurement and some inference
1Internet Measurement (and some inference modeling)
- Shivkumar (Shiv) Kalyanaraman
- Rensselaer Polytechnic Institute
- shivkuma_at_ecse.rpi.edu
- http://www.ecse.rpi.edu/Homepages/shivkuma/
- GOOGLE Shiv RPI
2Topics
- Measurement philosophy: why, what, when, where, how?
- Some measurement projects and results
- Techniques: passive and active
- Packet tracing
- SNMP
- Probing
- Inference and modeling
- Tomography: traffic matrix estimation for network engineering
- Traffic modeling
- Rocketfuel: inferring topologies from outside ISP networks
3Why Measurement?
- We built it, we depend on it, so we must try to understand it as it works in reality...
- Measurement gives us the data and basis for this understanding.
- Modeling, inference, etc. to get new understanding: learning from data
- Complex interactions between protocols were not well modeled during their design.
- Need support for troubleshooting and network management
- Wide-area behavior is unpredictable
- Change is normal
4Characteristics of the Internet
- The Internet is
- Decentralized (loose confederation of peers)
- Self-configuring (no global registry of topology)
- Stateless (limited information in the routers)
- Connectionless (no fixed connection between hosts)
- These attributes contribute
- To the success of the Internet
- To the rapid growth of the Internet
- and the difficulty of controlling the Internet!
[Figure: a sender and a receiver connected through an ISP]
5Internet Measurement Challenges
- Size of the Internet
- O(100M) hosts, O(1M) routers, O(10K) networks
- Complexity of the Internet
- Components, protocols, applications, users
- Constant change is the norm
- Web, e-commerce, peer-to-peer, wireless, next?
- The Internet was not developed with measurement as a fundamental feature
- Nearly every network operator would like to keep most data on their network private
- Floyd and Paxson, "Difficulties in Simulating the Internet," IEEE/ACM Transactions on Networking, 2000.
6Themes
- Measurement has been the basis for critical improvements
- Without measurement, what do you know?
- Measurement capability in the Internet is limited
- The systems were not designed to support measurement
- Measurement tools and infrastructures are few and limited
- Size, diversity, complexity and change
- Measurement data presents many challenges
- Networking researchers need better connections with experts in other domains
7Operator Philosophy: Tension With IP
- Accountability of network resources
- But, routers don't maintain state about transfers
- But, measurement isn't part of the infrastructure
- Reliability/predictability of services
- But, IP doesn't provide performance guarantees
- But, equipment is not especially reliable (no "five-9s")
- Fine-grain control over the network
- But, routers don't do fine-grain resource allocation
- But, network automatically re-routes after failures
- End-to-end control over communication
- But, end hosts and applications adapt to congestion
- But, traffic may traverse multiple domains of control
8Network Operations: Measure, Model, and Control
[Figure: control loop. Measurement of the operational network yields offered traffic and topology/configuration, which feed a network-wide "what-if" model; control applies changes to the network.]
9Operations Research: Detect, Diagnose, and Fix
- Detect: note the symptoms of a problem
- Periodic polling of link load statistics
- Active probes measuring performance
- Customer complaining (via the phone network?)
- Diagnose: identify the illness
- Change in user behavior?
- Router/link failure or policy change?
- Denial of service attack?
- Fix: select and dispense the medicine
- Routing protocol reconfiguration
- Installation of packet filters
- Network measurement plays a key role in each step!
11Traffic Measurement: Control vs. Discovery
- Discovery: characterizing the network
- End-to-end characteristics of delay, throughput, and loss
- Verification of models of TCP congestion control
- Workload models capturing the behavior of Web users
- Understanding self-similarity/multi-fractal traffic
- Control: managing the network
- Generating reports for customers and internal groups
- Diagnosing performance and reliability problems
- Tuning the configuration of the network to the traffic
- Planning outlay of equipment (routers, proxies, links)
43Measurement Techniques
44Time Scales for Network Operations
- Minutes to hours
- Denial-of-service attacks
- Router and link failures
- Serious congestion
- Hours to weeks
- Time-of-day or day-of-week engineering
- Outlay of new routers and links
- Addition/deletion of customers or peers
- Weeks to years
- Planning of new capacity and topology changes
- Evaluation of network designs and routing protocols
45Traffic Measurement: SNMP Data
- Simple Network Management Protocol (SNMP)
- Router CPU utilization, link utilization, link loss, ...
- Collected from every router/link every few minutes
- Applications
- Detecting overloaded links and sudden traffic shifts
- Inferring the domain-wide traffic matrix
- Advantage
- Open standard, available for every router and link
- Disadvantage
- Coarse granularity, both spatially and temporally
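A minimal sketch of how link utilization can be derived from two successive polls of an interface octet counter (for example ifInOctets in the standard IF-MIB). The function name, its parameters, and the 5-minute interval in the usage note are illustrative assumptions, not part of the slides; the SNMP polling itself is not shown.

```python
def link_utilization(octets_t0, octets_t1, interval_s, link_bps, counter_bits=32):
    """Average utilization (0..1) of a link between two counter samples.

    octets_t0, octets_t1: octet-counter readings at the start and end of the
    polling interval. The modulo handles a single wrap of a 32-bit or 64-bit
    counter; more than one wrap per interval cannot be detected.
    """
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    modulus = 1 << counter_bits
    delta_octets = (octets_t1 - octets_t0) % modulus
    return delta_octets * 8 / (interval_s * link_bps)
```

For instance, 375,000,000 octets counted over a 300-second poll on a 100 Mb/s link corresponds to an average utilization of 10%. The result is an average over the whole interval, which is the temporal coarseness noted above.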
47Traffic Measurement: Packet-Level Traces
- Packet monitoring
- IP, TCP/UDP, and application-level headers
- Collected by tapping individual links in the network
- Applications
- Fine-grain timing of the packets on the link
- Fine-grain view of packet header fields
- Advantages
- Most detailed view possible at the IP level
- Disadvantages
- Expensive to have in more than a few locations
- Challenging to collect on very high-speed links
- Extremely high volume of measurement data
49Extracting Data from IP Packets
[Figure: an application message (e.g., HTTP response) carried in three packets, each with its own IP and TCP header]
- Many layers of information
- IP: source/dest IP addresses, protocol (TCP/UDP), ...
- TCP/UDP: src/dest port numbers, seq/ack, flags, ...
- Application: URL, user keystrokes, BGP updates, ...
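A rough sketch of pulling the IP and TCP fields listed above out of a raw IPv4 packet with Python's standard struct module. The function name and the returned dictionary keys are hypothetical; IPv4 options are skipped via the header-length field, and IPv6, UDP, and application payloads are not handled.

```python
import socket
import struct

TCP_PROTOCOL = 6  # IP protocol number for TCP

def parse_ipv4_tcp(packet):
    """Return selected IPv4 (and, if present, TCP) header fields as a dict."""
    if len(packet) < 20:
        raise ValueError("truncated IPv4 header")
    (ver_ihl, tos, total_len, _ident, _frag, ttl, proto, _cksum,
     src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    if ver_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ihl = (ver_ihl & 0x0F) * 4  # IP header length in bytes
    fields = {
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
        "protocol": proto,
        "tos": tos,
        "ttl": ttl,
        "length": total_len,
    }
    if proto == TCP_PROTOCOL and len(packet) >= ihl + 14:
        sport, dport, seq, ack, off_flags = struct.unpack(
            "!HHIIH", packet[ihl:ihl + 14])
        fields.update(src_port=sport, dst_port=dport, seq=seq, ack=ack,
                      syn=bool(off_flags & 0x002),
                      ack_flag=bool(off_flags & 0x010))
    return fields
```

A monitor would typically call something like this once per captured packet and keep only the header fields, since storing full payloads is what drives the data volume mentioned earlier.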
50Aggregating Packets into Flows
[Figure: packets on a timeline grouped into flows 1-4]
- Set of packets that "belong together"
- Source/destination IP addresses and port numbers
- Same protocol, ToS bits, ...
- Same input/output interfaces at a router (if known)
- Packets that are "close" together in time
- Maximum inter-packet spacing (e.g., 15 sec, 30 sec)
- Example: flows 2 and 4 are different flows due to time
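The grouping rule above can be sketched as follows: packets sharing a key belong to the same flow unless the gap since the previous packet exceeds a timeout. The Flow class, the (timestamp, key, size) input format, and the 30-second default are assumptions made for illustration; this is not any router's actual flow-cache logic.

```python
from dataclasses import dataclass

@dataclass
class Flow:
    key: tuple      # e.g., (src_ip, dst_ip, src_port, dst_port, protocol)
    start: float    # timestamp of first packet
    end: float      # timestamp of last packet
    packets: int = 0
    octets: int = 0

def aggregate_flows(packets, timeout=30.0):
    """Group (timestamp, key, size) tuples, sorted by timestamp, into flows."""
    active = {}     # key -> currently open flow
    finished = []
    for ts, key, size in packets:
        flow = active.get(key)
        if flow is not None and ts - flow.end > timeout:
            finished.append(flow)   # idle too long: close the old flow
            flow = None
        if flow is None:
            flow = Flow(key, ts, ts)
            active[key] = flow
        flow.end = ts
        flow.packets += 1
        flow.octets += size
    finished.extend(active.values())
    return sorted(finished, key=lambda f: f.start)
```

With this rule, two bursts of packets that share the same addresses and ports but are separated by more than the timeout come out as two separate flows, which is the situation of flows 2 and 4 in the example.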
71Summary: Traffic Measurement (Flow-Level Traces)
- Flow monitoring (e.g., Cisco Netflow)
- Measurements at the level of sets of related packets
- Single list of shared attributes (addresses, port #s, ...)
- Number of bytes and packets, start and finish times
- Applications
- Computing application mix and detecting DoS attacks
- Measuring the traffic matrix for the network
- Advantages
- Medium-grain traffic view, supported on some routers
- Disadvantages
- Not uniformly supported across router products
- Large data volume, and may slow down some routers
- Memory overhead (size of flow cache) grows with link speed
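As an illustration of the "application mix" use of flow records, the sketch below sums bytes per application using the destination port. The record layout (dicts with 'dst_port' and 'bytes') and the tiny port table are assumptions for the example; they are not the Netflow export format, and port-based classification is only a rough heuristic.

```python
from collections import Counter

# Illustrative port-to-application table (assumption, far from complete).
WELL_KNOWN_PORTS = {25: "smtp", 53: "dns", 80: "web"}

def application_mix(flow_records):
    """Return the share of total bytes per application label."""
    totals = Counter()
    for rec in flow_records:
        app = WELL_KNOWN_PORTS.get(rec["dst_port"], "other")
        totals[app] += rec["bytes"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {app: count / grand_total for app, count in totals.items()}
```

The same pattern, keyed on ingress and egress points instead of ports, is one way to accumulate a traffic matrix from flow records.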
72Summary: Reducing Packet/Flow Measurement Overhead
- Filtering: select a subset of the traffic
- E.g., destination prefix for a customer
- E.g., port number for an application (e.g., 80 for Web)
- Aggregation: grouping related traffic
- E.g., packets/flows with same next-hop AS
- E.g., packets/flows destined to a particular service
- Sampling: subselecting the traffic
- Random, deterministic, or hash-based sampling
- 1-out-of-n or stratified based on packet/flow size
- Combining filtering, aggregation, and sampling
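The three sampling styles named above can be sketched in a few lines. All function names are hypothetical, CRC32 is used only as a convenient stand-in hash, and the estimator assumes plain 1-out-of-n sampling (not size-stratified sampling).

```python
import random
import zlib

def sample_deterministic(items, n):
    """Keep every n-th item, starting with the first (1-out-of-n)."""
    return [x for i, x in enumerate(items) if i % n == 0]

def sample_random(items, n, rng=None):
    """Keep each item independently with probability 1/n."""
    rng = rng or random.Random()
    return [x for x in items if rng.random() < 1.0 / n]

def sample_hash(items, n, key=lambda x: x):
    """Keep items whose key hashes into one of n buckets.

    The decision depends only on the key, so two monitors using the same
    function select the same packets or flows.
    """
    return [x for x in items
            if zlib.crc32(repr(key(x)).encode()) % n == 0]

def estimate_total(sampled_sizes, n):
    """Scale the sampled byte count by n to estimate the unsampled total."""
    return n * sum(sampled_sizes)
```

Sampling trades precision for a controlled reduction in data volume: the scaled estimate is approximate, but the collector processes roughly 1/n of the traffic.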
73Summary: Comparison of Techniques
Columns: Filtering | Aggregation | Sampling
- Precision: exact | exact | approximate
- Generality: constrained a-priori | constrained a-priori | general
- Local processing: filter criterion for every object | table update for every object | only sampling decision
- Local memory: none | one bin per value of interest | none
- Compression: depends on data | depends on data | controlled
79Inference and Modeling
80DATA-DRIVEN
82E.g., The Network Design Problem
85Traffic Modeling
96Mandelbrot's Construction
- Renewal reward processes and their aggregates
- Aggregate is made up of many constituents
- Each constituent is of the on/off type
- On/off periods have a duration
- Constituents make contributions (rewards) when on
- Constituents make no contributions when off
- What can be said about the aggregate?
- In terms of assumed type of randomness for durations and rewards
- In terms of implied type of burstiness
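A small simulation sketch of this construction: many on/off constituents, each contributing a fixed reward per time slot while on, are summed into an aggregate. The choice of Pareto-distributed durations, the tail index alpha = 1.5, the unit reward, and all names are assumptions for illustration. With 1 < alpha < 2 the durations have finite mean but infinite variance, the "wild" case in which the aggregate is expected to show long-range dependence.

```python
import random

def onoff_source(n_slots, alpha, rng, reward=1.0):
    """One constituent: alternating on/off periods with Pareto durations."""
    out = []
    on = rng.random() < 0.5  # random initial state
    while len(out) < n_slots:
        # paretovariate returns values >= 1; cap so one period cannot
        # exceed the simulated horizon.
        duration = min(n_slots, max(1, int(rng.paretovariate(alpha))))
        out.extend([reward if on else 0.0] * duration)
        on = not on
    return out[:n_slots]

def aggregate(n_sources, n_slots, alpha=1.5, seed=0):
    """Sum the per-slot rewards of n_sources independent on/off sources."""
    rng = random.Random(seed)
    total = [0.0] * n_slots
    for _ in range(n_sources):
        for t, r in enumerate(onoff_source(n_slots, alpha, rng)):
            total[t] += r
    return total
```

Swapping the Pareto durations for a light-tailed distribution (e.g., via `rng.expovariate`) gives the "mild" counterpart, which makes it easy to compare the burstiness of the two aggregates at different time scales.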
97Mandelbrot's Types of Randomness
- Distribution functions/random variables
- Mild → finite variance (Gaussian)
- Wild → infinite variance
- Correlation function of stochastic process
- None → IID (independent, identically distributed)
- Mild → short-range dependence (SRD, Markovian)
- Wild → long-range dependence (LRD)
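One common way to make the mild/wild distinction precise is through tail and correlation decay rates. The formulas below are standard textbook characterizations, offered as a sketch; the symbols c, alpha, rho, and beta are not taken from the slides.

```latex
% Wild distribution: power-law (heavy) tail, infinite variance
P(X > x) \sim c\,x^{-\alpha} \ \text{as } x \to \infty, \quad 0 < \alpha < 2
  \;\Longrightarrow\; \operatorname{Var}(X) = \infty

% Mild correlation (SRD): geometric decay, summable
r(n) \sim c\,\rho^{\,n}, \quad 0 < \rho < 1
  \;\Longrightarrow\; \sum_{n} r(n) < \infty

% Wild correlation (LRD): power-law decay, not summable
r(n) \sim c\,n^{-\beta}, \quad 0 < \beta < 1
  \;\Longrightarrow\; \sum_{n} r(n) = \infty
```

Exponential decay appears as a straight line on a log-linear plot, and power-law decay as a straight line on a log-log plot, which is why the choice of axis scales matters when inspecting a CCDF or correlation function.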
98Mandelbrot's Types of Burstiness
[Figure: 2x2 grid with axes "Distribution function" (mild or wild) and "Correlation structure" (mild or wild)]
- Tail-driven burstiness (Noah effect)
- Dependence-driven burstiness (Joseph effect)
99Type of Burstiness: Smooth
[Figure: CCDF 1-F(x), log scale, vs. x on linear scale; correlation function r(n), log scale, vs. lag n on linear scale]
100Type of Burstiness: bursty
[Figure: CCDF 1-F(x), log scale, vs. x on linear scale; correlation function r(n), log scale, vs. lag n on log scale]
101Type of Burstiness: Bursty
[Figure: CCDF 1-F(x), log scale, vs. x on log scale; correlation function r(n), log scale, vs. lag n on linear scale]
102Type of Burstiness: BURSTY
[Figure: CCDF 1-F(x), log scale, vs. x on log scale; correlation function r(n), log scale, vs. lag n on log scale]
103Mandelbrot's Types of Burstiness
[Figure: 2x2 grid with axes "Distribution function" (mild or wild) and "Correlation structure" (mild or wild)]
- Tail-driven burstiness (Noah effect)
- Dependence-driven burstiness (Joseph effect)
104Inference for Network Engineering: Traffic Matrix Estimation
105Network Engineering Inference
- Reliability analysis
- Predicting traffic under planned or unexpected router/link failures
- Traffic engineering
- Optimizing OSPF weights to minimize congestion
- Capacity planning
- Forecasting future capacity requirements
106Traffic Matrix Problem
118i.e., Unknowns > Equations
119Naïve Approach
In real networks the problem is highly under-constrained
120Simple Gravity Model
- Motivated by Newton's Law of Gravitation
- Assume traffic between sites is proportional to traffic at each site
- y1 ∝ x1 x2
- y2 ∝ x2 x3
- y3 ∝ x1 x3
- Assume there is no systematic difference between traffic in different locations
- Only the total volume matters
- Could include a distance term, but locality of information is not so important in the Internet as in other networks
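A minimal sketch of a simple gravity estimate: the traffic from site i to site j is taken proportional to the product of the traffic entering at i and the traffic leaving at j, normalized by the total. The function name and the in/out formulation are assumptions for illustration; the slides only state the proportionality.

```python
def gravity_matrix(traffic_in, traffic_out):
    """Estimate T[i][j] = traffic_in[i] * traffic_out[j] / total.

    traffic_in[i]:  volume entering the network at site i
    traffic_out[j]: volume leaving the network at site j
    No distance term is used; only the volumes matter.
    """
    total = sum(traffic_out)
    if total <= 0:
        raise ValueError("total traffic must be positive")
    return [[x_in * x_out / total for x_out in traffic_out]
            for x_in in traffic_in]
```

With entering volumes [30, 20, 50] and leaving volumes [10, 40, 50], the estimate from site 0 to site 1 is 30 * 40 / 100 = 12. When the two totals agree, each row of the estimate sums to the corresponding entering volume and each column to the leaving volume.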
121Simple Gravity Model
Better than naïve, but still not very accurate
122Generalized Gravity Model
- Internet routing is asymmetric
- Hot potato routing: use the closest exit point
- Generalized gravity model
- For outbound traffic, assumes proportionality on a per-peer basis (as opposed to per-router)
123Generalized Gravity Model
Fairly accurate given that no link constraint is used
124Tomographic Approach
- Apply the link constraints
[Figure: three routers (1, 2, 3) connected by routes 1, 2, and 3]
x = A^T y
125Tomographic Approach
- Under-constrained linear inverse problem
- Find additional constraints based on models
- Typical approach: use higher order statistics
- Disadvantages
- Complex algorithm doesn't scale
- Large networks have 1000 nodes, 10000 routes
- Reliance on higher order statistics is not robust given the problems in SNMP data
- Artifacts, missing data
- Violations of model assumptions (e.g., non-stationarity)
- Relatively low sampling frequency: 1 sample every 5 min
- Unevenly spaced sample points
- Not very accurate, at least on simulated TM
126Inference: Network Tomography
From link counts to the traffic matrix
[Figure: sources and destinations joined through a network, with observed link counts of 3 Mbps, 5 Mbps, 4 Mbps, and 4 Mbps]
127Tomography: Formalizing the Problem
- Source-destination pairs
- p is a source-destination pair of nodes
- x_p is the (unknown) traffic volume for this pair
- Routing
- R_lp = 1 if link l is on the path for src-dest pair p
- Or, R_lp is the proportion of p's traffic that traverses l
- Links in the network
- l is a unidirectional edge
- y_l is the observed traffic volume on this link
- Relationship: y = Rx (now work back to get x)
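A toy instance of y = Rx, assuming a hypothetical network with two sources and two destinations joined through one router: the two source access links are observed at 3 and 5 Mbps and the two destination links at 4 and 4 Mbps. The topology, the assignment of counts to links, and the helper names are assumptions for illustration. The sketch shows that different traffic matrices produce the same link counts.

```python
# Columns (src-dest pairs p): (s1,d1), (s1,d2), (s2,d1), (s2,d2)
# Rows (links l): s1 access, s2 access, d1 access, d2 access
R = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]

def link_loads(R, x):
    """Compute y = Rx: the load on each link given per-pair volumes x."""
    return [sum(r_lp * x_p for r_lp, x_p in zip(row, x)) for row in R]

def rank(matrix, eps=1e-9):
    """Rank of a small matrix by Gauss-Jordan elimination."""
    m = [[float(v) for v in row] for row in matrix]
    rnk = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rnk, len(m))
                      if abs(m[r][col]) > eps), None)
        if pivot is None:
            continue
        m[rnk], m[pivot] = m[pivot], m[rnk]
        for r in range(len(m)):
            if r != rnk:
                f = m[r][col] / m[rnk][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[rnk])]
        rnk += 1
        if rnk == len(m):
            break
    return rnk
```

Both x = [1, 2, 3, 2] and x = [3, 0, 1, 4] give link loads [3, 5, 4, 4], and R has rank 3 with 4 unknowns, so the link counts alone leave a one-dimensional family of candidate traffic matrices.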
128Tomography: Single Observation is Insufficient
- Linear system is underdetermined
- Number of nodes n
- Number of links e is around O(n)
- Number of src-dest pairs c is O(n^2)
- Dimension of solution sub-space at least c - e
- Multiple observations are needed
- k independent observations (over time)
- Stochastic model with src-dest counts Poisson, i.i.d.
- Maximum likelihood estimation to infer traffic matrix
- Vardi, "Network Tomography," JASA, March 1996
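A sketch of why the Poisson assumption helps, written with the y = Rx notation used above; the symbols lambda and the superscript (k) for the k-th observation are introduced here for illustration. Because a Poisson variable has equal mean and variance, the covariance of the link counts supplies additional equations in the same unknowns.

```latex
x_p^{(k)} \sim \mathrm{Poisson}(\lambda_p) \ \text{independently}, \qquad
y^{(k)} = R\,x^{(k)}, \quad k = 1, \dots, K

\mathbb{E}\bigl[y^{(k)}\bigr] = R\,\lambda, \qquad
\operatorname{Cov}\bigl(y^{(k)}\bigr) = R\,\operatorname{diag}(\lambda)\,R^{\mathsf T}
```

The first-moment equations alone are underdetermined; matching the sample covariance of the observed link counts to the second expression adds constraints on lambda, which is the role of the higher order statistics in this approach.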
130Tomography: Challenges
- Limitations
- Cannot handle packet loss or multicast traffic
- Statistical assumptions don't match IP traffic
- Significant error even with a large number of samples
- High computation overhead for large networks
- Directions for future work
- More realistic assumptions about the IP traffic
- Partial queries over subgraphs in the network
- Incorporating additional measurement data
131Tomo-gravity
- Tomo-gravity = tomography + gravity modeling
- Exploit topological equivalence to reduce problem size
- Use least-squares method to get the solution, which
- Satisfies the constraints
- Is closest to the gravity model solution
- Can use weighted least-squares to make it more robust
[Figure: the least-squares solution is the point in the constraint subspace closest to the gravity model solution]
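The "closest to the gravity model solution" idea can be sketched with cyclic row projections (the Kaczmarz method), which for a consistent system moves a starting point to the nearest point satisfying Rx = y. This is a stand-in chosen for brevity, not the solver used in tomo-gravity: it is unweighted, does not enforce non-negativity, and the function name and sweep count are assumptions.

```python
def closest_consistent(R, y, x_start, sweeps=200):
    """Project x_start onto {x : Rx = y} by cyclic row projections.

    R: list of rows (routing matrix), y: observed link loads,
    x_start: initial estimate (e.g., a gravity model solution).
    Assumes the system Rx = y is consistent.
    """
    x = [float(v) for v in x_start]
    for _ in range(sweeps):
        for row, y_l in zip(R, y):
            norm2 = sum(r * r for r in row)
            if norm2 == 0:
                continue  # link carries none of the modeled pairs
            residual = y_l - sum(r * x_p for r, x_p in zip(row, x))
            step = residual / norm2
            x = [x_p + step * r for x_p, r in zip(x, row)]
    return x
```

As a usage example under the same toy assumptions as a two-source, two-destination network with link loads [3, 5, 4, 4]: starting from the uniform guess [2, 2, 2, 2], the projection returns [1.5, 1.5, 2.5, 2.5], which satisfies every link constraint while staying as close as possible to the starting estimate.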
132Tomo-gravity Accuracy
Accurate within 10-20% (esp. for large elements)
133Tomo-gravity Solution
- Tomo-gravity infers traffic matrices from widely available measurements of link loads
- Accurate: especially accurate for large elements
- Robust: copes easily with data glitches, loss
- Flexible: extends easily to incorporate more detailed measurements, where available
- Fast: for example, solves AT&T's IP backbone network in a few seconds
- In daily use for AT&T IP network engineering
- Reliability analysis, capacity planning, and traffic engineering
134Summary: Tomo-gravity
- Tomo-gravity takes the best of both tomography and gravity modeling
- Simple, and quick
- A few seconds for whole AT&T backbone
- Satisfies link constraints
- Gravity model solutions don't
- Uses widely available SNMP data
- Can work within the limitations of SNMP data
- Only uses first order statistics → interpolation very effective
- Limited scope for improvement
- Incorporate additional constraints from other data sources, e.g., Netflow where available
- Operational experience very positive
- In daily use for AT&T IP network engineering
- Successfully prevented service disruption during simultaneous link failures