Title: Modeling Time Correlation in Passive Network Loss Tomography
1. Modeling Time Correlation in Passive Network Loss Tomography
- Jin Cao (Alcatel-Lucent, Bell Labs), Aiyou Chen (Google Inc), Patrick P. C. Lee (CUHK)
- June 2011
2. Outline
- Motivation
- Loss model
  - Include correlation
- Profile likelihood inference
  - Basic approach
  - Extensions
- Simulation results
3. Motivation
- Monitoring a network's health is critical for reliability guarantees
  - to identify bottlenecks/failures of network elements
  - to plan resource provisioning
- It's challenging to monitor a large-scale network
  - Collection of statistics can bring huge overhead
- Network loss tomography
  - compute statistical estimates of internal losses through end-to-end external measurements
4. Loss Tomography Overview
- Active probing
  - Consider a tree setting.
  - Send unicast probes to different receivers (leaves)
  - Collect statistics at receivers
  - Assume probes may be lost at links
- Our goal: infer loss rate of common link (root-to-middle-node link)
[Figure: tree topology with probes; nodes labeled 1 to 4]
- Key idea: time correlation of packet losses
  - neighboring packets likely experience similar loss behavior on the common link
5. Passive Loss Tomography
- Drawbacks of active probing
  - introduces probing overhead
  - requires collaboration of both senders and receivers
- Passive loss tomography
  - Monitor underlying traffic
  - E.g., use TCP data and ACKs to infer losses
- Challenges
  - Limited control. Time correlation varies widely.
  - Can we model time correlation?
6. Prior Work on Loss Tomography
- Multicast loss inference [Cáceres et al. 99, Ziotopolous et al. 01, Arya et al. 03]
  - Send multicast probes
  - Drawback: requires multicast to be enabled
- Unicast loss inference [Coates & Nowak 00, Harfoush et al. 00, Duffield et al. 06]
  - Send unicast probes to different receivers
  - Drawback: introduces probing overhead
- Passive loss tomography [Tsang et al. 01, Brosh et al. 05, Padmanabhan et al. 03]
  - Use existing traffic for inference
  - Drawback: no explicit model of time correlation
7. Our Objective
8. Our Contributions
- Formulate a loss model as a function of time correlation
- Show our loss model is identifiable
- Develop a profile-likelihood method for simple and accurate inference
- Extend our method for complex topologies
- Model and network simulations with R and ns2
9. Where to Apply Our Work?
- An extension for TCP loss inference platform
  - use packet retransmissions to infer losses
  - Identify packet pairs: neighboring packets to different leaf branches
[Figure: TCP packets cross the common link to leaves 1, 2, ..., K and TCP ACKs return; the TCP packets/ACKs are used to determine information of loss samples (packet pairs), which feed our inference approach to infer the loss rate of the common link]
- Note: our work is not on how to sample, but uses existing samples to accurately compute loss rates
10. Loss Modeling
- Main idea: use packet pairs to capture loss correlation
- Issues to address
  - How to integrate correlation into loss model?
  - Is the model identifiable?
  - What is the inference error if we wrongly assume perfect correlation?
11. Loss Model
- Define
  - A packet pair $(U, V)$ to different leaves
  - $p, p_1, p_2$: link success rates
  - $Z_u, Z_v$: success events on common link
  - $\rho(\Delta)$: correlation$(Z_u, Z_v)$ with time difference $\Delta$
    - $0 \le \rho(\Delta) \le 1$ (by definition)
    - $\rho(0) = 1$
    - $\rho(\Delta)$ is monotonically decreasing w.r.t. $\Delta$
- Probability that both $U$, $V$ are successfully delivered from root to respective leaf nodes
  - $r_{11} = p\,p_1\,p_2\,(p + (1 - p)\,\rho(\Delta))$
  - if $\rho(\Delta) = 1$, $r_{11} = p\,p_1\,p_2$
  - if $\rho(\Delta) = 0$, $r_{11} = p^2\,p_1\,p_2$
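The pair-delivery probability on this slide can be checked numerically. The sketch below evaluates r11 = p p1 p2 (p + (1 - p) rho), where rho stands for the correlation of the two common-link success events, and compares it with a small Monte Carlo run. The function names and the way correlated outcomes are generated (with probability rho the second packet copies the first packet's outcome on the common link, otherwise it is drawn independently) are assumptions made for illustration, not something taken from the slides.

```python
import random

def pair_success_prob(p, p1, p2, rho):
    """r11: probability that both packets of a pair reach their leaves."""
    return p * p1 * p2 * (p + (1.0 - p) * rho)

def simulate_pair(p, p1, p2, rho, rng):
    """Draw one packet pair (U, V); return True if both are delivered.

    Assumed construction: with probability rho, V copies U's outcome on the
    common link; otherwise V's outcome is an independent Bernoulli(p) draw.
    This gives corr(Z_u, Z_v) = rho with marginal success rate p.
    """
    z_u = rng.random() < p
    z_v = z_u if rng.random() < rho else (rng.random() < p)
    return z_u and rng.random() < p1 and z_v and rng.random() < p2

rng = random.Random(0)          # fixed seed so the run is repeatable
trials = 200_000
hits = sum(simulate_pair(0.95, 0.9, 0.8, 0.6, rng) for _ in range(trials))
empirical = hits / trials
analytic = pair_success_prob(0.95, 0.9, 0.8, 0.6)
print(empirical, analytic)      # the two values should be close
```

Setting rho to 1 or 0 in `pair_success_prob` reproduces the two special cases listed on the slide.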
12. Modeling Time Correlation
- Perfect correlation: $\rho(\Delta) = 1$
- In practice, $\rho(\Delta) < 1$ for $\Delta > 0$ (i.e., decaying)
  - $r_{11} = p\,p_1\,p_2\,(p + (1 - p)\,\rho(\Delta))$ is over-estimated in perfect correlation
- Consider two specific approximations
  - Linear form: $\rho(\Delta) = \exp(-a\Delta)$ ($a$ is decaying constant)
  - Quadratic form: $\rho(\Delta) = \exp(-a\Delta^2)$
- If $\Delta$ is small, good enough approximations to capture time-decaying of correlation
- Claim: better than simply assuming perfect correlation
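A small sketch of the two decay forms may help here. It evaluates the linear form exp(-a d) and the quadratic form exp(-a d^2) for a time difference d, and shows that plugging a correlation of 1 into r11 gives a larger value than the decayed correlation does. The function names, the decay constant, and the time differences are hypothetical choices for illustration only.

```python
import math

def rho_linear(delta, a):
    """Linear form: correlation decays as exp(-a * delta)."""
    return math.exp(-a * delta)

def rho_quadratic(delta, a):
    """Quadratic form: correlation decays as exp(-a * delta**2)."""
    return math.exp(-a * delta ** 2)

def r11(p, p1, p2, rho):
    """Pair-delivery probability for a given common-link correlation."""
    return p * p1 * p2 * (p + (1.0 - p) * rho)

a = 2.0                                   # hypothetical decay constant
for delta in (0.0, 0.1, 0.5, 1.0):        # hypothetical time differences
    lin, quad = rho_linear(delta, a), rho_quadratic(delta, a)
    print(delta, lin, quad,
          r11(0.95, 0.9, 0.8, 1.0),       # perfect correlation assumed
          r11(0.95, 0.9, 0.8, lin))       # decayed correlation
```

Both forms equal 1 at a time difference of 0 and decrease afterwards, which matches the properties required of the correlation function.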
13. Theorems
- Theorem 1: Under the loss correlation model, the link success rates $p, p_1, p_2$ and constant $a$ are identifiable, given that $\rho(0) = 1$
- Theorem 2: If perfect correlation is wrongly assumed in a setting with imperfect correlation, then there is an absolute asymptotic bias.
- See proofs in paper.
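As a rough illustration of why a bias appears (this is not the paper's proof), consider a simple moment-matching estimator, which is an assumption made here for exposition. Write $e_1 = p\,p_1$ and $e_2 = p\,p_2$ for the two end-to-end success rates. Under assumed perfect correlation, $r_{11} = p\,p_1\,p_2 = e_1 e_2 / p$, so one would estimate $p$ by $e_1 e_2 / r_{11}$. Substituting the true $r_{11}$ from the loss model, with $\rho$ denoting the true correlation, gives the large-sample limit:

```latex
\hat{p}_{\text{perfect}}
  = \frac{e_1 e_2}{r_{11}}
  \;\longrightarrow\;
  \frac{p^2\,p_1\,p_2}{p\,p_1\,p_2\,\bigl(p + (1-p)\,\rho(\Delta)\bigr)}
  = \frac{p}{p + (1-p)\,\rho(\Delta)}
  \;\ge\; p .
```

For $0 < p < 1$ the inequality is strict whenever $\rho(\Delta) < 1$, so under this simplified estimator the success rate of the common link is over-estimated (its loss rate under-estimated), and the gap does not vanish as the sample grows.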
14. Profile Likelihood Inference
- Given the loss model, how to estimate loss rate?
- Inputs
  - single packet end-to-end measurements
  - packet pair end-to-end measurements
- Topology
  - Two-level, K-leaf tree
- Profile likelihood (PL) inference
  - Focus on parameters of interest (i.e., link loss rates to be inferred)
  - Replace nuisance unknowns with appropriate estimates
15. Profile Likelihood Inference
- Step 1: apply end-to-end success rates
  - Let $P_i$ be the end-to-end success rate to leaf $i$: $P_i = p\,p_i$
  - Re-parameterize $r_{11}$ (for every pair of leaves) as a function of $p$ and the $P_i$'s: $r_{11} = P_U\,P_V\,p^{-1}\,(p + (1 - p)\,\rho(\Delta))$
  - Solve for $p, P_1, P_2, \ldots, P_K, a$
  - But this is challenging with many variables to solve
16. Profile Likelihood Inference
- Step 2: remove nuisance parameters
  - Based on profile likelihood [Murphy 00], replace nuisance unknowns with appropriate estimates
  - Replace $P_i$ with its maximum likelihood estimate $\hat{P}_i = M_i / N_i$
    - $N_i$: number of packets going to leaf $i$
    - $M_i$: total number of successes to leaf $i$
  - Only two variables to solve: $p$ and $a$
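The plug-in step can be sketched in a few lines: each end-to-end success rate is replaced by the observed fraction of successes at that leaf. The function name and the dictionary-based input layout are hypothetical; only the ratio of successes to packets per leaf comes from the slide.

```python
def per_leaf_success_rates(successes, totals):
    """Plug-in estimate of each end-to-end success rate: M_i / N_i.

    successes: dict mapping leaf id -> number of delivered packets (M_i)
    totals:    dict mapping leaf id -> number of packets sent (N_i)
    Leaves with no packets are skipped, since the ratio is undefined.
    """
    return {leaf: successes[leaf] / totals[leaf]
            for leaf in totals if totals[leaf] > 0}

# Hypothetical counts for a three-leaf tree.
M = {1: 940, 2: 455, 3: 88}
N = {1: 1000, 2: 500, 3: 100}
P_hat = per_leaf_success_rates(M, N)
print(P_hat)
```

With these estimates fixed, only the common-link success rate and the decay constant remain as unknowns in the likelihood.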
17. Profile Likelihood Inference
- Step 3: estimate $p$ when $\rho(\cdot)$ is unknown
  - Approximate $\rho(\cdot)$ with either linear or quadratic form
  - To solve for $p$ and $a$, we optimize the log-likelihood function using the BFGS quasi-Newton method
  - See paper for details
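The following is a minimal sketch of the estimation step under several stated assumptions, not the authors' implementation. It uses the linear form of the correlation, treats each packet pair as a four-outcome observation (both delivered, only U, only V, neither), with the other three cell probabilities obtained from r11 by marginalization, and replaces the BFGS optimizer named on the slide with a plain grid search so that it runs with the standard library only. All function names, the data layout, and the synthetic counts are hypothetical.

```python
import math
from itertools import product

def pair_probs(p, a, P_u, P_v, delta):
    """Cell probabilities (r11, r10, r01, r00) for one packet pair."""
    rho = math.exp(-a * delta)                    # linear correlation form
    r11 = P_u * P_v / p * (p + (1.0 - p) * rho)   # both delivered
    return r11, P_u - r11, P_v - r11, 1.0 - P_u - P_v + r11

def log_likelihood(p, a, P_hat, pairs):
    """pairs: list of (u, v, delta, counts); counts = (n11, n10, n01, n00)."""
    total = 0.0
    for u, v, delta, counts in pairs:
        probs = pair_probs(p, a, P_hat[u], P_hat[v], delta)
        for n, r in zip(counts, probs):
            if n > 0:
                if r <= 0.0:
                    return -math.inf              # infeasible parameters
                total += n * math.log(r)
    return total

def fit_grid(P_hat, pairs, p_grid, a_grid):
    """Grid search over (p, a); a stand-in for the BFGS step on the slide."""
    # Since P_i = p * p_i <= p, only values of p at or above max P_hat apply.
    feasible = [p for p in p_grid if p >= max(P_hat.values())]
    return max(product(feasible, a_grid),
               key=lambda pa: log_likelihood(pa[0], pa[1], P_hat, pairs))

# Synthetic check: expected counts generated from known parameters.
true_p, true_a = 0.95, 0.5
P_hat = {1: true_p * 0.9, 2: true_p * 0.8}        # plug-in end-to-end rates
pairs = [(1, 2, d, tuple(1000 * r for r in
                         pair_probs(true_p, true_a, P_hat[1], P_hat[2], d)))
         for d in (0.0, 1.0, 2.0)]
p_grid = [0.90 + 0.01 * i for i in range(11)]     # 0.90 .. 1.00
a_grid = [0.1 * i for i in range(11)]             # 0.0 .. 1.0
p_hat, a_hat = fit_grid(P_hat, pairs, p_grid, a_grid)
print(p_hat, a_hat)
```

Pairs with several distinct time differences are included because a single time difference only constrains one combination of the two unknowns. A gradient-based optimizer such as BFGS would replace `fit_grid` in practice; the grid is used here only to keep the sketch dependency-free.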
18. Extension: Remove Skewness
- If some leaf has only a few packets (i.e., $M_i$, $N_i$ are small), the approximation of $P_i$ will be inaccurate.
  - Especially when there are many leaf branches
- Heuristic: let $P_i$ be the same for all $i$
  - Intuition: remove skewness of traffic loads among leaves by taking aggregate average
- Let
  - $N$: total number of packets to all leaves
  - $M$: total number of successes to all leaves
- Take the approximation $\hat{P}_i = M / N$ for all $i$
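A short sketch of the pooling heuristic: every leaf is assigned the same end-to-end success rate, computed from the totals over all leaves. The function name and the example counts are hypothetical; the example includes one leaf with very few packets to show why its individual ratio would be unreliable.

```python
def pooled_success_rates(successes, totals):
    """Assign every leaf the aggregate success rate M / N.

    successes: dict leaf id -> delivered packets (M_i)
    totals:    dict leaf id -> packets sent (N_i)
    """
    M = sum(successes.values())     # total successes over all leaves
    N = sum(totals.values())        # total packets over all leaves
    if N == 0:
        raise ValueError("no packets observed")
    return {leaf: M / N for leaf in totals}

# Hypothetical skewed load: leaf 2 saw only two packets.
M_i = {1: 90, 2: 1}
N_i = {1: 100, 2: 2}
print(pooled_success_rates(M_i, N_i))   # both leaves get 91 / 102
```

The individual ratio for leaf 2 would be 1/2, based on two packets only, whereas the pooled value borrows strength from the heavily loaded leaf.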
19. Extension: Large-Scale Topology
- If there are many levels in a tree, we decompose into many two-level problems
  - Estimate loss rates $f_0$ and $f_1$
  - $f = \max(0, (f_1 - f_0) / (1 - f_0))$
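The clipping formula can be sketched as below. The interpretation is an assumption, since the slide's figure is not available: f0 is taken to be the estimated loss rate of the path down to the upper end of the link of interest, f1 the estimated loss rate of the path down to its lower end, and losses on successive links are taken to be independent, so that 1 - f1 = (1 - f0)(1 - f). The function name is hypothetical.

```python
def link_loss_rate(f0, f1):
    """Loss rate f of one link from two path-level loss estimates.

    Solves 1 - f1 = (1 - f0) * (1 - f) for f and clips at zero, since
    estimation noise can make f1 smaller than f0.
    """
    if not 0.0 <= f0 < 1.0:
        raise ValueError("f0 must be in [0, 1)")
    return max(0.0, (f1 - f0) / (1.0 - f0))

print(link_loss_rate(0.02, 0.05))   # link adds loss on top of the upper path
print(link_loss_rate(0.05, 0.02))   # noisy estimates: clipped to 0.0
```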
20. Network Simulations
- We use model simulations to verify the correctness of our models under ideal settings
  - See details in paper
- Network simulations with ns2
- Traffic models
  - Short-lived TCP sessions
  - Background UDP on-off flows
- Loss models
  - Links follow exponential ON-OFF loss model
  - Queue overflow due to UDP bursts
  - Both loss models are justified in practice and show loss correlation
[Figure: simulated network carrying TCP/UDP flows]
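To see why an exponential ON-OFF loss model produces time-correlated losses, a minimal sketch follows. It assumes a link that alternates between a good state and a lossy state with exponentially distributed holding times, and that every packet arriving during the lossy state is dropped; these details, the function name, and the parameter values are assumptions for illustration and are not taken from the ns2 setup used in the slides.

```python
import random

def on_off_losses(arrival_times, mean_good, mean_bad, rng):
    """Return one boolean per packet (True = lost) for an ON-OFF link.

    arrival_times: packet arrival times, sorted in ascending order
    mean_good:     mean duration of the good (no-loss) state
    mean_bad:      mean duration of the lossy state
    """
    lost = []
    bad = False                                   # start in the good state
    t_switch = rng.expovariate(1.0 / mean_good)   # end of first good period
    for t in arrival_times:
        while t >= t_switch:                      # advance past state changes
            bad = not bad
            mean = mean_bad if bad else mean_good
            t_switch += rng.expovariate(1.0 / mean)
        lost.append(bad)
    return lost

rng = random.Random(1)
arrivals = [0.01 * k for k in range(10_000)]      # hypothetical packet times
losses = on_off_losses(arrivals, mean_good=5.0, mean_bad=0.1, rng=rng)
print(sum(losses) / len(losses))                  # empirical loss fraction
```

Packets that arrive close together usually fall in the same state, so their outcomes are strongly correlated, and the correlation weakens as the spacing grows relative to the state durations.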
21. Network Simulations
- Three estimation methods
  - est.equal: take aggregate average in end-to-end success rates
  - est.self: take individual end-to-end success rates
  - est.perfect: use est.self but assuming perfect correlation
22. Experiment 1: ON-OFF Loss
- Consider two-level tree, with exponential on-off loss
[Figure: two panels, labeled "p 2, pi 0" and "p 2, pi 2"]
- est.perfect is worst among all
23. Experiment 2: Skewed Traffic
- Uneven traffic (let $K = 10$)
  - $\beta$ of traffic going to leaves 1-5
  - $1 - \beta$ of traffic going to leaves 6-10
[Figure: two panels, labeled "p 2, pi 0" and "p 2, pi 2"]
- est.equal is robust to skewed traffic
24. Experiment 3: Large Topology
- Goal: verify if two-level inference can be extended for multi-level topology
25. Experiment 3: Large Topology
[Figure: tree topology with Level 1, Level 2, and Level 3. Losses occur only in links of interest.]
26. Experiment 3: Large Topology
- est.equal is best among all
  - around 5%, 10%, 20% errors in levels 1, 2, 3, respectively
27. Conclusions
- Provide first attempt to explicitly model time correlation in loss tomography
- Propose profile likelihood inference
  - Remove nuisance parameters
  - Simplify loss inference without compromising accuracy
- Conduct extensive model/network simulations
  - Assuming perfect correlation is not a good idea
  - est.equal is robust in general, even for skewed traffic loads and large topology