Title: Internet Iso-bar: A Scalable Overlay Distance Monitoring System
1 Internet Iso-bar: A Scalable Overlay Distance Monitoring System
- Yan Chen, Lili Qiu, Chris Overton and Randy H. Katz
2 Motivations
- Applications of end-to-end distance monitoring/estimation
  - Overlay Routing/Location
  - Peer-to-peer Systems
  - VPN Management/Provisioning
  - Service Redirection/Placement
  - Cache-infrastructure Configuration
- Requirements for E2E distance monitoring system
  - Scalable: a small amount of probing traffic and system load
  - Accurate: capture congestion/failures, estimate latency
  - Fast: small computation for real-time estimation
  - Incrementally deployable
  - Easy to use
- Benefit applications
  - Application-driven measurement
  - Inference techniques for troubleshooting, root cause analysis
  - Improve application performance and reliability
6 E2E Estimation/Monitoring Systems Comparison
Notation: N hosts, AP address prefixes, K landmarks, C clusters; N > AP > C, C ≈ K
Properties | GNP | Akamai | IDMaps | RON | Internet Iso-bar
Dynamic monitoring | Static estimation | Yes | Yes | Yes | Yes
Scalability | O(N·K) probes, each landmark takes O(N) | O(F·AP) probes, F = number of CDN edge server farms | Clustering needs pair-wise distance between all pairs of APs, O(C² + AP) probes | O(N²) probes | O(C² + N) probes
Estimation accuracy | Accurate, but only symmetric distance | No existing comparison | Inaccurate: triangulation inequality, proximity-based clustering | Exact measurements -> most accurate | Similar accuracy to GNP
Monitors deployment | End hosts | CDN edge servers | Transit ASs (hard to deploy) | End hosts | End hosts
7 Problem Formulation
- Given N end hosts, how to select a subset of them as monitors and build a scalable overlay distance monitoring service without knowing the underlying topology?
- Distance info desired: report congestion/failure if it occurs, otherwise latency
8 E2E Congestion/Failures Analysis
- Based on the National Laboratory for Applied Network Research (NLANR) AMP data set
  - 104 sites in the US (including Alaska, Hawaii) and Australia; every host pings all other hosts every minute
  - Sliding window of 10 samples, use minimum RTT as latency sample
  - 105M measurements, 6/25/01 - 7/1/01
- Congestion/failures (uniformly denoted as congestion) defined as measurement loss or (latency > geo mean + geo stdev)
- Congestion is not common: only 0.96% of samples
- A few congested links dominate the E2E congestion
- Besides congestion at the last mile, E2E congestion exhibits strong spatial correlation
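The sampling and congestion rules on this slide can be sketched roughly as follows. This is only an illustration, not the authors' code: the function names are hypothetical, lost probes are represented as None, and the latency threshold is passed in precomputed because the slide does not spell out how the geometric statistics are computed per path.

```python
from collections import deque


def min_rtt_samples(rtts, window=10):
    """Yield the minimum RTT over a sliding window of the last `window` probes.

    `rtts` holds one RTT in ms per probe, or None for a lost probe
    (an assumption of this sketch). A window with no surviving probe
    yields None.
    """
    recent = deque(maxlen=window)
    for rtt in rtts:
        recent.append(rtt)
        alive = [r for r in recent if r is not None]
        yield min(alive) if alive else None


def is_congested(measurement_ms, threshold_ms):
    """A lost measurement, or a latency above the threshold, counts as congestion."""
    return measurement_ms is None or measurement_ms > threshold_ms


def congestion_fraction(measurements, threshold_ms):
    """Fraction of measurements flagged as congested."""
    flags = [is_congested(m, threshold_ms) for m in measurements]
    return sum(flags) / len(flags) if flags else 0.0
```

With a threshold of 100 ms, `congestion_fraction([80, None, 120, 90], 100)` flags the lost probe and the 120 ms probe, giving 0.5.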
9 NLANR AMP Sites
10 Internet Iso-bar
- Procedures
  - Cluster hosts that perceive similar performance to a small set of sites (landmarks)
  - For each cluster, select a monitor for active and continuous probing
  - Estimate distance between any pair of hosts using inter- and intra-cluster distance
11 Internet Iso-bar (I): Host Clustering
- Define correlation distance between each pair of hosts
  - Existing work uses network proximity: cor_dist(i,j) = net_dist(i,j) (denoted p_ij)
  - Iso-bar uses a network distance vector (k landmarks, for clustering only): netV_i = [p_i1, p_i2, ..., p_ik]^T
    - Euclidean distance based
    - Cosine vector similarity based
- Apply generic clustering methods
  - Optimize the worst case: minimize the maximum radius of all clusters (limit_num_minRmax)
  - Optimize the average case: minimize the sum of total host-monitor distance (limit_num_minDistSum)
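A minimal sketch of the two vector-based correlation distances, plus one generic clustering method that targets the worst case. The clustering shown is the standard farthest-point greedy heuristic for the k-center problem; it is swapped in for illustration and is not claimed to be what limit_num_minRmax does. All function names are hypothetical, and treating each cluster center as that cluster's monitor is an assumption.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two landmark distance vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def farthest_point_clusters(vectors, num_clusters, dist=euclidean):
    """Greedy k-center heuristic over hosts' landmark distance vectors.

    `vectors` maps host name -> list of latencies to the k landmarks.
    Returns (centers, assignment), where assignment maps host -> center.
    """
    hosts = sorted(vectors)
    centers = [hosts[0]]  # arbitrary but deterministic first center
    while len(centers) < min(num_clusters, len(hosts)):
        # Next center: the host farthest from its nearest existing center.
        farthest = max(
            hosts,
            key=lambda h: min(dist(vectors[h], vectors[c]) for c in centers),
        )
        centers.append(farthest)
    assignment = {
        h: min(centers, key=lambda c: dist(vectors[h], vectors[c]))
        for h in hosts
    }
    return centers, assignment
```

For example, four hosts with vectors [10, 50], [12, 52], [80, 20] and [82, 22] split into two clusters of two, since each pair sees nearly the same latencies to both landmarks.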
12 Diagram of Internet Iso-bar
[Figure: landmarks and end hosts]
13 Diagram of Internet Iso-bar
[Figure: Clusters A, B and C, with landmarks, monitors and end hosts marked]
14 Internet Iso-bar (II): Distance Estimation
- Intra-cluster estimation
  - If path(m, i) or path(m, j) is congested, report path(i, j) as congested
  - Otherwise pDist(i,j) = (mDist(m, i) + mDist(m, j)) / 2
- Inter-cluster estimation
  - If path(mi, i), path(mi, mj) or path(mj, j) is congested, report path(i, j) as congested
  - Otherwise pDist(i,j) = mDist(mi, mj)
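The two estimation rules can be sketched as a single lookup. This is an illustration under stated assumptions: i and j are distinct hosts that are not themselves monitors, the intra-cluster estimate is read as the average of the two monitor-to-host measurements, and the names `estimate`, `monitor_of`, `measured` and the `CONGESTED` marker are hypothetical.

```python
CONGESTED = "congested"  # marker stored in place of a latency (assumption)


def estimate(i, j, monitor_of, measured):
    """Estimate the distance between hosts i and j from monitor measurements.

    `monitor_of` maps host -> its cluster's monitor.
    `measured` maps an (a, b) pair -> latest latency in ms, or CONGESTED.
    Measurements are looked up in either direction.
    """
    def m_dist(a, b):
        return measured[(a, b)] if (a, b) in measured else measured[(b, a)]

    mi, mj = monitor_of[i], monitor_of[j]
    if mi == mj:
        # Intra-cluster: both hosts share one monitor.
        legs = [m_dist(mi, i), m_dist(mi, j)]
        if CONGESTED in legs:
            return CONGESTED
        return (legs[0] + legs[1]) / 2
    # Inter-cluster: any congested leg marks the whole path as congested.
    legs = [m_dist(mi, i), m_dist(mi, mj), m_dist(mj, j)]
    if CONGESTED in legs:
        return CONGESTED
    return m_dist(mi, mj)
```

Note that the inter-cluster estimate uses only the monitor-to-monitor distance; the host-to-monitor legs are consulted solely for congestion.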
15 Evaluation Methodology
- Internet measurement data
  - NLANR AMP data set
    - Clustering with geometric mean of training date
    - Estimation dates: 6/25/01 - 7/24/01, 12/06/01
  - Keynote CDN measurement data
    - 63 agents covering all major ISPs in US, Europe, Asia and Australia
    - 2 targets (CDN re-directors) in Boston and Texas
    - Measure TCP connection time (2/3 of handshake) from each agent to target every minute
    - Training date: 10/21/2002
    - Estimation dates: 10/21/2002 - 11/25/2002
- Similar latency estimation results for both datasets; NLANR results are presented
16 Evaluation Methodology (II)
- Estimation metric
  - Relative accuracy error for un-congested latency
  - Stability
  - For dynamic monitoring systems, amount of congestion captured and false positive ratio
- Internet distance estimation techniques evaluated
  - Omniscient: use g-mean data of (source, dest) on training date
  - Global Network Positioning (GNP)
  - Clustering with network distance vector (Iso-bar)
  - Clustering with network proximity
  - 15 clusters vs. 15 landmarks of GNP
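The two congestion metrics for dynamic monitoring systems could be computed along these lines. The slides do not define them precisely, so this sketch assumes "congestion captured" is the share of truly congested measurements that were also reported as congested, and "false positive ratio" is the share of reported congestion that was not actually congested. The function name is hypothetical.

```python
def congestion_scores(actual, predicted):
    """Return (captured, false_positive_ratio) for parallel lists of booleans.

    actual[k]    -- measurement k really was congested
    predicted[k] -- the monitoring system reported measurement k as congested
    """
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a and p)
    false_pos = sum(1 for a, p in pairs if not a and p)
    total_actual = sum(1 for a in actual if a)
    total_reported = true_pos + false_pos
    captured = true_pos / total_actual if total_actual else 0.0
    fp_ratio = false_pos / total_reported if total_reported else 0.0
    return captured, fp_ratio
```

Under the other common convention, false positives would be divided by the number of un-congested measurements instead; that changes only the denominator of `fp_ratio`.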
17 Latency Prediction Accuracy & Stability
- Training date: 06/25/01
- Estimation dates: 06/25/01 - 12/06/01
- Summary of the 90th percentile relative error for various distance estimation methods
18 Distance Estimation Results
- Latency estimation when un-congested
  - Omniscient is the most accurate, but unscalable
  - GNP and Iso-bar are second
    - Both have good accuracy and stability for distance estimation
    - GNP is unscalable for online monitoring (static approach)
  - Iso-bar outperforms proximity-based clustering by 50%
  - 90th percentile relative error < 0.5: for a 60 ms latency, 45 ms < prediction < 90 ms
- Congestion/failures estimation
  - 6/25/01 - 7/01/01: on average 148K congested measurements per day
  - Iso-bar captures 78% of them, with a 32% false positive ratio
  - Only 3% of the monitoring overhead of RON
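A quick arithmetic check of how an overhead figure of this order can arise. It assumes the comparison table's probe counts for RON and Iso-bar are read as N² and C² + N with unit constants, and plugs in the 104 AMP sites and 15 clusters from the evaluation; the slides do not state that the figure was derived this way, and the function names are hypothetical.

```python
def probes_full_mesh(num_hosts):
    """Every host probes every host: N^2 probes per round (RON-style)."""
    return num_hosts * num_hosts


def probes_isobar(num_hosts, num_clusters):
    """Monitors probe each other (C^2) and each host is probed once (N)."""
    return num_clusters * num_clusters + num_hosts


# 104 hosts and 15 clusters, as in the NLANR AMP evaluation.
ratio = probes_isobar(104, 15) / probes_full_mesh(104)
print(f"{ratio:.1%}")  # 329 / 10816, roughly 3%
```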
19 Conclusions
- Propose Internet Iso-bar
  - Cluster hosts based on network similarity
  - Inter- and intra-cluster latency estimation with a first-step heuristic for congestion/failure detection
- Preliminary results promising
  - High accuracy and stability for normal latency estimation
  - Simple heuristics of congestion estimation capture 78% of congestion, with 32% false positives, at only 3% of the monitoring overhead of RON
20 Ongoing Work
- Current focus switches from latency estimation to congestion/failures estimation
  - Apply topology information, e.g. lossy link detection with network tomography
  - Cluster and choose monitors based on the lossy links
- Benefit applications
  - Dynamic node join/leave for P2P systems
    - Joining client pings landmark sites to get its distance vector, compares it with those of the monitors, and chooses the closest one to join
    - Split/merge clusters
  - Multi-path selection
- More comprehensive evaluation
  - Simulate with large network
  - Deploy on PlanetLab, and operate at finer level
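The node-join step described above can be sketched as a nearest-vector lookup. This assumes Euclidean distance between landmark distance vectors (the slides also mention cosine similarity as an option) and that the joining client already has its measured latencies to the landmarks; `join_cluster` and the sample data are hypothetical.

```python
import math


def join_cluster(client_vector, monitor_vectors):
    """Return the monitor whose landmark distance vector is closest to the client's.

    `client_vector` is the joining client's latencies (ms) to the k landmarks.
    `monitor_vectors` maps monitor name -> that monitor's latency vector,
    in the same landmark order.
    """
    return min(
        monitor_vectors,
        key=lambda m: math.dist(client_vector, monitor_vectors[m]),
    )


# Illustrative values only: three landmarks, two existing monitors.
monitors = {"monitor_A": [10, 50, 90], "monitor_B": [80, 20, 30]}
chosen = join_cluster([12, 48, 85], monitors)  # closest vector is monitor_A's
```

The client needs only k landmark probes to join, so the cost of a join does not grow with the number of hosts already in the system.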
21 Internet Iso-bar
- Problem formulation
  - Given N end hosts, how to select a subset of them as monitors and build a scalable overlay distance monitoring service without knowing the underlying topology?
  - Distance info desired: report congestion/failure if it occurs, otherwise latency
- Our approach
  - Cluster hosts that perceive similar performance to a small set of sites (landmarks)
  - For each cluster, select a monitor for active and continuous probing
  - Estimate distance between any pair of hosts using inter- and intra-cluster distance
- Performance evaluation
  - Using real Internet measurement data
  - Compared with other distance estimation services: GNP, RON
  - Performance metrics: accuracy and stability
22 Internet Iso-bar (II): Distance Estimation
- Congestion/failures analysis
  - Congestion/failures (uniformly denoted as congestion) are not common
    - Defined as measurement loss or (latency > geo mean + geo stdev)
    - Only 0.96% out of 105M NLANR ping measurements over a week
  - Suggests a few congested links dominate the E2E congestion
  - Besides congestion at the last mile, E2E congestion exhibits strong spatial correlation
- Estimation algorithms
  - Intra-cluster estimation (i and j use the same monitor m)
    - If path(m, i) or path(m, j) is congested, report path(i, j) as congested
    - Otherwise predictedDist(i,j) = (measuredDist(m, i) + measuredDist(m, j)) / 2
  - Inter-cluster distance estimation
    - If path(monitor_i, i), path(monitor_i, monitor_j) or path(monitor_j, j) is congested, report path(i, j) as congested
    - Otherwise predictedDist(i,j) = measuredDist(monitor_i, monitor_j)
  - Self-diagnostics of monitors, check for last-mile congestion