Effective Diagnosis of Routing Disruptions from End Systems

1
Effective Diagnosis of Routing Disruptions from End Systems
  • Ying Zhang, Z. Morley Mao, Ming Zhang

2
Routing disruptions impact application performance
  • More applications today have high QoS
    requirements
  • Routing events can cause high loss and long delays

[Figure: Src and Dst connected across AS B, AS C, AS D, and AS E in the Internet]
3
Existing approaches to diagnose routing
disruptions are ISP-centric
  • Require routing data from many routers in ISPs
  • Feldmann04, Teixeira04, Wu05
  • Passive and accurate

[Figure: routers in AS B, AS C, and AS D across the Internet]
4
Limitations of ISP-centric approaches
  • Difficult to gain access to data from many ISPs
  • BGP data reflects only the expected data-plane paths, not necessarily the actual ones

[Figure: end systems and an ISP across AS B, AS C, and AS D in the Internet, with unknown routing state marked "?"]
5
Can we diagnose entirely from end systems?
  • Goal: infer the data-plane paths of many routers

[Figure: a probing host reaching Dst through ISP A, AS B, AS C, and AS D]
6
Our approach: end-system-based monitoring
  • Requires only probing from end hosts
  • Covers all the PoPs of a target ISP

[Figure: a probing host reaching Dst through the target ISP, AS B, AS C, and AS D]
7
Our approach: end-system-based monitoring
  • Covers most of the destinations on the Internet

[Figure: a probing host reaching multiple Dst prefixes through ISP A, AS B, AS C, and AS D]
8
Our approach: end-system-based monitoring
  • Identifies routing changes by comparing paths
    measured consecutively

[Figure: a probing host's path to Dst through ISP A, AS B, AS C, and AS D]
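Detecting a routing change then comes down to comparing two consecutive probing rounds for each destination. The following is a minimal sketch. It assumes each round is stored as a map from destination to the hop sequence that traceroute reports. `detect_path_changes` and the snapshot format are hypothetical and are not the authors' implementation.

```python
from typing import Dict, List, Tuple

# A path is the sequence of hops (e.g., PoPs or ASes) reported by one traceroute.
Path = Tuple[str, ...]


def detect_path_changes(prev: Dict[str, Path],
                        curr: Dict[str, Path]) -> List[Tuple[str, Path, Path]]:
    """Return (destination, old_path, new_path) for every destination whose
    path differs between two consecutive probing rounds."""
    changes = []
    # Only destinations measured in both rounds can be compared.
    for dst in sorted(prev.keys() & curr.keys()):
        if prev[dst] != curr[dst]:
            changes.append((dst, prev[dst], curr[dst]))
    return changes


# Hypothetical snapshots from one probing host, one round apart.
round1 = {"dst1": ("ISP-A", "AS-B", "AS-D"), "dst2": ("ISP-A", "AS-C")}
round2 = {"dst1": ("ISP-A", "AS-C", "AS-D"), "dst2": ("ISP-A", "AS-C")}
print(detect_path_changes(round1, round2))
# [('dst1', ('ISP-A', 'AS-B', 'AS-D'), ('ISP-A', 'AS-C', 'AS-D'))]
```

Real traces would first need measurement noise (for example, unresponsive hops) filtered out. Otherwise every difference would be reported as a routing event.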
9
Advantages and challenges
  • Advantages
      • No need to access ISP-proprietary data
      • Identifies actual data-plane paths
      • Monitors data-plane performance
  • Challenges
      • Limited probing resources
      • Coverage of probed paths
      • Timing granularity
      • Measurement noise

10
System architecture
  • Collaborative probing
  • Event identification and classification
  • Event correlation and inference
  • Event impact analysis
  • Reports
11
Outline
  • Collaborative probing
  • Event identification and classification
  • Event correlation and inference
  • Results and validation

12
Collaborative probing
  • Uses a set of hosts
      • To learn the routing state
      • To improve coverage
      • To reduce overhead

[Figure: a probing host with paths through ISP A, AS B, AS C, and AS D]
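Pooling measurements from several hosts widens coverage, because each host's probes traverse only some of the target ISP's PoPs. The following is a minimal sketch of that union. The PoP names, the per-host observation sets, and `pop_coverage` are all hypothetical.

```python
from typing import Dict, Set, Tuple


def pop_coverage(observed: Dict[str, Set[str]],
                 all_pops: Set[str]) -> Tuple[float, Set[str]]:
    """Return the fraction of the target ISP's PoPs traversed by at least one
    probing host, plus the set of covered PoPs."""
    covered: Set[str] = set()
    for pops in observed.values():
        covered |= pops & all_pops  # ignore hops outside the target ISP
    return len(covered) / len(all_pops), covered


# Hypothetical PoPs of the target ISP and the PoPs each host's probes crossed.
target_pops = {"NYC", "CHI", "DEN", "SEA", "ATL"}
seen = {"host1": {"NYC", "CHI"}, "host2": {"CHI", "DEN", "SEA"}}

alone, _ = pop_coverage({"host1": seen["host1"]}, target_pops)
together, covered = pop_coverage(seen, target_pops)
print(alone, together)  # 0.4 0.8
```

CHI is already covered by host1, so host2 need not probe it again. Skipping such redundant probes is one plausible way a shared view could also cut overhead.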
13
Outline
  • Collaborative probing
  • Event identification and classification
  • Event correlation and inference
  • Results and validation

14
Event classification
  • Classify events according to ingress/egress PoP
    changes
      • Type 1: ingress PoP changes
      • Type 2: ingress PoP same, egress PoP different
      • Type 3: ingress PoP same, egress PoP same

[Figure: a probing host's path toward destination prefix P through the target ISP]
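The three event types depend only on whether the target ISP's ingress and egress PoPs changed. The following is a minimal sketch. It assumes each traceroute has already been mapped to the target ISP's ingress and egress PoPs; that mapping step is not shown. `classify_event` and the PoP names are hypothetical.

```python
def classify_event(old_ingress: str, old_egress: str,
                   new_ingress: str, new_egress: str) -> int:
    """Classify a routing event by how the target ISP's ingress and egress PoPs changed."""
    if old_ingress != new_ingress:
        return 1  # Type 1: ingress PoP changes
    if old_egress != new_egress:
        return 2  # Type 2: same ingress PoP, different egress PoP
    return 3      # Type 3: same ingress and egress PoPs; the internal or external path changed


# Hypothetical PoP names for one destination prefix, before and after an event.
print(classify_event("NYC", "CHI", "NYC", "DEN"))  # 2
```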
15
Outline
  • Collaborative probing
  • Event identification and classification
  • Event correlation and inference
  • Results and validation

16
Likely causes: link failures
[Figure: a link failure shifts the probing host's path toward destination prefix P from the old egress PoP to a new egress PoP of the target ISP; a neighbor AS is also shown]
17
Likely causes: internal distance changes
  • Hot-potato changes
      • Cost of the old internal path increases
      • Cost of the new internal path decreases

[Figure: hot-potato change from the old egress PoP to the new egress PoP toward a neighbor AS, seen from a probing host; internal distances labeled 120, 80, 100, and 120]
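Under hot-potato routing, the ISP hands traffic off at the egress PoP closest to where it entered, measured by internal (IGP) distance. In the BGP decision process (RFC 4271), this interior-cost comparison happens only after local preference, AS path length, MED, and the eBGP-over-iBGP preference all tie, so the sketch assumes that they do. The distances are illustrative, because the transcript does not preserve which of the figure's labels belongs to which path. `hot_potato_egress` is a hypothetical helper.

```python
from typing import Dict


def hot_potato_egress(igp_distance: Dict[str, int]) -> str:
    """Choose the egress PoP with the smallest internal (IGP) distance.
    Assumes all candidate egress PoPs offer equally preferred BGP routes,
    so IGP distance is the deciding tie-breaker."""
    return min(igp_distance, key=lambda pop: igp_distance[pop])


# Illustrative distances from the ingress PoP to each egress PoP.
before = {"old egress": 100, "new egress": 120}
after = {"old egress": 120, "new egress": 80}  # old path cost up, new path cost down
print(hot_potato_egress(before), "->", hot_potato_egress(after))
# old egress -> new egress
```

Exact ties would fall through to later BGP tie-breakers, such as the lowest BGP identifier, which this sketch ignores.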
18
Event correlation
  • Spatial correlation: a single network failure
    often affects multiple routers
  • Temporal correlation: routing events occurring
    close together in time are likely due to only a few causes
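Temporal correlation can be sketched as grouping events whose timestamps fall close together. The single-linkage rule and the 60-second window are assumptions, since the slides give no window, and `cluster_by_time` is hypothetical. Spatial correlation, such as grouping by a shared link or PoP, could then be applied within each cluster.

```python
from typing import List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, event identifier)


def cluster_by_time(events: List[Event], window: float) -> List[List[Event]]:
    """Group time-sorted events so that each event within `window` seconds of
    the previous one joins that event's cluster (single-linkage on time)."""
    clusters: List[List[Event]] = []
    for ev in sorted(events):
        if clusters and ev[0] - clusters[-1][-1][0] <= window:
            clusters[-1].append(ev)
        else:
            clusters.append([ev])
    return clusters


events = [(0.0, "e1"), (30.0, "e2"), (45.0, "e3"), (600.0, "e4")]
print([[eid for _, eid in c] for c in cluster_by_time(events, window=60.0)])
# [['e1', 'e2', 'e3'], ['e4']]
```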

19
Inference methodology
  • Evidence: an event that supports the cause

[Figure: evidence for the cause "link L is down": a probing host's path through the target ISP toward destination prefix P shifts to a new path via a new egress]
20
Inference methodology
  • Conflict: a measurement trace that conflicts
    with the cause

[Figure: the same scenario (destination prefix P, link L, a new path via a new egress of the target ISP, two probing hosts), illustrating a trace that conflicts with the cause "link L is down"]
21
Inference methodology
  • Evidence node: path 1,2,3 → 1,2,4
  • Candidate causes: node 3 withdraws the route; link 2-3 down
[Figure: ASes 1, 2, 3, and 4, with a route withdrawal at AS 3]
22
Inference methodology
Evidence graph
  • Evidence nodes: path 1,2,3 → 1,2,4; path 0,2,3 → 0,2,4
  • Candidate causes: node 3 withdraws the route; link 2-3 down
[Figure: ASes 0, 1, 2, 3, and 4, with a route withdrawal at AS 3]
23
Inference methodology
Conflict graph
  • Conflict nodes: path 1,2,3,6; path 0,2,3,6; path 0,2,3
  • Candidate causes: link 2-3 down; node 3 withdraws the route
[Figure: ASes 0, 1, 2, 3, and 6]
24
Inference methodology
Evidence graph and conflict graph
  • Evidence nodes: path 1,2,3 → 1,2,4; path 0,2,3 → 0,2,4
  • Conflict nodes: path 1,2,3,6; path 0,2,3,6; path 0,2,3
  • Candidate causes scored as: evidence 2, conflicts 3;
    evidence 2, conflicts 0
  • Greedy algorithm: find the minimum set of causes that
    explains all the evidence while minimizing
    conflicts
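The slide does not spell out how the greedy algorithm scores candidates. This sketch therefore assumes one plausible rule: take the cause that explains the most still-unexplained evidence, and break ties in favor of fewer conflicts. The slide's counts can be read as follows. All three conflict paths traverse link 2-3, so each one contradicts "link 2-3 down" (evidence 2, conflicts 3). We assume that none of them contradicts node 3's withdrawal (evidence 2, conflicts 0). `greedy_causes` is hypothetical.

```python
from typing import Dict, List, Set, Tuple

# cause -> (evidence nodes it explains, conflict nodes that contradict it)
CauseGraph = Dict[str, Tuple[Set[str], Set[str]]]


def greedy_causes(evidence: Set[str], causes: CauseGraph) -> Tuple[List[str], Set[str]]:
    """Repeatedly pick the cause explaining the most still-unexplained evidence,
    breaking ties in favor of fewer conflicts. Returns the chosen causes and any
    evidence that no candidate explains."""
    unexplained = set(evidence)
    chosen: List[str] = []
    while unexplained:
        best = max(causes,
                   key=lambda c: (len(causes[c][0] & unexplained), -len(causes[c][1])),
                   default=None)
        if best is None:
            break  # no candidate causes at all
        gained = causes[best][0] & unexplained
        if not gained:
            break  # remaining evidence is not explained by any candidate
        chosen.append(best)
        unexplained -= gained
    return chosen, unexplained


evidence = {"1,2,3->1,2,4", "0,2,3->0,2,4"}
causes: CauseGraph = {
    "link 2-3 down": (set(evidence), {"1,2,3,6", "0,2,3,6", "0,2,3"}),
    "node 3 withdraws the route": (set(evidence), set()),
}
print(greedy_causes(evidence, causes))
# (['node 3 withdraws the route'], set())
```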

25
Outline
  • Collaborative probing
  • Event identification and classification
  • Event correlation and inference
  • Results and validation

26
ISPs studied
27
Results of event classification
  • Many events are internal changes
  • Abilene has many ingress changes

28
Validation with BGP-based approach (Wu05)
  • Hot-potato changes: egress point changes due to
    internal distance changes

[Table: numbers of incidences identified by both methods, by our method, and by the BGP method, with false negatives and false positives]
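The slide's table does not survive in the transcript, so this sketch only shows how the counts relate. It assumes that the BGP-based method (Wu05) serves as ground truth. It also assumes that incidences from the two methods have already been matched one-to-one, for example by prefix and time. The incidence IDs and `compare_with_bgp` are hypothetical.

```python
from typing import Dict, Set


def compare_with_bgp(ours: Set[str], bgp: Set[str]) -> Dict[str, int]:
    """Count agreement between the two methods, treating the BGP-based method as the reference."""
    return {
        "both": len(ours & bgp),
        "false_negatives": len(bgp - ours),  # identified only by the BGP method
        "false_positives": len(ours - bgp),  # identified only by end-system probing
    }


# Hypothetical incidence IDs, assumed already matched across methods.
ours = {"ev1", "ev2", "ev3", "ev5"}
bgp = {"ev1", "ev2", "ev3", "ev4"}
print(compare_with_bgp(ours, bgp))
# {'both': 3, 'false_negatives': 1, 'false_positives': 1}
```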
29
Validation with BGP-based approach
  • Session resets: peering link up/down
  • Reasons for inaccuracy
      • Limited coverage
      • Coarse-grained probing
      • Measurement noise

30
System performance
  • Can keep up with the generated routing state
  • Applicable to real-time diagnosis and mitigation
      • Reactive: construct alternate paths to bypass the
        problem
      • Proactive: avoid paths with many historical
        routing disruptions
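The proactive use could look roughly like the following: rank candidate paths by how many past disruptions involved their hops. The scoring rule (a plain sum of per-hop counts), `rank_paths`, and the AS names are hypothetical. They illustrate the idea and are not the system's actual mitigation logic.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def rank_paths(candidates: Sequence[Sequence[str]],
               disrupted_hops: Iterable[str]) -> List[Sequence[str]]:
    """Sort candidate paths from least to most historically disrupted.
    `disrupted_hops` lists the hop implicated in each past disruption."""
    counts = Counter(disrupted_hops)  # hop -> number of past disruptions (0 if absent)
    return sorted(candidates, key=lambda path: sum(counts[hop] for hop in path))


history = ["AS-B", "AS-B", "AS-C"]  # hypothetical diagnosis history
paths = [("ISP-A", "AS-B", "AS-D"), ("ISP-A", "AS-C", "AS-D"), ("ISP-A", "AS-E", "AS-D")]
print(rank_paths(paths, history)[0])  # ('ISP-A', 'AS-E', 'AS-D')
```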

31
Conclusion
  • Developed the first system to diagnose routing
    disruptions purely from end systems
  • Used a simple greedy algorithm on two bipartite
    graphs to infer causes
  • Comprehensively validated the accuracy

32
Thank you!
  • Questions?

33
Performance impact analysis
  • End-to-end latency changes caused by different
    types of routing events

34
Validation with BGP data
  • BGP feeds from RouteViews, RIPE, Abilene, and 29
    BGP feeds from a Tier-1 ISP
  • Metrics: destination prefix coverage and routing
    event detection rate

35
Event classification: same ingress PoP,
different egress PoP
  • Policy changes
      • Local preference of the old route decreases
      • Local preference of the new route increases

[Figure: policy change seen from a probing host: the local preference of the old route through the old egress PoP falls from 100 to 50, while that of the new route through the new egress PoP rises from 60 to 110]
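The policy case follows from BGP route selection: a router prefers the route with the highest local preference (RFC 4271). The following is a minimal sketch using the slide's values. `best_by_local_pref` is a hypothetical helper, and it ignores all other route attributes.

```python
from typing import Dict


def best_by_local_pref(local_pref: Dict[str, int]) -> str:
    """Return the route with the highest LOCAL_PREF; BGP compares this before AS path length or IGP cost."""
    return max(local_pref, key=lambda route: local_pref[route])


# Local preference values from the slide: old route 100 -> 50, new route 60 -> 110.
before = {"old route": 100, "new route": 60}
after = {"old route": 50, "new route": 110}
print(best_by_local_pref(before), "->", best_by_local_pref(after))
# old route -> new route
```

Either change alone would already flip the choice, since 50 < 60 and 100 < 110.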
36
Event classification: same ingress PoP,
different egress PoP
  • External routing changes
      • Old route worsens due to external factors
        (withdrawal, longer AS path)
      • New route improves due to external factors

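[Figure: external routing change seen from a probing host: the AS path of the old route via AS A grows from ABCD to ABEFD, while the AS path of the new route via AS B shrinks from BCEFD to BEFD, moving traffic from the old egress PoP to the new egress PoP of the target ISP]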
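When local preference ties, BGP next prefers the shorter AS path (RFC 4271). The following is a minimal sketch using the slide's AS paths, written one letter per AS. It assumes both routes carry the same local preference (100 here, an arbitrary value). `best_route` is hypothetical, and later tie-breakers are ignored.

```python
from typing import Dict, Tuple


def best_route(routes: Dict[str, Tuple[int, str]]) -> str:
    """routes maps a route name to (local_pref, AS path written as one letter per AS).
    Prefer the highest local preference, then the shortest AS path."""
    return min(routes, key=lambda r: (-routes[r][0], len(routes[r][1])))


# AS paths from the slide; equal local preference is an assumption.
before = {"old via AS A": (100, "ABCD"), "new via AS B": (100, "BCEFD")}
after = {"old via AS A": (100, "ABEFD"), "new via AS B": (100, "BEFD")}
print(best_route(before), "->", best_route(after))
# old via AS A -> new via AS B
```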
37
Event classification: same ingress PoP, same
egress PoP
  • Internal PoP path changes
      • Cost of the old internal path increases
      • Cost of the new internal path decreases
  • External AS path changes

[Figure: a probing host's path toward destination prefix P changes from an old path to a new path through the target ISP]
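An internal PoP path change can be illustrated with shortest-path routing. This assumes the ISP forwards along IGP shortest paths between the fixed ingress and egress PoPs. The sketch below uses Dijkstra's algorithm on a hypothetical PoP graph (names and link costs invented) and shows how raising the cost of the old internal path moves traffic onto a new one.

```python
import heapq
from typing import Dict, List, Tuple

# PoP -> {neighboring PoP: IGP link cost}
Graph = Dict[str, Dict[str, int]]


def shortest_path(graph: Graph, src: str, dst: str) -> Tuple[int, List[str]]:
    """Dijkstra's algorithm over the target ISP's internal PoP graph."""
    dist = {src: 0}
    prev: Dict[str, str] = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale entry: a shorter distance to u was already found
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        raise ValueError(f"{dst} is unreachable from {src}")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]


# Hypothetical PoP graph between the ingress (IN) and egress (OUT) PoPs.
before = {"IN": {"X": 10, "Y": 15}, "X": {"OUT": 10}, "Y": {"OUT": 10}, "OUT": {}}
after = {u: dict(nbrs) for u, nbrs in before.items()}
after["X"]["OUT"] = 30  # cost of the old internal path increases
print(shortest_path(before, "IN", "OUT"))  # (20, ['IN', 'X', 'OUT'])
print(shortest_path(after, "IN", "OUT"))   # (25, ['IN', 'Y', 'OUT'])
```

The ingress and egress PoPs stay the same, so a probe would classify this as the same-ingress, same-egress case.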
38
Results of cause inference
  • Effectiveness of the inference algorithm
  • Clusters: groups of events with the same root
    cause

39
Event identification
  • Routing event: a path change
  • Event identification: comparing consecutive routing
    snapshots

[Figure: a probing host's path to Dst through ISP A, AS B, AS C, and AS D]