Title: Interpreting BGP dynamics
1Interpreting BGP dynamics
Olaf Maennel, Technische Universität München
2BGP
- De facto standard inter-domain routing protocol
- Path vector protocol
- Policy routing protocol
- Topology changes, etc. → BGP updates
[Figure: AS topology with AS 1, AS 3, AS 4, AS 5 and a BGP observation point; prefix p is announced along the AS paths "1", "3 1", and "5 3 1"]
3Locating routing instabilities
- LOCATION: an AS edge, either internal to an AS (e.g., inside AS1: AS1-AS1) or a link between two ASes (e.g., external: AS1-AS2)
- INSTABILITY: any change of BGP advertisement over a BGP session
4Why identify locations of instabilities?
- Instabilities can lead to
- Unreachability / poor performance
- Route oscillation
- BGP churn
- Black holes
- Identifying the location enables corrective action
5Causes of instabilities
- Possible causes for BGP instabilities
- BGP session availability
- Session establishment/teardown/reset
- BGP session filters
- BGP attribute or filter manipulation
- misconfiguration
- IGP cost change
- IGP metric change, link or node failures
6Outline
- High-level approach
- Limitations
- Evaluation
8High-level approach (I) Dimensions
- Locate BGP instability by analyzing BGP updates along three dimensions
- 1. Time
- 2. Views
- 3. Prefixes
9One trigger, multiple updates?!
ISP A (Tier 1)
ISP B (Tier 1)
ISP D (Tier 2)
ISP C
Customer E
10One trigger, multiple updates?!
11One trigger, multiple updates?!
12BGP doing its job
13BGP doing its job
14BGP doing its job
15BGP doing its job
16Definition of terms
- Echoes: multiple BGP updates
- for the same triggering event
- on one peering session
- for one prefix
- Group updates into update bursts
- same prefix / peer
- short time window
[Figure: timeline of BGP updates (echoes) for prefix A seen on peer P1]
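The grouping of echoes into update bursts can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tuple layout `(timestamp, peer, prefix)`, the function name `group_into_bursts`, and the 120-second window are assumptions chosen for the example.

```python
from collections import defaultdict


def group_into_bursts(updates, timeout=120.0):
    """Group (timestamp, peer, prefix) updates into update bursts.

    Updates for the same (peer, prefix) stay in one burst while the gap
    to the previous update is at most `timeout` seconds (assumed value).
    """
    per_key = defaultdict(list)
    for ts, peer, prefix in sorted(updates):
        per_key[(peer, prefix)].append(ts)

    bursts = []
    for (peer, prefix), stamps in per_key.items():
        current = [stamps[0]]
        for ts in stamps[1:]:
            if ts - current[-1] <= timeout:
                current.append(ts)  # still the same burst
            else:
                bursts.append((peer, prefix, current))
                current = [ts]      # quiet gap: start a new burst
        bursts.append((peer, prefix, current))
    return bursts


# Hypothetical updates for prefix A seen on peer P1 (seconds).
updates = [(0, "P1", "A"), (30, "P1", "A"), (70, "P1", "A"), (900, "P1", "A")]
bursts = group_into_bursts(updates)
```

With these inputs the first three updates form one burst and the update at 900 s starts a second one.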
17Digression: update burst duration
convergence can take rather long
18Time
BGP observation point A
prefix p
same prefix same observation point
19Views
observation point B
BGP observation point A
prefix p
same prefix across observation points
20Event
- Captures update propagation
- Clusters update bursts across observation points
- Different timeout heuristics: relative, static, adaptive
[Figure: timeline of updates for prefix A seen on peers P1, P2, and P3; a quiet period delimits the event duration]
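A rough sketch of clustering update bursts into events, using only the static timeout variant named above. The burst layout `(prefix, peer, start, end)`, the function name `cluster_into_events`, and the 300-second quiet period are assumptions for illustration; the relative and adaptive heuristics are not shown.

```python
def cluster_into_events(bursts, quiet_period=300.0):
    """Cluster bursts for the same prefix across peers into events.

    bursts: iterable of (prefix, peer, start, end) tuples.
    A burst joins the open event for its prefix if it starts no later
    than `quiet_period` seconds after that event's current end.
    """
    events = []
    open_event = {}  # prefix -> event currently being extended
    for prefix, peer, start, end in sorted(bursts, key=lambda b: b[2]):
        ev = open_event.get(prefix)
        if ev is not None and start - ev["end"] <= quiet_period:
            ev["end"] = max(ev["end"], end)
            ev["peers"].add(peer)
        else:
            ev = {"prefix": prefix, "start": start, "end": end, "peers": {peer}}
            open_event[prefix] = ev
            events.append(ev)
    return events


# Hypothetical bursts for prefix A on three peers.
events = cluster_into_events([
    ("A", "P1", 0, 60),
    ("A", "P2", 40, 100),
    ("A", "P3", 2000, 2010),
])
```

Here the bursts on P1 and P2 merge into one event, while the burst on P3 arrives after the quiet period and opens a new event.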
21Prefixes
observation point B
BGP observation point A
prefix p
prefix q
correlate across prefixes
22High-level approach (II) Algorithms
- UNION heuristic (Time)
- INTERSECTION heuristic (Views)
- GREEDY heuristic (Prefixes)
23UNION heuristic
- Routing instability → change from previous to new path
- previous best path no longer available
- or new best path becomes available
- → UNION of AS edges as candidates
- Input: BGP updates
- Output: update bursts with candidates
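The UNION step can be illustrated with a short sketch. It assumes AS paths are lists of AS numbers and that an edge is an unordered pair; the names `path_edges` and `union_candidates` and the `include_internal` switch (which adds an AS-internal edge `(a, a)` per AS) are hypothetical choices for this example.

```python
def path_edges(as_path, include_internal=True):
    """Return the AS edges on one AS path as a set of (low, high) pairs."""
    edges = set()
    for a, b in zip(as_path, as_path[1:]):
        if a != b:  # ignore repeats caused by AS-path prepending
            edges.add((min(a, b), max(a, b)))
    if include_internal:
        # An instability may also be internal to an AS on the path.
        edges.update((a, a) for a in as_path)
    return edges


def union_candidates(previous_path, new_path, include_internal=True):
    """UNION of the AS edges on the previous and the new best path."""
    return (path_edges(previous_path, include_internal)
            | path_edges(new_path, include_internal))


# Hypothetical change from path 5 3 1 to path 5 4 1.
candidates = union_candidates([5, 3, 1], [5, 4, 1], include_internal=False)
```

A withdrawal can be passed as an empty new path, in which case the candidates are just the edges of the previous path.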
24UNION heuristic
BGP observation point A
prefix p
UNION of AS edges on paths as candidates
25INTERSECTION heuristic
- Routing instability → changes at multiple observation points
- → INTERSECTION of candidate sets
- Input: update bursts with candidates
- Output: events with instability sets
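As a minimal sketch of the INTERSECTION step, assume each update burst in an event already carries its candidate set of AS edges as `(low, high)` pairs. The function name `intersect_candidates` and the sample sets are hypothetical.

```python
def intersect_candidates(candidate_sets):
    """Intersect the candidate sets of all bursts belonging to one event."""
    sets = [set(s) for s in candidate_sets]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical candidate sets seen from observation points A and B.
view_a = {(1, 3), (3, 5), (1, 4), (4, 5)}
view_b = {(1, 3), (3, 7)}
instability_set = intersect_candidates([view_a, view_b])
```

Only the edge seen from both observation points survives; an empty result signals that no single candidate explains all views.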
26INTERSECTION heuristic
observation point B
BGP observation point A
prefix p
Changes observable at multiple observation points
27INTERSECTION heuristic
observation point B
BGP observation point A
prefix p
Changes observable at multiple observation points
28GREEDY heuristic
- Routing instability → changes to multiple prefixes
- → identify correlated prefixes
- Input: events with instability sets
- Output: correlated events
29GREEDY heuristic
- Goal: distinguish between multiple simultaneous instabilities.
- Determine the most popular AS edge in the instability sets
- For all events and for each edge in the instability set
- Increment Counter[edge]
- Sort edges by counter values
- Choose the edge with the largest counter value as candidate AS edge for the associated events
- Remove these events from the input
- Repeat
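The greedy loop above can be sketched as follows. This is an illustration under assumptions: events are given as a dict from a hypothetical event id to its instability set, ties between equally popular edges are broken by edge order, and events with an empty instability set are skipped.

```python
from collections import Counter


def greedy_correlate(events):
    """Repeatedly pick the most popular AS edge and remove its events.

    events: dict mapping event id -> set of candidate AS edges.
    Returns a list of (edge, [event ids]) in the order the edges were chosen.
    """
    remaining = {eid: set(edges) for eid, edges in events.items() if edges}
    result = []
    while remaining:
        counter = Counter(e for edges in remaining.values() for e in edges)
        # Largest counter first; ties broken by the smaller edge (assumption).
        best_edge, _ = min(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        covered = sorted(eid for eid, edges in remaining.items()
                         if best_edge in edges)
        result.append((best_edge, covered))
        for eid in covered:
            del remaining[eid]
    return result


# Hypothetical instability sets for four events (one per prefix).
correlated = greedy_correlate({
    "p": {(1, 2), (2, 3)},
    "q": {(2, 3), (3, 4)},
    "r": {(2, 3)},
    "s": {(5, 6)},
})
```

In this example edge (2, 3) explains the events for p, q, and r, and the remaining event for s is attributed to (5, 6).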
30Prefixes
observation point B
BGP observation point A
prefix p
prefix q
candidates for prefixes p
31Prefixes
observation point B
BGP observation point A
prefix p
prefix q
candidates for prefixes p q
32Outline
- High-level approach
- Limitations
- Evaluation
33Problems with UNION heuristic
- Location may not be in the UNION at all! → may lead to empty INTERSECTION
- Size of candidate set may be large
34Caution: induced updates
AS8
AS7
p71
p 871
p 1
AS1
AS2
AS3
AS4
p 321
p 21
p 1
p
AS5
p 1
AS6
Policy: AS 4 prefers the path via AS 3 over the path via AS 6!
35Caution: induced updates
AS8
AS7
p71
p 871
p 1
AS1
AS2
AS3
AS4
p5871
p 321
p 1
preferred
p
AS5
p 4321
less preferred
p 1
AS6
Link failure between AS 2 and AS 3
36Caution: induced updates
AS8
AS7
p71
p 871
p 1
AS1
AS2
AS3
AS4
p5871
p 321
p 1
preferred
p
p5461
AS5
p 4321
less preferred
p 1
AS6
Old path 5 8 7 1, new path 5 4 6 1, but the failure is between AS 2 and AS 3
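The induced-update example can be checked with a few lines. The sketch takes the old path as 5 8 7 1 and the new path as 5 4 6 1, as labeled in the figure, and uses a hypothetical helper `inter_as_edges` that lists the inter-AS edges of a path as unordered pairs.

```python
def inter_as_edges(as_path):
    """Inter-AS edges of an AS path as a set of (low, high) pairs."""
    return {(min(a, b), max(a, b))
            for a, b in zip(as_path, as_path[1:]) if a != b}


old_path = [5, 8, 7, 1]   # path seen before the failure
new_path = [5, 4, 6, 1]   # path seen after AS 4 switched to AS 6
failed_link = (2, 3)      # actual location of the failure

candidates = inter_as_edges(old_path) | inter_as_edges(new_path)
failure_is_candidate = failed_link in candidates
```

Here `failure_is_candidate` is `False`: neither path crosses the AS 2 to AS 3 link, so the UNION from this single view misses the real location.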
37Reducing size of candidate set
- Idea: exclude some ASes
- e.g., initial or final shared path segment
- Narrows the candidate set, but may exclude the
location
[Figure: paths toward prefix p, with the shared initial and final path segments marked "exclude"]
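One way to picture the exclusion of shared path segments is the sketch below. It is an assumption-laden illustration: the function name `edges_without_shared_segments` is hypothetical, and the choice to keep the last shared AS at the head and the first shared AS at the tail as endpoints of the differing part is one possible reading of the idea.

```python
def edges_without_shared_segments(previous_path, new_path):
    """Candidate edges after dropping the shared head and tail segments."""
    limit = min(len(previous_path), len(new_path))
    head = 0  # length of the shared initial segment
    while head < limit and previous_path[head] == new_path[head]:
        head += 1
    tail = 0  # length of the shared final segment (not overlapping the head)
    while tail < limit - head and previous_path[-1 - tail] == new_path[-1 - tail]:
        tail += 1

    def middle_edges(path):
        # Keep one shared AS on each side as the endpoint of the differing part.
        sub = path[max(head - 1, 0):len(path) - tail + 1]
        return {(min(a, b), max(a, b)) for a, b in zip(sub, sub[1:]) if a != b}

    return middle_edges(previous_path) | middle_edges(new_path)


# Hypothetical change: 7 5 3 2 1 -> 7 5 4 2 1 (shared head 7 5, shared tail 2 1).
reduced = edges_without_shared_segments([7, 5, 3, 2, 1], [7, 5, 4, 2, 1])
```

The edges (5, 7) and (1, 2) drop out of the candidate set. As the slide warns, this narrowing can also remove the true location.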
38Caution on excluding candidate ASes
AS2
p 1
p 21
AS4
AS5
AS1
p
p 1
p 31
AS3
39Caution on excluding candidate ASes
AS2
p 1
p 21
AS4
AS5
p 5421
AS1
p 421
change of preference!
p
p 431
p 5421
p 1
p 31
AS3
Path change: AS 2 replaces AS 3, yet the cause is AS 5
40Good news
- Accurate in simulations
- Accurate when applied to real data
- Some formal justification in paper
41Outline
- High-level approach
- Limitations
- Evaluation
42Evaluation of methodology
- Simulation setup
- Inferred AS topology from BGP data
- Routescope Simulator
- Data analysis setup
- BGP routing table dumps and updates from RIPE, Routeviews, and Akamai
- Over 1,100 BGP feeds / 650 ASes (some I-BGP)
43Simulation
- Topology
- Inferred AS topology
- Single node AS
- Policies
- Inferred AS relationships
- Prefer customer routes over peers over upstreams
- Predicted routes agree with high accuracy with actual routes
- Link failures
- Randomly selected
- Observation points
- Randomly selected
44Simulations: UNION / INTERSECTION
[Figure: histogram of instability set size for several UNION heuristics; x-axis: size of instability set (# of AS-AS edges), y-axis: percentage of events]
Choice of heuristic matters
45Simulations summary
- The methodology never excludes the simulated failure location
- Number of observation points matters
- Average instability set sizes after intersection
- with only two obs.: ≤ 7 edges in 68%
- with 10 obs.: ≤ 7 edges in 88%
- Location of observation points matters (in the AS hierarchy)
46Data analysis: UNION / INTERSECTION
[Figure: histogram of instability set size for several UNION heuristics; x-axis: size of instability set (# of AS-AS edges), y-axis: percentage of events]
More aggressive heuristics are dangerous
47UNION / INTERSECTION / GREEDY
- Zipf's law seems to apply to the distribution of correlated events across prefixes
- Single AS edge identified for 93.4% of prefixes
- Three AS edges identified for 97.2% of prefixes
- If restricted to at least 100 correlated prefixes
- Single AS edge identified for 96.3% of prefixes
48Validation
- Syslog data of tier-1 vs. Greedy results
- Crosscheck: session reset on router → event within 5 minutes
- Result
- Checked 35 events
- Found 26 events → 74% of the events
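The crosscheck can be pictured as a simple time-window match. This is a sketch under assumptions: timestamps are in seconds, the window is applied on both sides of the reset to tolerate clock skew (the slide only says "within 5 minutes"), and the function name `crosscheck` and the sample timestamps are hypothetical. The last line reproduces the 74% figure from the counts on the slide.

```python
def crosscheck(reset_times, event_times, window=300.0):
    """Return the session resets that have an inferred event within `window` s."""
    return [t for t in reset_times
            if any(abs(e - t) <= window for e in event_times)]


# Hypothetical syslog reset times and inferred event times (seconds).
matched = crosscheck([0, 1000, 5000], [100, 1290, 9000])

# Share of checked syslog events that were found, from the slide: 26 of 35.
found_percent = round(100 * 26 / 35)
```

In the sample data the resets at 0 s and 1000 s are matched, while the reset at 5000 s has no event nearby.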
49Summary
- Proposed methodology: Time → Views → Prefixes
- Ideal-world study: simulation
- UNION / INTERSECTION heuristics: ≤ 7 AS edges for 88% (10 obs.)
- Real-world study: data analysis
- UNION / INTERSECTION heuristics
- Beacons: ≤ 3 AS edges for 76% (2 obs.)
- All prefixes: ≤ 5 AS edges for 90% (5 obs.)
- UNION / INTERSECTION / GREEDY heuristic
- All prefixes: 1 AS edge for 93%
- Successful validation on tier-1 syslog data
50Ongoing work
- Generate synthetic BGP traffic
- with Alexander Tudor (Agilent Labs)
- ease router testing by
- identifying a statistical profile of BGP
- BGP alarm system
- with Gert Doering (SpaceNet) and RIPE NCC
- detect unwanted routing conditions and trigger alarms
- integration of private AS monitors with RIPE's myASN project
51Questions? Comments?!
Thanks !
52What did happen? :-)
observation point B
AS7
CE PE
BGP observation point A
CE PE
AS4
prefix p
AS3
CE PE
peering
AS5
PE CE
AS6
AS8
PE CE
CE PE
prefix q
AS1
AS2
PE CE
PE CE
TIER 1 AS 1 / AS7 is doing TE on incoming routes
53What did happen? :-)
observation point B
AS7
CE PE
BGP observation point A
CE PE
AS4
prefix p
AS3
CE PE
peering
AS5
PE CE
AS6
AS8
PE CE
CE PE
prefix q
AS1
AS2
PE CE
PE CE
TIER 1 AS 1 / AS7 is doing TE on incoming routes
54What did happen? :-)
observation point B
AS7
CE PE
BGP observation point A
CE PE
AS4
prefix p
AS3
CE PE
peering
AS5
PE CE
PE CE
AS6
AS8
PE CE
CE PE
prefix q
AS1
AS2
PE CE
PE CE
AS 2 added a new upstream AS 3
55What did happen? :-)
observation point B
AS7
CE PE
BGP observation point A
CE PE
AS4
prefix p
AS3
CE PE
peering
AS5
PE CE
PE CE
AS6
AS8
PE CE
CE PE
prefix q
AS1
AS2
PE CE
PE CE
AS 5 prefers the peering session / AS 7 the
shortest AS path
56Additional slides
a few thoughts about route flap dampening
57Interarrival time between echoes
peers without MRAI: lots of echoes; with MRAI: doesn't prevent echoes
58Number of echoes in update bursts
damping on peers? without MRAI: 8.3, with MRAI: 2.4
59Cisco's default damping parameters
60Summary
- Today's eBGP convergence depends on
- MRAI: shorter MRAI leads to
- more echoes and to more damping, and
- to faster convergence if damping is not aggressive
- Damping settings
- damping occurs for normal prefixes! (BGP path exploration may need 6 echoes, and depends on interconnectivity)
- damping helps for unstable prefixes
61Additional slides
Convergence
62Regarding BGP convergence
- timeout too small: can't capture all effects
- timeout too large: combines several instabilities in one burst
updates for prefix A seen on peer P1
updates for prefix A seen on peer P2
updates for prefix A seen on peer P3
time
instability
instability
63Regarding BGP convergence
- timeout too small: can't capture all effects
- timeout too large: combines several instabilities in one burst
updates for prefix A seen on peer P1
updates for prefix A seen on peer P2
updates for prefix A seen on peer P3
time
failure
64Update burst duration
convergence can take rather long
65Number of updates in update bursts
most bursts: only a few updates - some bursts: a huge number of updates!
66Interarrival time of update bursts
time to next update burst unpredictable
67Convergence points on different peers
Do all peers converge at the same time?
- pick one prefix on one peer
- find other peers with active update bursts
- compute time difference between convergence points
updates for prefix A seen on peer P1
updates for prefix A seen on peer P2
updates for prefix A seen on peer P3
time
68Time difference between convergence points
5% of prefixes with more/less specific update burst
69Bursts observed on different peers
update distribution locally or globally visible
70Additional slides
BGP wedgies
71Tim Griffin: BGP wedgies
ISP A (Tier 1)
ISP B (Tier 1)
ISP D (Tier 2)
ISP C
Customer E
72Tim Griffin: BGP wedgies
ISP A (Tier 1)
ISP B (Tier 1)
ISP D (Tier 2)
primary link
ISP C
backup link
Customer E
73Desired Situation !!!
ISP A (Tier 1)
ISP B (Tier 1)
ISP D (Tier 2)
primary link
ISP C
backup link
Customer E
74AS path prepending ???
ISP A (Tier 1)
ISP B (Tier 1)
ISP D (Tier 2)
2
ISP C
2 2 2 2 2
AS 2
75Policies with communities ?!!
ISP A (Tier 1)
ISP B (Tier 1)
ISP D (Tier 2)
primary link
ISP C
Community set local-preference
AS 2
76Primary link fails
ISP A (Tier 1)
ISP B (Tier 1)
ISP D (Tier 2)
ISP C
AS 2
77Primary link recovers ?!!
ISP A (Tier 1)
ISP B (Tier 1)
ISP D (Tier 2)
ISP C
AS 2