Title: Interview talk at various universities and labs
1R-BGP: Staying Connected in a Connected World
Nate Kushman, Srikanth Kandula, Dina Katabi, and
Bruce Maggs
2BGP Convergence Causes Packet Loss
The Problem
- When a route changes, up to 30% packet loss for
more than 2 minutes [Labovitz00]
- Even domains dual-homed to tier-1 providers see
many loss bursts on a route change [Wang06]
- Even popular prefixes experience losses due to
BGP convergence [Wang05]
- 50% of VoIP disruptions are highly correlated
with BGP updates [Kushman06]
3Links, Links Everywhere But Not a Path to
Forward!
- Goal
- Ensure ASes stay connected as long as the
physical network is connected
4 We Focus on Forwarding
- Don't worry about BGP's routing
- Ensure forwarding works by forwarding packets on
pre-computed failover paths
5Why Focus on Forwarding?
- Convergence is unlikely to be fast enough
- Strict timing constraints limit innovation
6Our Contribution
Guarantee: No BGP-caused packet loss
Low Overhead: Just like BGP, each AS advertises at most one
path to each neighbor
On link failure, we reduce disconnected ASes from
22% to zero
7What Causes Transient Disconnection?
[Diagram: ATT, Sprint, Peter, and Hari, with destination MIT]
All of Hari's providers use him to get to MIT
BGP Rule: An AS advertises only its current
forwarding path
=> Nobody offers Hari an alternate path
8What Causes Transient Disconnection?
[Diagram: ATT, Sprint, Peter, Hari, MIT; the Hari->MIT link is down (X)]
Hari knows no path to MIT
Hari drops Peter's and ATT's packets in addition
to his own
LOSS!
9What Causes Transient Disconnection?
[Diagram: ATT, Sprint, Peter, Hari, MIT; the Hari->MIT link is down (X)]
Hari withdraws path
ATT and Peter move to alternate paths
10What Causes Transient Disconnection?
[Diagram: ATT, Sprint, Peter, Hari, MIT; the Hari->MIT link is down (X)]
Hari withdraws path
ATT and Peter move to alternate paths
ATT announces the Sprint path to Hari => Traffic flows
Transient Packet Loss
11How do failover paths solve the problem?
BGP: An AS advertises only its current path. It
advertises an alternate only after a link fails
R-BGP: Advertises an alternate, i.e., a failover path,
before a link fails
12Failover Paths
[Diagram: Peter, ATT, Sprint, Hari, MIT; the Hari->MIT link is down (X)]
ATT advertises to Hari ATT -> Sprint -> MIT as a
failover path
Link fails => Hari immediately sends traffic on
failover path
No Loss!
13Two Challenges
Challenge 1
Minimize the number of failover paths, while
ensuring an AS always has a usable path
Challenge 2
Transition from usable path to converged path
without creating forwarding loops
14Challenge 1 Minimize number of failover paths
Claim: Just like BGP, advertise one path per
neighbor, either current or failover
[Diagram: ATT, Peter, Sprint, Hari, MIT; advertisements labeled "Current path", with one labeled "Failover Path"]
Insight: Replace path advertised to
downstream AS with a failover path
15Which failover path should it advertise?
[Diagram: ATT, John, Bob, Joe, and Dest, with a failed link (x); one path is labeled "Most Disjoint Path"]
Lemma: Advertising Most Disjoint is equivalent to
advertising all paths.
16Challenge 1 Minimize number of failover paths
R-BGP Rule:
Advertise to downstream AS as a failover path the
path most disjoint from the current path
Theorem 1: When a link fails, the AS upstream
of the down link knows a failover path if it will
know a path at convergence
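The R-BGP rule above can be sketched as a small selection function. This is only an illustration: the slide does not define how disjointness is measured, so counting shared links and breaking ties by path length are assumptions of this sketch, and the names `links` and `most_disjoint` are hypothetical.

```python
from typing import List, Optional, Tuple

Path = Tuple[str, ...]  # AS-level path, e.g. ("ATT", "Sprint", "MIT")


def links(path: Path) -> set:
    """Return the set of undirected links (AS pairs) along a path."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def most_disjoint(current: Path, candidates: List[Path]) -> Optional[Path]:
    """Pick the candidate sharing the fewest links with the current path.

    Assumption of this sketch: disjointness is the count of shared links,
    and ties go to the shorter path.
    """
    others = [p for p in candidates if p != current]
    if not others:
        return None
    current_links = links(current)
    return min(others, key=lambda p: (len(links(p) & current_links), len(p)))


# ATT currently routes through Hari; it also knows paths via Sprint and Peter.
current = ("ATT", "Hari", "MIT")
candidates = [("ATT", "Sprint", "MIT"), ("ATT", "Peter", "Hari", "MIT")]
print(most_disjoint(current, candidates))  # ('ATT', 'Sprint', 'MIT')
```

The path through Peter still crosses the Hari->MIT link, so it would fail together with the current path; the Sprint path shares no link with it.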
17Challenge 2 Transition without loops
[Diagram: ATT, Sprint, Peter, Hari, MIT; the Hari->MIT link is down (X)]
Hari withdraws path
18Challenge 2 Transition without loops
[Diagram: ATT, Sprint, Peter, Hari, MIT; the Hari->MIT link is down (X)]
Hari withdraws path
Peter may choose to route through ATT
ATT may choose to route through Peter
Forwarding Loop!
19Challenge 2 Transition without loops
Solution 2: Root Cause Information
[Diagram: ATT, Sprint, Peter, Hari, MIT; the Hari->MIT link is down (X)]
Hari includes Root Cause Information with the
withdrawal: "Hari->MIT link down"
ATT recognizes the Peter->Hari->MIT path is down
It routes through Sprint instead
Theorem 2: No forwarding loops will form
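A minimal sketch of how an AS might apply Root Cause Information when it re-selects a path. The function names and the shortest-path preference are assumptions made for illustration; real BGP path selection follows local policy.

```python
from typing import Dict, Optional, Tuple

Path = Tuple[str, ...]
Link = Tuple[str, str]


def uses_link(path: Path, down: Link) -> bool:
    """True if the path traverses the failed link in the given direction."""
    return down in zip(path, path[1:])


def select_path(known: Dict[str, Path], down: Link) -> Optional[Path]:
    """Choose the shortest known path that avoids the root-cause link.

    `known` maps each neighbor to the path it last advertised.
    Shortest-path preference stands in for real BGP policy here.
    """
    usable = [p for p in known.values() if not uses_link(p, down)]
    return min(usable, key=len) if usable else None


down = ("Hari", "MIT")
att_known = {
    "Hari": ("Hari", "MIT"),
    "Peter": ("Peter", "Hari", "MIT"),
    "Sprint": ("Sprint", "MIT"),
}
peter_known = {"Hari": ("Hari", "MIT"), "ATT": ("ATT", "Hari", "MIT")}

print(select_path(att_known, down))    # ('Sprint', 'MIT')
print(select_path(peter_known, down))  # None
```

With the root cause attached, ATT discards the path through Peter immediately instead of trying it, which is what prevents the ATT/Peter loop in this example.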
20R-BGP
- Solution 1: Advertise most disjoint path to
downstream AS
- Solution 2: Include Root Cause Information
Final Theorem: No AS will see BGP-caused packet
loss if it will have a path at convergence
21Experimental Results
22Setup
- AS-level simulation over the full Internet
- AS graph with 24,142 ASes from Routeviews BGP
data
- Use inference algorithm to annotate links with
customer-provider or peer relationships
23Single Link Failure Results
- Dual-homed AS loses one link
- Find percentage of ASes that see transient
disconnection to the destination
- Run for all dual-homed ASes
[Diagram: failed link (X) on a path to Destination]
24Single Link Failure Results
Percentage of ASes transiently disconnected
22 - BGP
Zero - R-BGP
R-BGP Eliminates all Transient Disconnection
25Cost of Policy Compliance
- Most disjoint path may not be compliant with BGP
routing policies
- Still, an AS may want to advertise it
- To protect its own traffic
- Because it is temporary
What if we choose the most disjoint among
policy-compliant paths?
26Cost of Policy Compliance
Percentage of ASes transiently disconnected
22 - BGP
Zero - R-BGP
27Cost of Policy Compliance
Percentage of ASes transiently disconnected
22 - BGP
1.4 - R-BGP policy compliant
Zero - R-BGP
Policy compliant failover paths may be sufficient
28Multiple Link Failure Results
- All proofs are for single link failure
- Randomly choose a second link
X
Destination
29Multiple Link Failure Results
Percentage of ASes transiently disconnected
22 - BGP
1.4 - R-BGP policy compliant
0 - R-BGP
Multiple link failures are unlikely to interact
30Worst Case Scenario
- Fail link on current path
- Fail link on corresponding failover path
[Diagram: Hari with two failed links (X) toward Destination]
31Multiple Link Failure Results
Percentage of ASes transiently disconnected
33 - BGP
32Multiple Link Failure Results
Percentage of ASes transiently disconnected
33 - BGP
12 - R-BGP policy compliant
33Worst case Scenario
Percentage of ASes transiently disconnected
33 - BGP
12 - R-BGP policy compliant
7 - R-BGP
Eliminates 80% of disconnection even in the worst
case of link failures on both current and failover paths
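As a check on the rounded 80% figure, the relative reductions implied by the percentages on this slide work out as follows (the 33, 12, and 7 are the slide's own numbers; only the arithmetic is added here):

```latex
\[
\frac{33 - 7}{33} = \frac{26}{33} \approx 0.79
\qquad\text{(R-BGP vs. BGP)},
\qquad
\frac{33 - 12}{33} = \frac{21}{33} \approx 0.64
\qquad\text{(policy-compliant R-BGP vs. BGP)}.
\]
```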
34Conclusion
- BGP loses connectivity even when the physical
network is connected
- R-BGP uses a few failover paths to ensure
forwarding works throughout convergence
- Guarantees no packet loss
- Just like BGP, one path per neighbor
- Reduces disconnected ASes from 22% to zero
Working with Cisco on prototype feasibility
35The End
36Multiple Link Failure Results
- Joe forwards on second best path, not most
disjoint
- Packets on Bob's failover path follow Joe's
second best path to the destination
[Diagram: Joe and Bob, each with a failed link (X), and Destination]
37Practical
- Requires only a few modifications to BGP
- Currently working with Cisco to prototype
- Advertises only one path per neighbor, just like
BGP
- Convergence time 1/3 that of BGP
38Challenge 1 A few Strategic Failover Paths
Solution 1: Most Disjoint Path
Theorem 1: If any AS using the down link will
have a path after convergence, then R-BGP
guarantees that the AS immediately above the down
link knows a failover path when the link fails.
39Implementing Failover Paths Three Rules
- Routing Rule: Each router advertises only one
failover path and only to the next hop router on
its primary path
- U-Turn Rule: The router immediately upstream of
the down link sends all packets destined for the
down link on the failover virtual interface for
the failover path
- Forwarding Rule: When routers receive packets
along a failover virtual interface, they forward
them along the failover path
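The U-Turn and Forwarding rules can be sketched as a per-router forwarding decision. The `Router` class and its field names are hypothetical, and the sketch assumes that once a packet has been placed on the failover path it continues over ordinary interfaces; the slides do not say how later hops treat it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Router:
    name: str
    primary_next_hop: str
    # Next hop of the failover path this router advertised downstream.
    own_failover_next_hop: Optional[str] = None
    # Neighbor that advertised a failover path to this router.
    failover_via: Optional[str] = None
    primary_link_up: bool = True

    def forward(self, on_failover_interface: bool) -> Optional[Tuple[str, bool]]:
        """Return (next hop, send on failover interface?), or None to drop."""
        if on_failover_interface:
            # Forwarding rule: packets arriving on the failover virtual
            # interface follow the failover path, not the primary path.
            if self.own_failover_next_hop is None:
                return None
            return (self.own_failover_next_hop, False)
        if self.primary_link_up:
            return (self.primary_next_hop, False)
        if self.failover_via is None:
            return None  # no failover path known: the packet is dropped
        # U-turn rule: send back to the neighbor that offered a failover path.
        return (self.failover_via, True)


hari = Router("Hari", "MIT", failover_via="ATT", primary_link_up=False)
att = Router("ATT", "Hari", own_failover_next_hop="Sprint")

print(hari.forward(False))  # ('ATT', True)
print(att.forward(True))    # ('Sprint', False)
```

The separate failover interface is what lets ATT tell U-turned packets apart from ordinary ones, which it would otherwise send straight back to Hari.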
40No Available Loop Free Path
[Diagram: ATT, Sprint, Peter, Hari, MIT; updates carry "Hari->MIT link is down"]
ATT can immediately move to Sprint path
Peter is left without any usable path
Peter continues to use the old path
Moves away from old path only after receiving
advertisement from ATT
Mechanism 3: If no path without the down link is
available, continue to use the old path until
such a path becomes available or it is certain that no
such path will become available.
41Putting it all together
42 Final Theorem: When a link fails, if an AS
will eventually have a path, it will see no
BGP-caused packet loss
43 Final Theorem: When a single link fails, all
ASes that will eventually learn a valley-free path
to the destination are guaranteed no BGP-caused
packet loss during convergence
A path is valley-free if no AS transits between
two non-customer ASes
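The valley-free definition above translates directly into a check over each transit AS on a path. The relationship table below is a hypothetical example, not data from the talk, and the helper names are made up for this sketch.

```python
from typing import Dict, Tuple

Rel = Dict[Tuple[str, str], str]  # rel[(a, b)] is b's role as seen from a


def add_customer(rel: Rel, provider: str, customer: str) -> None:
    """Record a customer-provider relationship in both directions."""
    rel[(provider, customer)] = "customer"
    rel[(customer, provider)] = "provider"


def add_peers(rel: Rel, a: str, b: str) -> None:
    """Record a peer relationship in both directions."""
    rel[(a, b)] = "peer"
    rel[(b, a)] = "peer"


def is_valley_free(path: Tuple[str, ...], rel: Rel) -> bool:
    """True if no AS on the path transits between two non-customers."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        if rel[(mid, prev)] != "customer" and rel[(mid, nxt)] != "customer":
            return False
    return True


# Hypothetical relationships, for illustration only.
rel: Rel = {}
add_customer(rel, "ATT", "Hari")
add_customer(rel, "Peter", "Hari")
add_customer(rel, "Sprint", "MIT")
add_peers(rel, "ATT", "Sprint")

print(is_valley_free(("Hari", "ATT", "Sprint", "MIT"), rel))  # True
print(is_valley_free(("ATT", "Hari", "Peter"), rel))          # False
```

In the second path Hari would carry traffic between two of its providers, which is exactly the transit the definition rules out.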
44Little Additional Overhead
[Chart: update counts, 22K vs. 20K]
Less than 10% more updates network-wide
45Faster Convergence Times
[Chart: convergence times, 13 vs. 4]
Convergence times are 1/3 of those with BGP
46Compared Schemes
- Current BGP
- Most-disjoint failover path
- Most-disjoint policy-compliant failover path
47Goal Staying Connected
- If an AS's link to the destination fails
- and
- after convergence the AS will have a path to the
destination
[Diagram: AS with a failed link (X) to Destination]
The AS should know a failover path to the
destination when the link fails
48Goal Staying Connected
- The AS immediately upstream of a down link can
protect all traffic
- Without a failover path, all ASes see
disconnection
[Diagram: failed link (X) on the path to Destination]
The AS upstream of the down link must know a
failover path when the link fails
49Goal Staying Connected
- AS immediately upstream of a down link can
protect all traffic
If this AS has no failover path, all ASes using
link see disconnection
X
The AS upstream of the down link must know a
failover path when the link fails
Destination
50Challenge 2 Consistency during convergence
Inconsistency across ASes: routing loops, ASes unaware of available paths
Strong consistency: expensive
Balance between providing enough consistency
and maintaining BGP's scalability
51Challenge 1 Which Failover Paths to Advertise
- AS immediately upstream of a down link can
protect all traffic
LOSS!
If this AS has no failover path, all ASes using
link see disconnection
X
The AS upstream of the down link must know a
failover path when the link fails
Destination
52Division of Labor
- If the AS upstream of the down link doesn't
know a failover path, everyone sees loss
- If the AS knows a failover path, no one sees
loss
- Each AS is responsible for its immediately downstream
link
[Diagram: failed link (X) on the path to Destination]
Which path does the AS far upstream offer to
which neighbors?
53Impossible is nothing
[Diagram: ATT, Sprint, Peter, Hari, MIT]
- If AS above down link doesn't know path, everyone
sees loss
- If he knows a path, no one sees loss
- Assign each AS responsibility for downstream link
- The real question is which path upstream guy
offers
55Immediately upstream must know, way upstream
must advertise
Assigning responsibility
- If AS above down link doesn't know path, everyone
sees loss
- If the guy knows a path, you're fine
- Assign responsibility to that guy
- The real question is which path upstream guy
offers
56The Challenges
- Challenge 1: Which Failover Paths to Advertise
Ensure continuous connectivity without flooding
the network with failover paths
- Challenge 2: Consistency During Convergence
A large-scale distributed consistency problem
leaves ASes with loops and path loss
57Challenge 1 Which Failover Paths to Advertise
- Can we do this while advertising only one path
per neighbor, just like BGP?
- Any path currently advertised to the next-hop
neighbor is useless
Constraint: An AS advertises only one failover
path, and only to its next-hop neighbor
58Challenge 1 Which Failover Paths to Advertise
X
Destination
60Challenge 1 Which Failover Paths to Advertise
Solution 1: Most Disjoint Paths. Each AS
advertises to its next-hop AS a failover path
which is the path most disjoint from its primary
Theorem 1: When a link fails and there is some
path, the AS immediately upstream of the
down link knows a failover path
61Challenge 2 Inconsistency During Convergence
[Diagram: ATT, Sprint, Peter, Hari, MIT]
Hari withdraws path from ATT and Peter
ATT and Peter stop sending packets to Hari
62Challenge 2 Inconsistency During Convergence
[Diagram: ATT, Sprint, Peter, Hari, MIT]
Hari withdraws path from ATT and Peter
ATT and Peter stop sending packets to Hari
Peter will choose to route through ATT
ATT may choose to route through Peter
Routing Loop Created! LOSS!
63Challenge 2 Inconsistency During Convergence
Solution 2: Root Cause Information
[Diagram: ATT, Sprint, Peter, Hari, MIT]
Hari includes Root Cause Information with the
withdrawal: "Hari->MIT link down"
ATT recognizes the Peter->Hari->MIT path is no
longer available
It routes through Sprint instead
Routing Loop Avoided!
64Challenge 2 Inconsistency During Convergence
Solution 2: Root Cause Information
- Include in each update Root Cause Information
indicating the down link
- Do not use paths that include the down link
Theorem 2: When a link fails, if an AS will
eventually have a path, it will see no BGP-caused
packet loss
65How do failover paths solve the problem?
- BGP often provides an alternate path only after
the link fails
- R-BGP uses pre-computed failover paths to ensure
all ASes have an alternate path before the link
fails
66Single Link Failure Results
Percentage of ASes transiently disconnected
22 - BGP
Zero - R-BGP
67Advertise failover path to which neighbor?
BGP Rule: Advertise only best path (used path)
- Advertised path always contains downstream AS
BGP Rule: Do not use paths with your AS
Insight: Any path advertised to the downstream
neighbor can't be used by that neighbor
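The insight follows from BGP's loop check, which a short sketch can make concrete. The function name `usable_by` is hypothetical; the paths reuse the ATT/Hari example from earlier slides.

```python
from typing import Tuple


def usable_by(neighbor: str, advertised: Tuple[str, ...]) -> bool:
    """BGP loop check: a neighbor rejects any path that already contains it."""
    return neighbor not in advertised


# ATT's current path goes through Hari, so advertising it to Hari is wasted.
print(usable_by("Hari", ("ATT", "Hari", "MIT")))    # False
# A failover path that avoids Hari is something Hari can actually use.
print(usable_by("Hari", ("ATT", "Sprint", "MIT")))  # True
```

Because the advertisement to the downstream neighbor is unusable anyway, replacing it with a failover path adds no extra advertised paths.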
68Multiple Link Failure Results
Percentage of ASes transiently disconnected
33 - BGP
69Multiple Link Failure Results
Percentage of ASes transiently disconnected
33 - BGP
12 - R-BGP policy compliant
70Multiple Link Failure Results
Percentage of ASes transiently disconnected
33 - BGP
12 - R-BGP policy compliant
7 - R-BGP
Eliminates 80% of disconnectivity even in the
worst case of link failures on both primary and
failover paths
72Challenge 2 Inconsistency During Convergence
Solution 2: Root Cause Information
- Include in each update Root Cause Information
indicating the down link
- Do not use paths that include the down link
Theorem 2: When a link fails, no loops will
form