Title: Advanced Networks
1Advanced Networks
- 1. Delayed Internet Routing Convergence
- 2. The Impact of Internet Policy and Topology on
Delayed Routing Convergence
2The Problem
- How to Recover from Failure Quickly?
- Phone systems recover (fail over) in milliseconds
- Internet takes on the order of minutes
- Loss of Connectivity
- Packet Loss
- Latency
3The Problem (cont)
- Failover on the Internet is not very good
- Sluggish Backup systems
- Internet has to adjust to the failure
- Path must be restored over the backup
4The Questions
- Why does convergence take so long?
- What is the upper bound for convergence?
- What causes this delayed convergence?
- What can we do about it?
5Theory
- Unexpected Interaction of
- Protocol timers
- Router Implementation
- Policies (Safe/Unsafe)
6Theory (cont)
- Distance vector algorithm has issues
- Lack of sufficient info to determine if next hop
choice will cause loops
7Convergence Accelerators
- Use of Path Vector
- Split Horizon
- Triggered updates
- Diffusion
- Timers
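The path-vector idea can be illustrated with a minimal sketch. A distance-vector update carries only a metric, while a path-vector update carries the whole path, so a router can check whether it already appears on it. The function name `accept_route` and the node labels are illustrative assumptions, not taken from the papers.

```python
from typing import Sequence


def accept_route(local_as: str, as_path: Sequence[str]) -> bool:
    """Path-vector loop check: refuse any path that already contains us."""
    return local_as not in as_path


# Node 0 hears the path "1 0 R" from node 1; its own name is on the path, so it is a loop.
print(accept_route("0", ["1", "0", "R"]))  # False
# Node 2 hears the same path and can safely use it.
print(accept_route("2", ["1", "0", "R"]))  # True
```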
8Policies
- Admins can implement unsafe policies
- Policies can cause route oscillations
- Routers default to Shortest Path
- Even if policies are constrained, the upper bound might be as high as
factorial
9Point of Paper
- Measure the convergence behavior of BGP 4
- Done for Bellman-Ford O(n^3)
- Convergence in BGP is NOT much better than RIP
- Give upper and lower bounds for convergence
10The Work Done
- 2 year study
- 250,000 routing fault injections
- 25 Internet providers
- End to End performance measurements
11Terminology
- Tup: (New) route announcement
- Tdown: Route withdrawal
- Tshort: Shorter route replaces current route
- Current route is withdrawn implicitly
- Tlong: Shorter route replaced with a longer one
- Represents a failure and failover
- Current route is withdrawn implicitly
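As a rough illustration of the four event types, the sketch below classifies a routing change from the path before and after the event. The function name `classify_event`, the use of `None` for "no route", and the `"other"` label for equal-length replacements are assumptions made for this example.

```python
from typing import Optional, Sequence


def classify_event(old_path: Optional[Sequence[str]],
                   new_path: Optional[Sequence[str]]) -> str:
    """Map a (before, after) pair of paths onto Tup, Tdown, Tshort or Tlong."""
    if old_path is None and new_path is None:
        raise ValueError("no route before or after: nothing happened")
    if old_path is None:
        return "Tup"      # a new route is announced
    if new_path is None:
        return "Tdown"    # the route is withdrawn
    if len(new_path) < len(old_path):
        return "Tshort"   # a shorter route implicitly replaces the current one
    if len(new_path) > len(old_path):
        return "Tlong"    # a longer route replaces the current one (failover)
    return "other"        # same length: outside the four categories (assumption)


print(classify_event(None, ["1", "R"]))              # Tup
print(classify_event(["1", "R"], ["2", "1", "R"]))   # Tlong
```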
12Latency
13Latency (cont)
- Oscillation greater than 3 minutes
- 20% of Tlong
- 40% of Tdown
- Equivalence Latency Classes
- Tlong,Tdown
- Tshort,Tup
14Latency per ISP
15BGP Update Volume
- Average messages per event type
- Tup: Route announcement
- Tdown: Route withdrawal
- Tshort: Shorter route replacement
- Tlong: Longer route replacement
16Questions
- Why do Tlong and Tdown cause 2 times the amount of updates?
- Why do certain ISPs produce more updates per event?
- What is the relationship between number of updates and
convergence latency?
17Questions (cont)
- What makes an ISP have a higher latency?
- Interesting Points
- ISP3: Japan's national backbone
- ISP5: Canadian ISP
- Latency is NOT dependent on geographic distance or
network distance (aka hop count)
18Graph Analysis
- No relationship between day of the week and latency!
- Independent of network load and congestion
19End to End Measurements
- Route oscillation affects performance
- Drop Packets, Buffering of Packets
- Out of order delivery
20Failover from end to end view
- Time until ICMP echo replies arrive after Tup
- Simulates a failover
- 80% of test sites began returning after 30 seconds
- 100% after one minute
21BGP Convergence Model
- IBGP ignored
- Full Mesh
- Ignore ingress and egress filters
- Exclude MinRouteAdver
- Update messages follow FIFO ordering
22BGP Convergence Example
- Start: 0(R, 1R, 2R) 1(0R, R, 2R) 2(0R, 1R, R)
- R withdraws routes: R -> 0 W, R -> 1 W, R -> 2 W
23BGP Convergence Example
- 0(-, 1R, 2R) 1(0R, -, 2R) 2(0R, 1R, -)
- 1 and 2 receive new announcement from 0: 0 -> 1 01R (loop), 0 -> 2 01R
- 0(-, 1R, 2R) 1(-, -, 2R) 2(01R, 1R, -)
- 0 and 2 receive new announcement from 1: 1 -> 0 10R (loop), 1 -> 2 10R
- 0(-, -, 2R) 1(-, -, 2R) 2(01R, 10R, -)
24BGP Convergence Example
- 0 and 1 receive new announcement from 2: 2 -> 0 20R, 2 -> 1 20R
- 0(-, -, -) 1(-, -, 20R) 2(01R, 10R, -)
- 0 and 2 receive new announcement from 1: 1 -> 0 12R, 1 -> 2 12R
- 0(-, 12R, -) 1(-, -, 20R) 2(01R, -, -)
- 48 steps later: 0(-, -, -) 1(-, -, -) 2(-, -, -)
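The path exploration in this example can be reproduced with a small simulation. The sketch below follows the model's stated simplifications (full mesh, no MinRouteAdver, FIFO message ordering, receiver-side loop detection) but makes its own assumptions about tie-breaking (lexicographic) and about how messages are counted (one global queue), so its step count is not expected to match the 48 steps on the slide. All names (`simulate_withdrawal`, `best_path`, `ORIGIN`) are hypothetical.

```python
from collections import deque

ORIGIN = "R"  # the destination whose route is withdrawn


def best_path(table):
    """Pick the shortest stored path; break ties lexicographically (an assumption)."""
    candidates = [p for p in table.values() if p is not None]
    return min(candidates, key=lambda p: (len(p), p)) if candidates else None


def simulate_withdrawal(n):
    """Withdraw ORIGIN from a full mesh of n nodes.

    Returns (number of messages processed, final per-node tables).
    """
    nodes = [str(i) for i in range(n)]
    # tables[i][j] holds the path i last heard from j;
    # tables[i][ORIGIN] is i's direct route to the origin.
    tables = {
        i: {ORIGIN: (ORIGIN,), **{j: (j, ORIGIN) for j in nodes if j != i}}
        for i in nodes
    }
    # One global FIFO queue of (sender, receiver, path); a path of None is a withdrawal.
    queue = deque((ORIGIN, i, None) for i in nodes)
    processed = 0
    while queue:
        sender, receiver, path = queue.popleft()
        processed += 1
        before = best_path(tables[receiver])
        # Receiver-side loop detection: a path containing the receiver is discarded.
        if path is not None and receiver in path:
            path = None
        tables[receiver][sender] = path
        after = best_path(tables[receiver])
        if after != before:
            # Announce the new best path with ourselves prepended, or withdraw.
            update = None if after is None else (receiver,) + after
            for peer in nodes:
                if peer != receiver:
                    queue.append((receiver, peer, update))
    return processed, tables


steps, tables = simulate_withdrawal(3)
print(steps, tables)
```

Even for three nodes, the run processes many more messages than the three original withdrawals, because each node keeps falling back to stale, longer paths learned from its neighbours before finally giving up.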
25Upper Bound
- For n nodes there exist O((n-1)!) distinct paths
- When a route is withdrawn, a new route is found
of equal or increasing length
- Message count could be as bad as
(n-1) * O((n-1)!) until convergence
- Not really possible on the Internet
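To get a feel for how fast this bound grows, the sketch below evaluates (n-1)! and (n-1) * (n-1)! for a few mesh sizes. It drops the constants hidden by the O() notation, so the numbers are orders of magnitude rather than exact counts; the function names are illustrative.

```python
from math import factorial


def upper_bound_paths(n: int) -> int:
    """(n-1)!: growth of the number of distinct paths in a full mesh of n nodes."""
    return factorial(n - 1)


def upper_bound_messages(n: int) -> int:
    """(n-1) * (n-1)!: growth of the worst-case message count until convergence."""
    return (n - 1) * factorial(n - 1)


for n in (3, 5, 10):
    print(n, upper_bound_paths(n), upper_bound_messages(n))
# 3 2 4
# 5 24 96
# 10 362880 3265920
```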
26Lower Bound
- Made possible by MinRouteAdver timers
- (n-1) Rounds to convergence
27MinRouteAdver
- Minimum time between route advertisements
- Gives an AS time to pick a good route before
announcing it
- In standard BGP, the timer is only applied to
announcements
- Does not apply to explicit withdrawals
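The timer behaviour described above can be sketched as a small per-peer rate limiter: announcements are held back until the interval has elapsed, a newer pending path overwrites an older one, and withdrawals go out immediately. The class name `MinRouteAdverPeer`, its methods, and the 30-second default are assumptions for illustration, not a real BGP implementation.

```python
class MinRouteAdverPeer:
    """Per-peer rate limiter: announcements wait for the timer, withdrawals do not."""

    def __init__(self, interval: float = 30.0):
        self.interval = interval  # assumed default of 30 seconds
        self.last_sent = None     # time of the last announcement batch sent to this peer
        self.pending = {}         # prefix -> latest path waiting for the timer

    def announce(self, now: float, prefix: str, path: tuple) -> list:
        """Queue an announcement and return whatever may be sent at time `now`."""
        self.pending[prefix] = path  # a newer path replaces an older pending one
        return self.flush(now)

    def withdraw(self, now: float, prefix: str) -> list:
        """Explicit withdrawals bypass the timer and cancel any pending announcement."""
        self.pending.pop(prefix, None)
        return [("withdraw", prefix)]

    def flush(self, now: float) -> list:
        """Send all pending announcements if the interval has elapsed."""
        if self.last_sent is not None and now - self.last_sent < self.interval:
            return []
        sent = [("announce", prefix, path) for prefix, path in self.pending.items()]
        if sent:
            self.last_sent = now
        self.pending.clear()
        return sent


peer = MinRouteAdverPeer()
print(peer.announce(0.0, "p", ("1", "R")))        # sent immediately: first announcement
print(peer.announce(5.0, "p", ("2", "1", "R")))   # [] : held back by the timer
print(peer.announce(10.0, "p", ("2", "R")))       # [] : replaces the pending path
print(peer.flush(30.0))                           # only the latest path is announced
```

The intermediate path `("2", "1", "R")` is never sent, which is the point of the timer: the peer only hears about the route that survived the interval.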
28Example Reloaded
- Instead of 48 rounds, it only took 13 rounds
29Example Reloaded
30Question Reloaded
- Why do Tup/Tshort converge quicker than
Tdown/Tlong?
- Answer: Tup/Tshort path lengths are decreasing while
Tdown/Tlong path lengths are increasing
- Once a path is selected a longer one will not be
picked
- While on Tdown/Tlong you pick the next best one
until you are out of choices
- O(1) for Tup while O(n) for Tdown
31Question Reloaded
- Why are there different latencies between the five
ISPs?
- Answer: The topological factors, length and
number of possible paths (peering relationships,
policies and agreements)
- Longer routes announced, longer latencies
- The longer the routes, the more MinRouteAdver rounds
32Loop Detection
- Loop detection is done at the receiver side
- If done at the sender, you can get more out of each
MinRouteAdver round
- MinRouteAdver is good but causes a 30 second
delay in end to end communication at best
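Sender-side loop detection can be sketched as a check made before an update leaves the router: peers that already appear on the path would only discard it, so the sender can treat them differently instead of spending a MinRouteAdver round on a useless announcement. What to do for those peers (for example, send a withdrawal right away) is an assumption here; the helper name `plan_updates` is hypothetical.

```python
def plan_updates(path, peers):
    """Split peers into those that can use `path` and those for whom it loops."""
    usable = [p for p in peers if p not in path]
    looping = [p for p in peers if p in path]
    return usable, looping


# Node 0 has selected the path "0 1 R" and has peers 1 and 2.
usable, looping = plan_updates(("0", "1", "R"), ["1", "2"])
print(usable)   # ['2']
print(looping)  # ['1']
```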
33Convergence Delay Due to Policies and Topology
- 2nd study of convergence
- 20 unique advertisements between 200 pairs of
ISPs, 6 months
- Measure the impact of Policies
- Measure the impact of Topology
- Analysis
34Multi-home Networks
- One network, two ISPs
- Better connectivity and backup
- Failover: new route convergence
- Work done in this Paper
- Convergence Analysis of Tdown event
35Work Done
- Fault injection announcements
- Logged table snapshot to disk
- Survey of backbone providers
- Routing and peering policies
- Used data to discuss impact on convergence
36Policy
- How policy impacts number and length of ASPaths
with a given route
- Limited inbound acceptance by all ISPs
37Inbound Filtering Example
- ISP D filters peering session with ISP G
- D only accepts G's backbone and customers' routes
- ISP A filters peering session with D
- A only accepts D's backbone and customers' routes
- ISP A will accept G's routes by chaining
38Outbound Filters
- A will advertise routes with paths D G and D
but not C D G
- Done by 13 of ISPs
- Combinations of ASPath and prefix filters create
unintentional back-up transit paths
39Topological Effect
- Interaction of MinRouteAdver timers
- MinRouteAdver is per peer not prefix
- MinRouteAdver interference delays convergence
40Backup Path Selection
41Convergence Latency
42Convergence Latency (cont)
- ISP1 explored one backup path of length 2
- ISP2 explored backup paths of length 2 and 3
- ISP3 explored backup paths of length 5
43Convergence Latency (cont)
44Convergence Latency (cont)