Title: Hot Potatoes Heat Up BGP Routing
1. Hot Potatoes Heat Up BGP Routing
- Jennifer Rexford
- AT&T Labs - Research
- http://www.research.att.com/~jrex
- Joint work with Renata Teixeira, Aman Shaikh, and Timothy Griffin
2. Outline
- Internet routing
- Interdomain and intradomain routing
- Coupling due to hot-potato routing
- Measuring hot-potato routing
- Measuring the two routing protocols
- Correlating the two data streams
- Performance evaluation
- Characterization on AT&T's network
- Implications on network practices
- Conclusions and future directions
3. Autonomous Systems
[Figure: a client and a web server connected through autonomous systems numbered 1 to 7; AS path: 6, 5, 4, 3, 2, 1]
4. Interdomain Routing (BGP)
- Border Gateway Protocol (BGP)
- IP prefix: block of destination IP addresses
- AS path: sequence of ASes along the path
- Policy configuration by the operator
- Path selection: which of the paths to use?
- Path export: which neighbors to tell?
5. Intradomain Routing (IGP)
- Interior Gateway Protocol (OSPF and IS-IS)
- Shortest path routing based on link weights
- Routers flood link-state information to each other
- Routers compute next hop to reach other routers
- Weights configuration by the operator
- Simple heuristics: link capacity or physical distance
- Traffic engineering: tuning link weights to the traffic
[Figure: example topology with link weights between 1 and 5, illustrating shortest-path routing]
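As a rough illustration of shortest-path routing on link weights, the sketch below runs Dijkstra's algorithm over a small topology and returns, for each router, the path cost and the first hop. The router names and weights are made up for the example and are not taken from the slide's figure.

```python
import heapq

def shortest_paths(graph, source):
    """Return {node: (cost, first_hop)} from source using Dijkstra's algorithm.

    graph maps each router to {neighbor: link_weight}; weights must be positive.
    """
    best = {source: (0, None)}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best[node][0]:
            continue  # stale heap entry, a cheaper path was already found
        first_hop = best[node][1]
        for neighbor, weight in graph[node].items():
            new_cost = cost + weight
            # The first hop is the neighbor itself when leaving the source.
            hop = neighbor if node == source else first_hop
            if neighbor not in best or new_cost < best[neighbor][0]:
                best[neighbor] = (new_cost, hop)
                heapq.heappush(heap, (new_cost, neighbor))
    return best

# Hypothetical four-router topology with operator-configured weights.
graph = {
    "A": {"B": 2, "C": 1},
    "B": {"A": 2, "D": 3},
    "C": {"A": 1, "D": 5},
    "D": {"B": 3, "C": 5},
}
routes = shortest_paths(graph, "A")
print(routes["D"])  # cost and first hop from A to D
```

Changing a single weight in `graph` and rerunning the function shows how tuning link weights moves traffic onto a different next hop.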
6. Two-Level Internet Routing
- Hierarchical routing
- Intra-domain
- Metric based
- Inter-domain
- Reachability and policy
- Design principles
- Scalability
- Isolation
- Simplicity of reasoning
Autonomous system (AS): network with a unified administrative routing policy (e.g., AT&T, Sprint, UCSD)
7. Motivation
Routes to thousands of destinations switch exit point!!!
[Figure: routers X, Y, and Z and two ISP networks]
8. BGP Decision Process
- Ignore if exit point unreachable
- Highest local preference
- Lowest AS path length
- Lowest origin type
- Lowest MED (with same next hop AS)
- Lowest IGP cost to next hop (hot potato)
- Lowest router ID of BGP speaker
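The steps above can be sketched as a sequence of filters over candidate routes. This is a simplified illustration under stated assumptions, not a router implementation: the `Route` fields, the `best_route` helper, and the example AS numbers and costs are all hypothetical, and vendor-specific rules are left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    egress: str        # exit point (BGP next hop) inside the AS
    local_pref: int
    as_path_len: int
    origin: int        # lower is preferred
    med: int
    next_hop_as: int
    router_id: int

def best_route(routes, igp_cost):
    """Pick one route; igp_cost maps each reachable egress to its IGP cost."""
    # Ignore routes whose exit point is unreachable.
    candidates = [r for r in routes if r.egress in igp_cost]
    if not candidates:
        return None
    # Highest local preference, then lowest AS path length, then lowest origin.
    for key in (lambda r: -r.local_pref,
                lambda r: r.as_path_len,
                lambda r: r.origin):
        lowest = min(key(r) for r in candidates)
        candidates = [r for r in candidates if key(r) == lowest]
    # Lowest MED, compared only among routes with the same next-hop AS.
    candidates = [r for r in candidates
                  if not any(o.next_hop_as == r.next_hop_as and o.med < r.med
                             for o in candidates)]
    # Hot potato: lowest IGP cost to the egress, then lowest router ID.
    return min(candidates, key=lambda r: (igp_cost[r.egress], r.router_id))

# Two routes that tie on every BGP attribute (values are made up).
via_x = Route("X", 100, 3, 0, 0, 65001, 1)
via_y = Route("Y", 100, 3, 0, 0, 65001, 2)

before = best_route([via_x, via_y], {"X": 10, "Y": 9})
after = best_route([via_x, via_y], {"X": 8, "Y": 9})
print(before.egress, after.egress)  # Y X
```

Because the two routes tie on every BGP attribute, the choice falls through to the IGP cost, so an internal weight change alone is enough to move the exit point from Y to X.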
9. Outline
- Internet routing
- Interdomain and intradomain routing
- Coupling due to hot-potato routing
- Measuring hot-potato routing
- Measuring the two routing protocols
- Correlating the two data streams
- Performance evaluation
- Characterization on AT&T's network
- Implications on network practices
- Conclusions and future directions
10. Why is This So Darn Hard?
- Noisy signals
- Single event can cause multiple IGP messages
- Large amount of background BGP updates
- Multiple messages for single BGP routing change
- Protocol implementation
- Routing protocols provide limited information
- High complexity of BGP due to configurable policies
- Many vendor-specific details, such as timers
- Monitoring limitations
- Cannot collect data from every vantage point
- Delays in delivering data to the collection machine
- Time synchronization across multiple collectors
11. Our Approach
- Collect measurements of both protocols
- BGP monitor and OSPF monitor
- Correlate the two streams of data
- Match BGP updates with OSPF events
- Analyze the protocol interaction
[Figure: monitor M in the AT&T backbone collects OSPF messages and BGP updates; routers X, Y, and Z shown]
12. Challenges
- Lack of information on routing messages
- Routing protocols are designed to determine a path between two hosts, but not to give the reason
Example 1: BGP update caused by OSPF
[Figure: Z's route to dst changes from (dst, Y) to (dst, X); the cost to X is 8 and the cost to Y is 9. Monitor M sees the BGP announcement (dst, X) and the OSPF change CHG X, 8]
13. Challenges
- Lack of information on routing messages
- Routing protocols are designed to determine a path between two hosts, but not to give the reason
Example 2: BGP update NOT caused by OSPF
[Figure: Z's route to dst changes from (dst, Y) to (dst, X); the cost to X is 10 and the cost to Y is 9. Monitor M sees the BGP announcement (dst, X)]
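14. Challenges
- Lack of information on routing messages
- Routing protocols are designed to determine a path between two hosts, but not to give the reason
Example 2: BGP update NOT caused by OSPF
[Figure: Z's route to dst changes from (dst, Y) to (dst, X); the cost to X changes from 10 to 8 and the cost to Y is 9. Monitor M sees the BGP announcement (dst, X) and the OSPF change CHG X, 8]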
15. Heuristic for Matching
- Stream of OSPF messages: transform the stream of OSPF messages into routing changes
- Stream of BGP updates: classify BGP updates by possible OSPF causes
- Match BGP updates with OSPF events that happen close in time
[Figure: the two streams shown along a common time axis]
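A minimal sketch of the time-based matching step, assuming both streams have already been pre-processed: each BGP update carries a set of possible OSPF causes, and each OSPF routing change has a timestamp. The function name, the tuple layouts, and the window sizes `lead` and `lag` are placeholders chosen for illustration, not values from the talk.

```python
from bisect import bisect_left, bisect_right

def match_bgp_to_ospf(bgp_updates, ospf_changes, lead=5.0, lag=60.0):
    """Return a list of (bgp_update, matching_ospf_change_or_None).

    bgp_updates: iterable of (time, prefix, possible_causes), where
        possible_causes is a set of pairs such as ("CHG", "X").
    ospf_changes: list of (time, kind, egress), sorted by time.
    A change matches if it is a possible cause and its timestamp lies in
    [bgp_time - lag, bgp_time + lead].
    """
    times = [t for t, _, _ in ospf_changes]
    matches = []
    for update in bgp_updates:
        bgp_time, _prefix, causes = update
        lo = bisect_left(times, bgp_time - lag)
        hi = bisect_right(times, bgp_time + lead)
        window = [c for c in ospf_changes[lo:hi] if (c[1], c[2]) in causes]
        # Prefer the candidate closest in time to the BGP update.
        chosen = min(window, key=lambda c: abs(c[0] - bgp_time), default=None)
        matches.append((update, chosen))
    return matches

# Made-up timestamps in seconds.
ospf_changes = [(100.0, "CHG", "X"), (400.0, "DEL", "Y")]
bgp_updates = [
    (112.0, "dst1", {("CHG", "X"), ("CHG", "Y")}),
    (300.0, "dst2", {("ADD", "X")}),
]
matches = match_bgp_to_ospf(bgp_updates, ospf_changes)
for update, change in matches:
    print(update[1], "->", change)
```

The window is asymmetric on purpose: a BGP update normally follows the OSPF change that triggered it, while the small `lead` allowance tolerates clock skew between collectors. A wider window catches more true matches but also more false ones.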
16. Pre-processing OSPF LSAs
- Transform OSPF messages into routing changes from a router's perspective
[Figure: OSPF routing changes as seen from Z in a topology with X, Y, and monitor M; LSAs shown: weight change, 10 (twice); delete; add, 1]
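One way to picture this transformation, as a sketch only: recompute the router's cost to each exit point after an LSA and compare it with the previous table. The helper below assumes those cost tables are already available (for example from a shortest-path computation); the function name and the ADD/DEL/CHG tuple layout are illustrative choices.

```python
def routing_changes(old_cost, new_cost):
    """Compare one router's cost to each exit point before and after an LSA.

    old_cost and new_cost map egress -> IGP cost; an unreachable egress is
    simply absent. Returns (kind, egress, cost) tuples sorted by egress.
    """
    changes = []
    for egress in sorted(set(old_cost) | set(new_cost)):
        if egress not in old_cost:
            changes.append(("ADD", egress, new_cost[egress]))
        elif egress not in new_cost:
            changes.append(("DEL", egress, None))
        elif old_cost[egress] != new_cost[egress]:
            changes.append(("CHG", egress, new_cost[egress]))
    return changes

# Z's view: the cost to X drops from 10 to 8 while Y stays at 9.
print(routing_changes({"X": 10, "Y": 9}, {"X": 8, "Y": 9}))
# [('CHG', 'X', 8)]

# An LSA that leaves all of Z's costs unchanged yields no routing change.
print(routing_changes({"X": 10, "Y": 9}, {"X": 10, "Y": 9}))
# []
```

Filtering out LSAs that produce an empty list is what keeps the OSPF stream from drowning the matching step in irrelevant events.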
17. Classifying BGP Updates
BGP update from Z, with possible OSPF causes:
- Withdrawal of (dst, Y): DEL Y?
- Announcement of (dst, X): ADD X?
- Replacement of route to dst (different route through Y): DEL Y? ADD X? CHG X or CHG Y?
[Figure: router Z with exit points X and Y toward dst; monitor M]
18. Classifying BGP Updates
- Route via X is worse: DEL Y?
- Route via X is better: ADD X?
- Routes are equally good: DEL Y? ADD X? CHG X, CHG Y?
[Figure: router Z with exit points X and Y toward dst; monitor M]
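The classification on these two slides can be sketched as a small lookup from the kind of BGP update to its candidate OSPF causes. The mapping below is one reading of the slides, and the function name, arguments, and the "worse"/"better"/"equal" labels are assumptions made for illustration.

```python
def possible_causes(old_egress, new_egress, comparison="equal"):
    """Return the set of (kind, egress) OSPF changes that could explain an update.

    old_egress / new_egress: exit point before and after the BGP update,
        or None when there was no route (announcement) or is none (withdrawal).
    comparison: for a replacement, how the new route ranks against the old
        one on the BGP attributes checked before IGP cost.
    """
    if old_egress is None and new_egress is None:
        return set()
    if new_egress is None:            # withdrawal
        return {("DEL", old_egress)}
    if old_egress is None:            # announcement
        return {("ADD", new_egress)}
    if comparison == "worse":         # old route must have gone away
        return {("DEL", old_egress)}
    if comparison == "better":        # new route must have appeared
        return {("ADD", new_egress)}
    # Equally good routes: a cost change to either exit could flip the choice.
    return {("DEL", old_egress), ("ADD", new_egress),
            ("CHG", old_egress), ("CHG", new_egress)}

print(possible_causes("Y", None))          # withdrawal of (dst, Y)
print(possible_causes(None, "X"))          # announcement of (dst, X)
print(sorted(possible_causes("Y", "X")))   # replacement, equally good routes
```

The resulting sets are what a time-based matcher would compare against the OSPF routing changes seen shortly before each update.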
19. Outline
- Internet routing
- Interdomain and intradomain routing
- Coupling due to hot-potato routing
- Measuring hot-potato routing
- Measuring the two routing protocols
- Correlating the two data streams
- Performance evaluation
- Characterization on AT&T's network
- Implications on network practices
- Conclusions and future directions
20. Time Lag
[Figure: cumulative BGP updates vs. time lag, for OSPF-triggered BGP updates on June 25th, 2003; x-axis: time BGP - time OSPF (seconds)]
21. Results for June 2003
- High variability according to location and day
- Impact on external BGP measurements and customers
- One LSA can have a big impact

location          min    max      days > 10
close to peers    0      3.76     0
between peers     0      25.87    5

location          no impact    prefixes impacted
close to peers    97.53        less than 1
between peers     97.17        55
22. BGP Updates Over Prefixes
[Figure: cumulative BGP updates over prefixes, with curves for non-OSPF-triggered, all, and OSPF-triggered updates]
23. Operational Implications
- Forwarding plane convergence
- Accuracy of active measurements
- Router proximity to exit points
- Likelihood of hot-potato routing changes
- Cost in/out of links during maintenance
- Avoid triggering BGP routing changes
- More complexity with route reflectors
- Longer delays and more BGP messages
24. Forwarding Convergence
R1's scan process can take up to 60 seconds to run
Scan process runs in R2
R2 starts using R1 to reach dst
[Figure: routers R1 and R2 and destination dst, with link weights 10, 111, 10, and 100]
25. Measurement Accuracy
- Measurements of customer experience
- Probe machines have just one exit point!
[Figure: loop to reach dst between routers R1 and R2, with link weights 10, 111, and 100]
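To make the transient loop concrete, the sketch below follows per-router next hops toward dst and reports a cycle if one exists. The two forwarding tables are hypothetical: they assume that R2 has already switched to R1 while R1, whose scan process has not yet run, still forwards via R2.

```python
def find_loop(next_hop, start, dst):
    """Follow next hops from start toward dst; return the loop or None.

    next_hop maps each router to the neighbor it currently forwards to.
    """
    path, seen = [], set()
    node = start
    while node != dst:
        if node in seen:
            return path[path.index(node):]  # routers that form the cycle
        seen.add(node)
        path.append(node)
        node = next_hop[node]
    return None

# Hypothetical state while only R2 has run its scan process.
during = {"R1": "R2", "R2": "R1"}
# Hypothetical state once R1 has also converged.
converged = {"R1": "dst", "R2": "R1"}

print(find_loop(during, "R2", "dst"))     # ['R2', 'R1']
print(find_loop(converged, "R2", "dst"))  # None
```

A probe machine with a single exit point never traverses the looping pair, so it would not observe the loop that customers behind R2 experience.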
26. What to do?
- Increase estimate for forwarding convergence
- For destinations/customers with multiple exit points
- Extensions to measurement infrastructure
- Multiple access links for a probe machine
- Multiple probe machines with same address
- Better BGP implementation on the router
- Decrease scan timer (maybe too much overhead?)
- Event-driven IGP/BGP implementation
27. Avoid Equal-distance Exits
[Figure: two example topologies, each with a destination dst]
28. Careful Cost in/out Links
[Figure: router Z with exit points X and Y; link weights 5, 100, 5, 10, 10, 4, and 10]
- Traffic is more predictable
- Faster convergence
- Less impact on neighbors
29. iBGP Route Reflectors
[Figure: routers W, X, Y, and Z with link weights 9, 10, 8, 11, and 20]
Scalability trade-off: less BGP state vs. number of BGP updates from Z and longer convergence delay
30. Ongoing Work
- Reduction of false matches
- Compare with conservative analysis (lower bound)
- Cluster BGP updates and IGP LSAs in time
- Black-box testing of the routers
- Scan timer and its effects (forwarding loops)
- Vendor interactions (with Cisco)
- Impact of the IGP-triggered BGP updates
- Changes in the flow of traffic
- Externally visible BGP routing changes
- Modeling the protocol interaction
- Understand impact of router location
31. Future Directions
- Improving isolation (cooling those potatoes!)
- Operational practices: preventing interactions
- Protocol design: stronger decoupling
- Network design: internal topology/architecture
- Extending our monitoring architecture
- Data from multiple vantage points
- Real-time correlation of data streams
- Automatic generation of alarms/reports
- Better route monitoring
- Router support for special monitoring sessions
- Protocol extensions to help in troubleshooting
- Diagnose problems, and read the router's mind!
32. Exporting Routing Instability
[Figure: router Z with exit points X and Y toward dst]
No change => no announcement