Title: Internet Routing Instability
1Internet Routing Instability
Craig Labovitz, G. Robert Malan, Farnam Jahanian
Appeared in SIGCOMM '97
Presenters: Supranamaya Ranjan, Mohammed Ahamed
2Internet Structure
- Many small ISPs at lowest level
- Small number of big ISPs at core
3The Core of the Internet
[Figure: Sprint, Verio, UUNet, and rice.edu]
- Routing done using BGP at core
- Intra-domain routing could use RIP, OSPF, etc.
4BGP Overview
[Figure: Sprint, Verio, and UUNet exchanging BGP reachability for prefixes 92.92.x.x, 128.42.x.x, 196.29.x.x, and 100.100.x.x]
5BGP Overview (contd.)
- Similar to Distance Vector routing
- Loop detection done using AS_PATH field
[Figure: routers R1 and R2 connected by a peering session (TCP)]
- Exchange full routing table at start
- Updates sent incrementally
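The AS_PATH loop check can be sketched in a few lines. This is a minimal illustration, not a router implementation: the function names and the AS numbers (100, 200, 300, 400) are hypothetical, and the path is modeled as a plain list of AS numbers.

```python
def accept_route(local_as: int, as_path: list[int]) -> bool:
    """Reject an advertisement whose AS_PATH already contains our own AS number."""
    return local_as not in as_path


def advertise(local_as: int, as_path: list[int]) -> list[int]:
    """Prepend our AS number before passing the route on to a neighboring AS."""
    return [local_as] + as_path


# Hypothetical AS numbers: AS 100 originates the route (empty path so far).
path = advertise(100, [])
path = advertise(200, path)   # AS 200 passes it on
path = advertise(300, path)   # AS 300 passes it on

loop_free = accept_route(400, path)   # AS 400 is not on the path
looped = accept_route(100, path)      # AS 100 sees itself on the path
```

Here AS 400 would accept the route, while AS 100 would discard it because its own number already appears in the path.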
6Key Point
The volume of BGP messages exchanged is abnormally high
- Most messages are redundant or unnecessary and do not correspond to any topology or policy changes
7Consequence: Instability
- Normal data packets handled by dedicated hardware
- BGP packet processing consumes CPU time
- Severe CPU processing overhead takes the router
offline
Route Flap Storm
[Figure: router A peering with routers B and C]
- Router A temporarily fails
- When A comes back up, B and C send full routing tables
How do we avoid or lessen the impact of these problems?
8Route Dampening
- Router does not accept frequent route updates to a destination
- Frequent updates might signal that the network has erratic connectivity
- Increment counter for destination when route changes
- When counter exceeds threshold, stop accepting updates
- Decrement counter with time
Problem
- Future legitimate announcements are accepted only after a delay
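The counter scheme above can be sketched as follows. This is a simplified illustration under stated assumptions: the class name, the threshold of 3, and the linear decay of 0.01 per second are all hypothetical, and deployed route flap damping (RFC 2439) uses exponential decay rather than the linear decrement shown here.

```python
class Dampener:
    """Per-destination counter: +1 on each route change, decays with time."""

    def __init__(self, threshold: float = 3.0, decay_per_second: float = 0.01):
        self.threshold = threshold      # assumed value, for illustration only
        self.decay = decay_per_second   # assumed linear decay rate
        self.state = {}                 # prefix -> (counter, time of last change)

    def _decayed(self, prefix: str, now: float) -> float:
        counter, last = self.state.get(prefix, (0.0, now))
        return max(0.0, counter - self.decay * (now - last))

    def on_route_change(self, prefix: str, now: float) -> bool:
        """Record a route change; return True if the update is accepted."""
        counter = self._decayed(prefix, now) + 1.0
        self.state[prefix] = (counter, now)
        return counter <= self.threshold


d = Dampener()
# Four rapid changes, a legitimate announcement shortly after, and one much later.
times = [0, 1, 2, 3, 10, 400]
results = [d.on_route_change("196.29.0.0/16", t) for t in times]
```

With these assumed parameters, the first three changes are accepted, the fourth pushes the counter past the threshold, and the announcement at t=10 is rejected even if it is legitimate. Only after the counter has decayed (t=400) are updates accepted again, which is the delay noted under "Problem".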
9Prefix Aggregation/Super-netting
- Core router advertises a less specific network
prefix
- Reduces size of routing tables exchanged
Problems
Prefix aggregation is not effective because
- Internet addresses largely non-hierarchically
assigned
- Domain renumbering not done when changing ISPs
- 25% of prefixes multi-homed
- Multi-homed prefixes should be exposed at the
core
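Aggregation itself is easy to demonstrate with Python's standard `ipaddress` module. The prefixes below are hypothetical and chosen only to show the effect: four contiguous /24s collapse into one /22, while a prefix from an unrelated address block cannot be folded in and must be advertised separately.

```python
import ipaddress

# Hypothetical customer prefixes: four contiguous /24s plus one unrelated /24.
prefixes = [
    ipaddress.ip_network("196.29.0.0/24"),
    ipaddress.ip_network("196.29.1.0/24"),
    ipaddress.ip_network("196.29.2.0/24"),
    ipaddress.ip_network("196.29.3.0/24"),
    ipaddress.ip_network("100.100.7.0/24"),
]

# collapse_addresses merges adjacent networks into the shortest covering prefixes.
aggregated = [str(net) for net in ipaddress.collapse_addresses(prefixes)]
```

The five entries shrink to two, `100.100.7.0/24` and `196.29.0.0/22`. Non-hierarchical assignment and multi-homing produce many prefixes like the unrelated /24, which is why aggregation helps less than hoped.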
10Route Servers
- Without a route server: O(N) peering sessions per router
- With a route server: 1 peering session per router
In spite of all these measures, the BGP message overhead is unexpectedly high
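The session counts can be worked out directly. This is a small arithmetic sketch, assuming N routers at one exchange point that either peer pairwise or each peer once with a route server. The value N = 60 is only an illustrative choice.

```python
def full_mesh_sessions(n: int) -> tuple[int, int]:
    """Return (sessions per router, total sessions) when every router peers with every other."""
    return n - 1, n * (n - 1) // 2


def route_server_sessions(n: int) -> tuple[int, int]:
    """Return (sessions per router, total sessions) when each router peers only with the route server."""
    return 1, n


n = 60                               # illustrative number of routers
mesh = full_mesh_sessions(n)         # per-router count grows linearly, total quadratically
served = route_server_sessions(n)    # per-router count stays constant
```

For 60 routers the full mesh needs 59 sessions per router and 1770 in total, against 1 per router and 60 in total with a route server.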
11Evaluation Methodology
- Data from Route Server at MAE-East (Washington, D.C.) peering point
- Peering point for more than 60 major ISPs
- Time series analysis of message exchange events
12Observation: Lots of redundant updates
- Duplicate route withdrawals
ISP   Number of withdrawals   Unique   Ratio
A     23276                   4344     5
F     86417                   12435    7
I     2479023                 14112    175
One reason: stateless BGP (no state of previous withdrawals maintained)
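One way to read "no state maintained" is that a stateless router sends a withdrawal to a peer whether or not it ever announced that prefix to the peer. The sketch below shows the stateful alternative. It is an illustration only: the class name and the event list are hypothetical, not taken from the measurement data.

```python
class StatefulPeer:
    """Remembers which prefixes are currently announced to one peer."""

    def __init__(self):
        self.announced = set()

    def announce(self, prefix: str) -> bool:
        self.announced.add(prefix)
        return True

    def withdraw(self, prefix: str) -> bool:
        """Return True only if a withdrawal actually needs to be sent."""
        if prefix not in self.announced:
            return False          # nothing to withdraw: suppress the duplicate
        self.announced.remove(prefix)
        return True


peer = StatefulPeer()
peer.announce("128.42.0.0/16")
# Hypothetical burst: the same prefix is withdrawn five times in a row.
sent = [peer.withdraw("128.42.0.0/16") for _ in range(5)]
withdrawals_sent = sum(sent)
```

A stateless sender would emit all five withdrawals, whereas the stateful one emits a single message.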
13Observation: Instability Proportional to Activity
[Figure: instability versus time of day, after removing duplicate messages]
14Evidence from Fine Grained Structure
[Figure: power spectral density versus frequency (1/hour), with peaks marked at 7 days and 24 hours]
Conjecture: BGP packets are competing with data packets during high bandwidth activity.
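A power spectrum of hourly update counts shows how daily and weekly cycles appear as peaks. The sketch below uses a naive discrete Fourier transform from the standard library on a synthetic series, since the measured data is not reproduced here. The four-week length and the amplitudes (30 for the daily cycle, 15 for the weekly one) are assumptions made for the illustration.

```python
import cmath
import math


def power_spectrum(series):
    """Naive DFT: return a list of (frequency in cycles/hour, power) pairs."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]   # remove the constant (zero-frequency) term
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        spectrum.append((k / n, abs(coeff) ** 2))
    return spectrum


# Synthetic hourly update counts over four weeks with 24-hour and 7-day cycles.
hours = 4 * 7 * 24
updates = [100
           + 30 * math.cos(2 * math.pi * t / 24)
           + 15 * math.cos(2 * math.pi * t / 168)
           for t in range(hours)]

spectrum = power_spectrum(updates)
top_two = sorted(spectrum, key=lambda fp: fp[1], reverse=True)[:2]
periods = sorted(round(1 / freq) for freq, _ in top_two)   # periods in hours
```

The two strongest components fall at periods of 24 and 168 hours, the same daily and weekly structure that the slide's figure highlights.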
15Observation: Instability and size uncorrelated
- ISPs serving more network prefixes may not contribute more to instability
16Observation: Instability distributed over routes
[Figure: cumulative proportion versus number of announcements per prefix+AS; annotations mark a 75% median and 10 announcements]
- 20% to 90% of routes change 10 times or less
- No single route contributes significantly to instability
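The cumulative proportion behind this observation can be computed from a log of announcements keyed by (prefix, AS). The sketch below is illustrative only: the function name and the tiny event log are hypothetical, not the measured data.

```python
from collections import Counter


def fraction_at_most(events, limit):
    """Fraction of (prefix, AS) pairs announced at most `limit` times."""
    counts = Counter(events)                     # (prefix, AS) -> number of announcements
    quiet = sum(1 for c in counts.values() if c <= limit)
    return quiet / len(counts)


# Hypothetical log: four quiet routes and one noisy route.
events = (
    [("92.92.0.0/16", 1)] * 2
    + [("128.42.0.0/16", 2)] * 5
    + [("196.29.0.0/16", 3)] * 10
    + [("100.100.0.0/16", 4)] * 1
    + [("100.100.7.0/24", 5)] * 40
)
share = fraction_at_most(events, 10)
```

In this made-up log, 4 of the 5 routes change 10 times or less, so `share` is 0.8.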
17Observation: Synchronized updates
- Inter-arrival times of updates show periodicity
- 30 s and 1 minute patterns
- Some routers collect and send updates once every 30 s
Possible reasons
- Border router / internal router interaction misconfigured?
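Periodicity of this kind shows up in a histogram of inter-arrival times. The sketch below is a minimal illustration: the function name, the 5-second bin width, and the timestamps are hypothetical.

```python
from collections import Counter


def interarrival_histogram(timestamps, bin_seconds=5):
    """Bin the gaps between consecutive updates to the nearest multiple of bin_seconds."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return Counter(bin_seconds * round(gap / bin_seconds) for gap in gaps)


# Hypothetical arrival times (seconds) of updates from one peer.
arrivals = [0, 30, 60, 90, 150, 180, 183]
histogram = interarrival_histogram(arrivals)
dominant_gap = histogram.most_common(1)[0][0]
```

For these timestamps most gaps land in the 30-second bin, with one at 60 seconds, which is the kind of 30 s and 1 minute clustering described above.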
18End-to-end Perspective
Chinoy, Dynamics of Internet routing information (SIGCOMM 93)
Measurements on NSFNET showed
- Processing and forwarding latency of a BGP update is 3 orders of magnitude more than the latency incurred in forwarding data packets
- Will lead to packet drops during the intervening period
Paxson, End-to-End routing behavior in the Internet (SIGCOMM 96)
- Routing loops introduce loops into other routers' routing tables
- An end-to-end route changes every 1.5 hours on average
19End-to-End perspective (Paxson)
Pathology type              Probability in 1995 (%)   Probability in 1996 (%)
Long-lived routing loops    (same in both years)
Short-lived routing loops   (same in both years)
Outage > 30 s               0.96                      2.2
Total                       1.5                       3.4
20Summary and Conclusions
- Redundant routing information flows in core
- Instability distributed across autonomous systems
Possible reasons for instability
- Stateless BGP updates
- Misconfigured routers
- Synchronization
- Clocks driving the links not synchronized (link
flaps)
21Follow-up work impact
Origins of Internet Routing Instability (1999)
- Migration from stateless to stateful BGP decreased duplicate withdrawals by an order of magnitude
- But duplicate announcements (AADup) doubled
- Reason: non-transitive attribute filtering not implemented
- BGP specification: never propagate non-transitive attributes
- AS_PATH is a transitive attribute
- MED (Multi Exit Discriminator) is NOT transitive
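Filtering on the transitive flag can be sketched as below. The flag values follow the BGP-4 path attribute flags (0x80 optional, 0x40 transitive), but the dictionary layout, the function name, and the attribute values are hypothetical. Real routers apply further per-attribute rules that this simplification ignores.

```python
OPTIONAL = 0x80     # attribute flag bit: optional (vs. well-known)
TRANSITIVE = 0x40   # attribute flag bit: transitive

# Hypothetical attributes attached to one route.
attributes = {
    "AS_PATH": {"flags": TRANSITIVE, "value": [200, 100]},   # well-known, transitive
    "MULTI_EXIT_DISC": {"flags": OPTIONAL, "value": 50},     # optional, non-transitive
}


def attributes_to_propagate(attrs):
    """Keep only attributes whose transitive bit is set."""
    return {name: attr for name, attr in attrs.items()
            if attr["flags"] & TRANSITIVE}


propagated = attributes_to_propagate(attributes)
```

Only AS_PATH survives the filter, so the MED is dropped before the route is passed on.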
22Propagating MEDs Causes Oscillations