Title: Detection of Routing Loops and Analysis of Its Causes
1Detection of Routing Loops and Analysis of Its Causes
- Sue Moon
- Dept. of Computer Science
- KAIST
- Joint work with Urs Hengartner, Ashwin Sridharan,
- Richard Mortier, Christophe Diot
2Link Utilization
Internet backbone link
3Overview
- Routing protocols have a large impact on the performance of the network
- How do we detect routing loops?
- How often do loops occur?
- How do they impact loss and delay?
- Analyze causes of loops
- What causes them?
4Possible Causes of Routing Loops
- Persistent routing loops
- E.g., due to misconfiguration.
- Loops can last hours if undetected.
- Transient routing loops
- Routing state is dynamic.
- Inconsistencies in routing state can cause loops.
- Inconsistencies should disappear within seconds/minutes.
- Expectation: Loops last seconds/minutes.
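To make the idea of inconsistent routing state concrete, here is a minimal sketch. The three-router topology, the forwarding tables, and the `forward` helper are all hypothetical, chosen only for illustration. It assumes R1 has already switched its next hop for destination D to R2 after a link failure, while R2 still forwards via R1, so packets bounce between the two until the TTL runs out.

```python
def forward(fibs, start, dest, ttl):
    """Follow next hops from `start` until the packet is delivered or its TTL expires.

    fibs maps router -> {destination: next hop}. Returns (path, delivered).
    """
    path = [start]
    node = start
    while node != dest:
        if ttl == 0:
            return path, False          # TTL expired inside the loop
        node = fibs[node][dest]         # look up the next hop
        ttl -= 1                        # each hop decrements the TTL
        path.append(node)
    return path, True

# Hypothetical state right after the R1-R3 link fails:
# R1 has updated its FIB (now via R2), R2 has not (still via R1).
inconsistent = {"R1": {"D": "R2"}, "R2": {"D": "R1"}, "R3": {"D": "D"}}

# Hypothetical state after R2 has also converged (R2 now uses its direct link to R3).
converged = {"R1": {"D": "R2"}, "R2": {"D": "R3"}, "R3": {"D": "D"}}

looping_path, looping_ok = forward(inconsistent, "R1", "D", ttl=6)
final_path, final_ok = forward(converged, "R1", "D", ttl=6)
```

With the inconsistent tables the packet alternates between R1 and R2 and is dropped; once R2 converges the same packet is delivered via R2 and R3.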
5How Can Transient Routing Loop Occur?
Figure: example topology with routers R1, R2, and R3
6Detection of Loops in Packet Traces
- Detect replicas in a packet trace
- Packets with identical headers except for the TTL and checksum (CRC) fields
- TTL difference of 2 or larger
- Set of replicas = packet loop
- Set of packet loops associated with a routing event = routing loop
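The replica-detection steps above can be sketched as follows. This is an illustrative reading of the bullets, not the authors' implementation: the `Packet` record, the `key` field (standing in for the header with TTL and checksum masked out), and the function names are assumptions.

```python
from collections import defaultdict
from typing import NamedTuple

class Packet(NamedTuple):
    ts: float    # capture timestamp in seconds
    key: tuple   # header fields with TTL and checksum masked out (assumed representation)
    ttl: int

def find_packet_loops(packets, min_ttl_delta=2):
    """Group packets by masked header and return streams of replicas.

    A stream is extended while each packet's TTL is at least `min_ttl_delta`
    lower than the previous replica's TTL. Streams with two or more packets
    are reported as packet loops.
    """
    by_key = defaultdict(list)
    for p in packets:
        by_key[p.key].append(p)

    loops = []
    for group in by_key.values():
        group.sort(key=lambda p: p.ts)
        stream = [group[0]]
        for p in group[1:]:
            if stream[-1].ttl - p.ttl >= min_ttl_delta:
                stream.append(p)        # same packet seen again after looping
            else:
                if len(stream) > 1:
                    loops.append(stream)
                stream = [p]            # start a new candidate stream
        if len(stream) > 1:
            loops.append(stream)
    return loops

def loop_size(stream):
    """TTL decrement between the first two replicas of a packet loop."""
    return stream[0].ttl - stream[1].ttl

trace = [
    Packet(0.00, ("10.0.0.1", "10.9.9.9", 17, 4242), 60),
    Packet(0.01, ("10.0.0.1", "10.9.9.9", 17, 4242), 58),
    Packet(0.02, ("10.0.0.1", "10.9.9.9", 17, 4242), 56),
    Packet(0.03, ("10.0.0.2", "10.8.8.8", 6, 1), 60),     # seen once: no loop
    Packet(0.04, ("10.0.0.3", "10.7.7.7", 6, 2), 60),
    Packet(0.05, ("10.0.0.3", "10.7.7.7", 6, 2), 60),     # same TTL: duplicate, not a loop
]
packet_loops = find_packet_loops(trace)
```

Grouping packet loops into routing loops (by routing event) is not modelled here, since it needs routing data in addition to the trace.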
7Traces
- Backbone traces
- NYC and SJ links from Nov. 8th, 2001
- NYC links from Oct. 9th, 2002
8Packet Traces
Trace        Length (hours)   Avg BW (Mbps)   Total Packets (10^6)   Looped Packets
Backbone 1   24               1               50                     4.839
Backbone 2   7.5              243             1677                   0.118
Backbone 3   11               2.2             20                     1.687
Backbone 4   11               107             1350                   0.026
On average, loops do not affect much traffic, but
9Observations about Packet Loops
- General Observations
- Loop size: number of nodes involved in a packet loop
- Number of replicas in packet loop
- Properties of packet loops
- Packet types
- Duration of packet loops, in packets
10Loop Size
- Loop size: the value by which the TTL field is decremented between consecutive replicas in a packet loop.
Figure 2
11Packet Loop Length
How often does a packet show up before it expires?
Figure 3
12Traffic Types
- Different types of Internet traffic.
- Routers are oblivious to type of traffic.
- Expectation: Traffic types in packet loops are distributed similarly to the traffic types of overall traffic.
13Traffic Types (Backbone 2)
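- By protocol: % of looped packets (% of all traffic)
- TCP 10% (93%)
- UDP 16% (6%)
- ICMP 77% (0.3%)
- TCP flags: % of looped TCP packets (% of all TCP traffic)
- SYN 51% (5%)
- ACK 73% (97%)
- RST 13% (1.5%)
- FIN 8% (4%)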
14Reasons for Increases
- TCP SYN traffic.
- TCP is connection oriented.
- End point tries to open connection, sends SYN packet.
- SYN packet loops and expires, no other packets are sent.
- UDP traffic.
- UDP is connectionless, no feedback from receiver.
- Sending application is oblivious of loop.
- ICMP traffic.
- Caused by traceroute/ping applications.
- People are exploring the loop.
15Out-Of-Order Delivery
16Causes of Packet Loops: BGP
Figure: BGP example with a customer network, AS 1, AS 2, and routers A, B, C, D
17Matching BGP Updates
- Any advertisement of the longest prefix?
- Temporal vicinity of 2 minutes to packet loops?
- Change in next hop or AS path?
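The three matching questions can be sketched as one check. This is an assumption-laden illustration: the `Update` record and `match_bgp` function are hypothetical names, withdrawals are not modelled, and an update counts as a change only relative to the previous update seen for the same prefix. Only the standard `ipaddress` module is used.

```python
import ipaddress
from typing import NamedTuple

class Update(NamedTuple):
    ts: float        # time the update was received, in seconds
    prefix: str      # advertised prefix, e.g. "10.1.0.0/16"
    next_hop: str
    as_path: tuple

def match_bgp(loop_ts, loop_dst, updates, window=120.0):
    """Return the BGP update that plausibly explains a packet loop, or None.

    1. Keep updates whose prefix covers the looped destination address.
    2. Restrict to the longest such prefix.
    3. Look for an update within `window` seconds (2 minutes by default) of
       the loop that changed the next hop or the AS path.
    """
    addr = ipaddress.ip_address(loop_dst)
    covering = [u for u in updates if addr in ipaddress.ip_network(u.prefix)]
    if not covering:
        return None
    longest = max(ipaddress.ip_network(u.prefix).prefixlen for u in covering)
    history = sorted(
        (u for u in covering
         if ipaddress.ip_network(u.prefix).prefixlen == longest),
        key=lambda u: u.ts,
    )
    # Compare each update with the one before it for the same prefix.
    for prev, cur in zip(history, history[1:]):
        changed = (prev.next_hop, prev.as_path) != (cur.next_hop, cur.as_path)
        if changed and abs(cur.ts - loop_ts) <= window:
            return cur
    return None

# Hypothetical update feed for illustration.
feed = [
    Update(0.0, "10.0.0.0/8", "A", (1, 2)),
    Update(1000.0, "10.1.0.0/16", "A", (1, 2)),
    Update(1950.0, "10.1.0.0/16", "B", (1, 3)),
]
hit = match_bgp(2000.0, "10.1.2.3", feed)
```

Because the first update for a prefix has nothing to be compared against, a real matcher would also need the routing table state from before the trace started.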
18Causes of Loops: IS-IS
Figure: IS-IS example topology with routers R1 to R5; link weights 1, 1, 1, 1, 1, and 4
19Time-Line at Nodes R2 and R3
R2: Failure Detection, LSP Generation, Shortest Path Computation, LSP Flooding, FIB Update
R3: LSP Arrival, Shortest Path Computation, FIB Update
20Matching IS-IS Updates
- Upon receipt of an LSP, compute the shortest path from the observation node to the egress router
- If the forwarding path changed and the change is within temporal vicinity of the loop
- See if the observation node lies on the shortest path before or after the change
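A minimal sketch of the first two matching steps, under several assumptions: the topology is a dictionary of symmetric link weights, an LSP simply replaces the originating router's adjacency list, and the time window is a caller-supplied parameter because the slides give no value for IS-IS. The function names and the example topology are hypothetical. The final on-path check is left out because the slides do not say which other path it applies to.

```python
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra over graph = {node: {neighbor: weight}}; returns a node list or None."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def apply_lsp(graph, origin, neighbors):
    """Return a copy of the graph with origin's adjacencies replaced (links kept symmetric)."""
    new = {u: dict(vs) for u, vs in graph.items()}
    for v in new.get(origin, {}):
        new.get(v, {}).pop(origin, None)  # drop the old reverse direction
    new[origin] = dict(neighbors)
    for v, w in neighbors.items():
        new.setdefault(v, {})[origin] = w
    return new

def lsp_matches_loop(graph, lsp_ts, origin, neighbors, obs, egress, loop_ts, window):
    """True if the LSP changes the obs-to-egress path and arrives within `window` s of the loop."""
    before = shortest_path(graph, obs, egress)
    after = shortest_path(apply_lsp(graph, origin, neighbors), obs, egress)
    return before != after and abs(lsp_ts - loop_ts) <= window

# Hypothetical topology: R2 normally reaches R3 via R1 (cost 2) rather than directly (cost 4).
topo = {"R1": {"R2": 1, "R3": 1}, "R2": {"R1": 1, "R3": 4}, "R3": {"R1": 1, "R2": 4}}
# R1 reports that it lost its adjacency to R3.
matched = lsp_matches_loop(topo, 100.0, "R1", {"R2": 1}, "R2", "R3", 130.0, window=60.0)
```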
21BGP Update Matches
Trace    Transient   Persistent (BGP)   Persistent (no BGP)   Total
NYC-20   40.1        0                  50.8                  90.8
NYC-21   80.2        0                  7.5                   87.9
NYC-23   3.3         0                  0                     3.3
NYC-22   18.8        0                  80.6                  99.4
NYC-24   70.0        0                  0                     70.0
NYC-25   43.7        15.5               0                     59.2
22Factors to Varying Success
- Persistent Loops
- Events occurred before trace collection
- BGP changes external to Sprint
- Comparison with RouteViews updates: increase in matches
- Geographical distribution of loop destinations
- Measurement PoP not involved in route changes
- Average number of ASes traversed is largest for NYC-23
23Conclusions
- Loops can be detected and analyzed
- Loops are not uncommon
- Most are due to BGP updates
- BGP changes farther away from the observation point may not be identified
24BACKUP SLIDE
25CDF of Number of Replicas
26CDF of Inter-Replica Spacing Time
27Packet Types of All Traffic
28Packet Types of Loops
29Destination Addresses of Loops
Regional 2
Backbone 1
30CDF of Replica Stream Duration in Time
31CDF of Routing Loop Duration in Time
32Overview
- Types and causes behind routing loops
- Transient - part of normal routing protocol operation
- Persistent - long-lasting, manual intervention required
- Detection of routing loops in packet traces
- Detection algorithm
- Observations about the routing loops
- Analysis of performance impact
- Loss, delay, out-of-order delivery
- On-line detection algorithm
- Summary
33Fraction of Packets in Loops
Backbone 4
Backbone 1
34Construction of a Typical End-To-End Path
35Estimate of End-to-End Loss
- Assume
- No loss on the access link due to routing loops
- Losses are independent across links
- Estimate
- $L_r$ from Regional traces
- $L_b$ from Backbone traces, except Backbone 4
- $1 - (1 - L_r)^2 (1 - L_b)^{10}$: 0.003 to 0.025
- Implications on SLA??
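The estimate above can be reproduced in a few lines. The function name and the per-link loss values used in the example are hypothetical; only the formula and the exponents (2 regional links, 10 backbone links) come from the slide.

```python
def end_to_end_loss(l_r, l_b, regional_links=2, backbone_links=10):
    """Loss on a path of independent links: 1 - (1 - l_r)^2 * (1 - l_b)^10.

    l_r: per-link loss due to loops on a regional link
    l_b: per-link loss due to loops on a backbone link
    """
    survive = (1 - l_r) ** regional_links * (1 - l_b) ** backbone_links
    return 1 - survive

# Hypothetical per-link loss rates, chosen only to exercise the formula.
estimate = end_to_end_loss(l_r=0.001, l_b=0.0001)

# For small loss rates the result is close to the simple sum 2*l_r + 10*l_b.
approximation = 2 * 0.001 + 10 * 0.0001
```

The independence assumption is what allows the per-link survival probabilities to be multiplied.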
36Delay Due to Routing Loops
37Out-Of-Order Delivery
38Causes of Loop
39Overview
- Types and causes behind routing loops
- Transient - part of normal routing protocol operation
- Persistent - long-lasting, manual intervention required
- Detection of routing loops in packet traces
- Detection algorithm
- Observations about the routing loops
- Analysis of performance impact
- Loss, delay, out-of-order delivery
- On-line detection algorithm
- Summary and Future Work
40To Detect a Loop On-line
- Focus on persistent loops
- Questions
- More focus on persistent loops
- How much traffic is affected? -> alarm
- What prefix is affected? -> warning
41On-Line Detection Algorithm
- How many packets to /24 get looped? 100
- WARNING
- How many looped packets / million? 5
- How long (in millions) did it last? 10 millions
- ALARM
- By the time an alarm is raised, warnings have already been raised and help debug the system
- Fixed memory and computational complexity
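One possible reading of the thresholds above, sketched as a small monitor. Everything here is an assumption: the class and parameter names, the use of "at least" comparisons, counting time in windows of one million packets, and an external detector that supplies the `looped` flag. The per-prefix counter is an unbounded dictionary, so this sketch does not reproduce the fixed-memory property.

```python
import ipaddress
from collections import Counter

class OnlineLoopMonitor:
    def __init__(self, warn_pkts=100, rate_per_window=5, alarm_windows=10,
                 window=1_000_000):
        self.warn_pkts = warn_pkts              # looped packets to one /24 before a WARNING
        self.rate_per_window = rate_per_window  # looped packets per window that count as "bad"
        self.alarm_windows = alarm_windows      # consecutive bad windows before an ALARM
        self.window = window                    # window length in packets
        self.per_prefix = Counter()
        self.seen = 0
        self.looped_in_window = 0
        self.bad_windows = 0
        self.events = []

    def observe(self, dst, looped):
        """Account for one packet; `looped` comes from a replica detector (assumed)."""
        self.seen += 1
        if looped:
            self.looped_in_window += 1
            prefix = ipaddress.ip_network(f"{dst}/24", strict=False)
            self.per_prefix[prefix] += 1
            if self.per_prefix[prefix] == self.warn_pkts:
                self.events.append(("WARNING", str(prefix)))
        if self.seen % self.window == 0:        # end of a window
            if self.looped_in_window >= self.rate_per_window:
                self.bad_windows += 1
                if self.bad_windows == self.alarm_windows:
                    self.events.append(("ALARM", self.seen))
            else:
                self.bad_windows = 0            # the loop did not persist
            self.looped_in_window = 0

# Scaled-down thresholds so the example runs on 20 packets.
monitor = OnlineLoopMonitor(warn_pkts=3, rate_per_window=1, alarm_windows=2, window=10)
for i in range(20):
    monitor.observe(f"10.1.2.{i % 5 + 1}", looped=(i % 4 == 0))
```

In this run the warning for 10.1.2.0/24 fires before the alarm, which mirrors the point that warnings are available for debugging by the time an alarm is raised.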
42Validation of On-Line Algorithm
43Summary
- Impact of routing on performance has been analyzed in terms of loss and delay.
- Per-link loss varies greatly.
- Excluding outliers, end-to-end loss of 0.3% is unavoidable.
- For the small number of packets that escape the loops, 50 to 500 msec of delay is added on average.
- On-line detection algorithm
- In conjunction with routing protocol monitoring, it will help detect and fix persistent loops.
44Future Work
- More work needed to determine causes behind routing loops
- Correlate with BGP/IS-IS updates
- Address hijacking
- Wrong aggregation
- Origin misconfiguration
- Export misconfiguration
- Integration with existing monitoring tools
45Backup Slides
46Superbowl Sunday, 2/3/2002
47Superbowl Sunday, 2/3/2002
48What Next?
- Alarms and warnings
- How to extract just enough info to be useful
- How to relate it with BGP/IS-IS update info
- How to integrate with management/monitoring
infrastructure