Title: IRTF-RR


1
IRTF-RR
  • ahuja_at_umich.edu

2
IRTF agenda
  • Agenda issues (5 sec)
  • Intro - why are we here (10 sec) - abha
  • Goals of the group, etc. (30 min) - sean
  • Topics of Interest
  • Convergence (10 minutes) - abha
  • Nimrod (20 minutes) - noel
  • Questions and Answers/Feedback

3
IRTF RR intro
  • Who are we?
  • ahuja_at_umich.edu
  • smd_at_ebone.net
  • irtf-rr-chairs_at_nether.net
  • Where is the info?
  • http://www.nether.net/irtf-rr
  • irtf-rr-request_at_nether.net

4
IRTF RR
  • Why are we here?
  • Resurrect this working group
  • Open session to tell folks what we are working on
  • Get feedback from the public for additional
    topics to add to our list

5
IRTF-RR goals
  • do routing research :)
  • most of work done in mailing list and small groups

6
Approaching the issues...
  • What is going on now?
  • Routing issues today
  • What are the problems?
  • What can we do to fix it?
  • What should we do in the future?

7
Routing Research
  • Topics of interest
  • routing convergence, stability and scalability
  • fault tolerance
  • Quality of Service routing
  • multicast routing
  • Extremely dynamic constraint-based routing
  • Traffic engineering
  • NAT and IPv6 routing
  • optical networks and routing
  • operational concerns of routing

8
Q&A
  • What issues do you think are important to
    address?
  • QoS?
  • Convergence?
  • Scalability?

9
Experimental Measurement of Delayed Convergence
  • Craig Labovitz
  • Microsoft Research/Merit Network, Inc.
  • Abha Ahuja
  • Merit Network, Inc.
  • Farnam Jahanian, Abhijit Bose
  • University of Michigan

Slides originally presented at NANOG. IRTF-RR at Pittsburgh IETF.
Email: ahuja_at_umich.edu
10
The Internet Failure Analysis
Something happens. Doesn't work.
11
Routing Protocol Convergence
  • Unlike the connection-oriented PSTN (30 ms),
    the Internet does not have fail-over.
  • Instead, each node recalculates on a hop-by-hop
    basis (i.e. no flooding of changes)
  • Distance-vector algorithms (e.g. RIP, BGP)
    exhibit slower convergence than link state
    protocols
  • During convergence
  • Latency, loss, out of order
  • Additional update messages (CPU processing)

12
Distance Vector (BF) Protocols
  • Suffer from counting to infinity problem
  • Solutions
  • Poison reverse
  • Split horizon
  • Path vectors

[Figure: example topology with nodes A, B, and C]
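
The counting-to-infinity problem and the split horizon fix can be sketched in a few lines. This is a minimal illustration, not the slide's own example: it assumes a line topology A - B - C, RIP's conventional infinity of 16, and synchronous update rounds. The function name rounds_until_unreachable is hypothetical.

```python
INFINITY = 16  # RIP's conventional "unreachable" metric


def rounds_until_unreachable(split_horizon, max_rounds=100):
    """Rounds needed for A and B to mark C unreachable after link B-C fails."""
    # Assumed line topology A - B - C. Distances are hop counts to C.
    dist = {"A": 2, "B": 1}
    via = {"A": "B", "B": "C"}
    # The B-C link fails, so B loses its direct route.
    dist["B"], via["B"] = INFINITY, None

    for round_no in range(1, max_rounds + 1):
        # Decide what each node advertises to the other in this round.
        # Split horizon is modelled here as advertising INFINITY back to the
        # next hop (the poison reverse form); omitting the route entirely
        # would have the same effect in this two-node exchange.
        adv = {}
        for node, peer in (("A", "B"), ("B", "A")):
            hide = split_horizon and via[node] == peer
            adv[node] = INFINITY if hide else dist[node]

        # Each node has a single usable neighbor left, so its new distance
        # is that neighbor's advertised distance plus one hop.
        for node, peer in (("A", "B"), ("B", "A")):
            dist[node] = min(adv[peer] + 1, INFINITY)
            via[node] = peer if dist[node] < INFINITY else None

        if dist["A"] == INFINITY and dist["B"] == INFINITY:
            return round_no
    return None


print(rounds_until_unreachable(split_horizon=False))  # 14
print(rounds_until_unreachable(split_horizon=True))   # 1
```

Without split horizon the stale route bounces between A and B, gaining one hop per round until it reaches 16. With split horizon A never offers B a route that goes through B, so both give up after one exchange.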
13
Conventional Wisdom
  • Restoral is not an issue in the IP world
  • Just reroute around in a few milliseconds or
    whatever
  • BGP convergence takes only a few _____
  • Bad news travels fast
  • Fast withdraw propagation valid goal
  • Announcements slower because bundled
  • BGP has great convergence properties
  • ASPath solved the convergence and counting to
    infinity problems
  • All my customers are multi-homed, triple-homed
  • Convergence -- what, me worry?

14
More Conventional Wisdom
  • Enough bandwidth will solve anything
  • It will all be one big network one day soon
    anyways
  • (Especially after yesterday)

15
Internet Failures
  • Replication, round-robin DNS, etc. helps
    reliability of inter-domain content oriented
    services
  • Inter-domain transaction oriented services (e.g.
    VoIP, EBay, database commits, etc.) still pose a
    challenge
  • Important to model: how long does it take for the
    Internet to converge?
  • After Failure
  • After Fail-Over
  • After Repair

16
BGP Bad news
  • With unconstrained policies (Griffin99,
    Varadhan96)
  • Divergence
  • Possible to create mutually unsatisfiable policies
  • NP-complete to identify these policies in IRR
  • Happening today?
  • With constrained policies (e.g. shortest path
    first)
  • Transient oscillations
  • BGP usually converges
  • It might just take a very long time.
  • This talk is about constrained policies

17
Some Observations
  • How do we study convergence?
  • From BGP logs (e.g. debug ip bgp), difficult to
    determine causal relationships
  • Earlier work studied BGP pathologies and failures
  • Still lots of BGP duplicates and oscillations
  • Failure/repair data (next slide) for default-free
    routes shows 30 minute curve
  • Examined long-lived default-free routes from 24
    providers for a year
  • Restoral time for given provider after failure
    (i.e. route withdrawn)

18
How long until routes return? (From A Study of
Internet Failures)
What is happening here?
19
16 Month Study of Convergence
  • Instrument the Internet
  • Inject routes into geographically and
    topologically diverse provider BGP peering
    sessions (Mae-West, Japan, Michigan, London)
  • Periodically fail and change these routes (i.e.
    send withdraws or new attributes)
  • Time events using ICMP echoes and NTP-synchronized
    BGP routeviews monitoring machines (also HTTP
    GETs)
  • Write lots of Perl scripts
  • Wait sixteen months (45,000 routing events)

20
Setup
21
How Many Announcements Does it Take For an AS to
Withdraw a Route?
7/5 19:33:25 Route R is withdrawn
7/5 19:34:15 AS6543 announce R 6543 66665 8918 1 5696 999
7/5 19:35:00 AS6543 announce R 6543 66665 8918 67455 6461 5696 999
7/5 19:35:37 AS6543 announce R 6543 66665 4332 6461 5696 999
7/5 19:35:39 AS6543 announce R 6543 66665 5378 6660 67455 6461 5696 999
7/5 19:35:39 AS6543 announce R 6543 66665 65 6461 5696 999
7/5 19:35:52 AS6543 announce R 6543 66665 6461 5696 999
7/5 19:36:00 AS6543 announce R 6543 66665 5378 6765 6660 67455 6461 5696 999
7/5 19:38:22 AS6543 withdraw R
Answer: Up to 19
(AS6543 chosen as an example; all ASes exhibit
similar behavior.) Abha made me change the AS
numbers.
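
A trace like this one can be summarized mechanically. The sketch below copies the slide's events into a string, reading the timestamps as HH:MM:SS (an assumption about the original formatting), and counts the announcements, the elapsed time, and the longest AS path explored. The names TRACE and summarize are hypothetical.

```python
from datetime import datetime

# Events from the slide, one per line, with timestamps read as HH:MM:SS.
TRACE = """\
7/5 19:33:25 Route R is withdrawn
7/5 19:34:15 AS6543 announce R 6543 66665 8918 1 5696 999
7/5 19:35:00 AS6543 announce R 6543 66665 8918 67455 6461 5696 999
7/5 19:35:37 AS6543 announce R 6543 66665 4332 6461 5696 999
7/5 19:35:39 AS6543 announce R 6543 66665 5378 6660 67455 6461 5696 999
7/5 19:35:39 AS6543 announce R 6543 66665 65 6461 5696 999
7/5 19:35:52 AS6543 announce R 6543 66665 6461 5696 999
7/5 19:36:00 AS6543 announce R 6543 66665 5378 6765 6660 67455 6461 5696 999
7/5 19:38:22 AS6543 withdraw R
"""


def summarize(trace):
    """Return (announcement count, seconds to final withdraw, longest AS path)."""
    times = []
    path_lengths = []
    for line in trace.splitlines():
        _date, clock, rest = line.split(maxsplit=2)
        # All events share one date, so only the clock time is parsed.
        times.append(datetime.strptime(clock, "%H:%M:%S"))
        fields = rest.split()
        if fields[1] == "announce":
            # Everything after "announce R" is the AS path.
            path_lengths.append(len(fields[3:]))
    elapsed = int((times[-1] - times[0]).total_seconds())
    return len(path_lengths), elapsed, max(path_lengths)


print(summarize(TRACE))  # (7, 297, 9)
```

For this trace, AS6543 sent seven announcements over almost five minutes before finally withdrawing the route, and the longest path it tried had nine AS hops.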
22
Withdraw Convergence
  • After a BGP route is withdrawn, barring other
    failures, how long does it take Internet routing
    tables to reach steady-state?

23
Withdraw Convergence
[Figure: withdraw convergence, with series for AS1, AS2, AS3, and AS4]
24
Withdraw Convergence
  • Probability distribution
  • Providers exhibit different, but related
    convergence behaviors
  • 80% of withdraws from all ISPs take more than a
    minute
  • For ISP4, 20% of withdraws took more than three
    minutes to converge

25
Fail-Overs and Repairs
  • What are the relative convergence latencies for
    fail-overs and repairs?
  • Does bad news (withdraws) travel faster?

26
Failures, Fail-overs and Repairs
27
Failures, Fail-overs and Repairs
  • Bad news does not travel fast
  • Repairs (Tup) exhibit similar convergence
    properties as long-short ASPath fail-over
  • Failures (Tdown) and short-long fail-overs (e.g.
    primary to secondary path) also similar
  • Slower than Tup (e.g. a repair)
  • 60% take longer than two minutes
  • Fail-over times degrade the greater the degree of
    multi-homing!

28
What is Happening?
  • Non-deterministic ordering of BGP update messages
    leads to
  • Transient oscillations
  • Each change in FIB adds delay (CPU, BGP bundling
    timer)
  • At extreme, convergence triggers BGP dampening
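
The path exploration behind these transient oscillations can be reproduced with a toy path-vector model. The sketch below is an illustration under stated assumptions, not the measurement setup from the study: a full mesh of n ASes that all peer with an origin AS 0, shortest-AS-path selection, a single FIFO message queue, and no bundling timer. The function name simulate_withdraw is hypothetical.

```python
from collections import deque


def simulate_withdraw(n):
    """Count messages sent among n fully meshed ASes after AS 0 withdraws."""
    mesh = range(1, n + 1)
    # rib_in[a][b] is the AS path a last heard from neighbor b (None = no route).
    rib_in = {a: {b: (b, 0) for b in mesh if b != a} for a in mesh}
    for a in mesh:
        rib_in[a][0] = (0,)  # direct route from the origin

    def best(a):
        usable = [p for p in rib_in[a].values() if p is not None]
        # Shortest AS path wins; ties broken by comparing the paths themselves.
        return min(usable, key=lambda p: (len(p), p)) if usable else None

    current = {a: best(a) for a in mesh}
    # The origin withdraws the route from every mesh AS.
    queue = deque((0, a, None) for a in mesh)
    sent = 0
    while queue:
        sender, receiver, path = queue.popleft()
        # AS path loop detection: a path containing the receiver is unusable.
        if path is not None and receiver in path:
            path = None
        rib_in[receiver][sender] = path
        new_best = best(receiver)
        if new_best != current[receiver]:
            current[receiver] = new_best
            advert = None if new_best is None else (receiver,) + new_best
            for neighbor in mesh:
                if neighbor != receiver:
                    queue.append((receiver, neighbor, advert))
                    sent += 1
    return sent, current


for size in (2, 3, 4, 5):
    messages, final = simulate_withdraw(size)
    print(size, messages, all(route is None for route in final.values()))
```

Each AS falls back to a stale path learned from a peer, announces it, and only later learns that the peer has lost the route too. In this model every run ends with all routes withdrawn, but only after the mesh has exchanged a number of announcements and withdraws that depends on the mesh size.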

29
BGP and RIP
  • RIP: precisely monotonically increasing. Can
    explore metrics (1..N)
  • BGP: monotonically increasing. Multiple (N!) ways
    to represent a path metric of N.
  • BGP solved RIP routing table loop problem by
    making it exponentially worse

[Figure: example with N = 4]
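
The gap between the two protocols can be made concrete by counting. The sketch below assumes a full mesh of n ASes in which every AS also connects directly to the destination, and compares the number of distinct hop-count metrics a RIP-style protocol can step through with the number of loop-free AS paths a path-vector protocol can explore. This count is related to, but not the same as, the N! figure on the slide. The function names are hypothetical.

```python
from itertools import permutations


def hop_count_metrics(n):
    """Distinct hop counts to the destination in the assumed topology: 1..n."""
    return n


def loop_free_paths(n):
    """Loop-free AS paths from one AS to the destination.

    A path visits any ordered selection of the other n - 1 ASes (possibly
    none) and then steps to the destination.
    """
    others = range(1, n)
    return sum(1 for k in range(n) for _ in permutations(others, k))


for n in range(1, 6):
    print(n, hop_count_metrics(n), loop_free_paths(n))
# 1 1 1
# 2 2 2
# 3 3 5
# 4 4 16
# 5 5 65
```

The hop-count column grows linearly while the path column grows factorially, which is the sense in which path vectors trade the routing loop problem for a much larger space of alternatives to explore.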
30
Questions?
  • send email to ahuja_at_umich.edu