Title: RON: Resilient Overlay Networks
1. RON: Resilient Overlay Networks
- David Andersen, Hari Balakrishnan,
- Frans Kaashoek, Robert Morris
- MIT Laboratory for Computer Science
- http://nms.lcs.mit.edu/ron/
2. Fault-tolerant Networking
[Figure: four nodes labeled A, B, C, and D]
Any-to-any communication, routing around failures
3. The Internet
[Figure labels: Mom-and-pop ISP; Really-big ISP everyone's afraid of; Big ISP; Autonomous System (AS); Peering; BGP4]
- Scalability via aggressive aggregation and information hiding
- Commercial reality via peering and transit relationships
4. How Robust is Internet Routing?
Paxson 95-97: 3.3% of all routes had serious problems
Labovitz 97-00: 10% of routes available < 95% of the time; 65% of routes available < 99.9% of the time; 3-min minimum detection and recovery time, often 15 mins; 40% of outages took 30 mins to repair
Chandra 01: 5% of faults last more than 2.75 hours
- Slow outage detection and recovery
- Inability to detect badly performing paths
- Inability to efficiently leverage redundant paths
- Inability to perform application-specific routing
- Inability to express sophisticated routing policy
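To put the availability thresholds cited on this slide (95% and 99.9%) in perspective, the following minimal sketch converts an availability fraction into hours of downtime. The 30-day month and the function name `downtime_hours` are illustrative assumptions, not part of the original slides.

```python
# Convert an availability fraction into implied downtime over a period.
# The 30-day month is an assumed reporting period, chosen for illustration.
HOURS_PER_30_DAY_MONTH = 30 * 24  # 720 hours


def downtime_hours(availability, period_hours=HOURS_PER_30_DAY_MONTH):
    """Return the hours of unavailability implied by an availability in [0, 1]."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_hours


if __name__ == "__main__":
    for availability in (0.95, 0.999):
        hours = downtime_hours(availability)
        print(f"{availability:.1%} available: about {hours:.2f} hours down per 30-day month")
```

Under these assumptions, a route available 95% of the time is down for roughly 36 hours in a 30-day month, while 99.9% availability corresponds to roughly 43 minutes.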
5. Our Goal
- To improve communication availability for small groups by at least a factor of 10
- Many applications
- Collaboration and conferencing
- Virtual Private Networks (VPNs) across public Internet
- Overlay Internet Service
6. RON: Routing Using Overlays
- Cooperating end-systems in different routing domains can conspire to do better than scalable wide-area protocols
[Figure label: Scalable BGP-based IP routing substrate]
- Types of failures
- Outages: configuration/operational errors, backhoes, etc.
- Performance failures: severe congestion, denial-of-service attacks, etc.
7. RON Design
[Figure labels: Nodes in different routing domains (ASes); RON library; Performance Database; Application-specific routing tables; Policy routing module]
8. Many Research Questions
- Does the RON approach work at all?
- Each RON is small in size, no more than 50 or 100 nodes
- How fast can failure detection and recovery happen?
- Policy routing
- Doesn't RON violate AUPs and other policies?
- Routing behavior
- Can stable routing be achieved?
- Implementing efficient multi-criteria routing
- Is it safe to deploy a large number of (small) interacting RONs on the Internet?
9. RON Deployment (19 sites)
[Map labels: To vu.nl, lulea.se, ucl.uk; To kaist.kr, .ve]
Sites: .com (ca), .com (ca), dsl (or), cci (ut), aros (ut), utah.edu, .com (tx), cmu (pa), dsl (nc), nyu, cornell, cable (ma), cisco (ma), mit, vu.nl, lulea.se, ucl.uk, kaist.kr, univ-in-venezuela
10. RON Experiments
- Measure loss, latency, and throughput with and without RON
- 13 hosts in the US and Europe
- 3 days of measurements from data collected in March 2001
- 30-minute average loss rates
- A 30-minute outage is very serious!
- Note: Experiments done with a No-Internet2-for-commercial-use policy
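The experiments report 30-minute average loss rates. As a minimal sketch of how such a metric could be computed from individual probe results, the code below buckets probes into fixed 30-minute windows. The probe record format `(timestamp_seconds, delivered)` and the function name `windowed_loss_rates` are assumptions made for illustration; the slides do not describe the actual measurement tooling.

```python
from collections import defaultdict

WINDOW_SECONDS = 30 * 60  # 30-minute averaging window, as on the slide


def windowed_loss_rates(probes, window_seconds=WINDOW_SECONDS):
    """Average loss rate per fixed window.

    probes: iterable of (timestamp_seconds, delivered) pairs (assumed format).
    Returns a dict mapping each window's start time to its loss rate in [0, 1].
    """
    sent = defaultdict(int)
    lost = defaultdict(int)
    for timestamp, delivered in probes:
        # Align each probe to the start of the window it falls in.
        window_start = int(timestamp // window_seconds) * window_seconds
        sent[window_start] += 1
        if not delivered:
            lost[window_start] += 1
    return {start: lost[start] / sent[start] for start in sorted(sent)}


if __name__ == "__main__":
    sample = [(0, True), (600, False), (1200, True), (1799, True),
              (1800, False), (2400, False)]
    print(windowed_loss_rates(sample))
```

In the sample data, the first window loses one probe out of four and the second window loses both of its probes, which is the kind of window the slides would count as a complete outage.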
11. RON greatly improves loss-rate
[Figure: 30-min average loss rate on Internet vs. 30-min average loss rate with RON; 13,000 samples]
RON loss rate never more than 30%
12. An order-of-magnitude fewer failures
30-minute average loss rates

Loss Rate   RON Better   No Change   RON Worse
10%         479          57          47
20%         127          4           15
30%         32           0           0
50%         20           0           0
80%         14           0           0
100%        10           0           0
- 6,825 path hours represented here
- 12 path hours of essentially complete outage
- 76 path hours of TCP outage
- RON routed around all of these!
- One indirection hop provides almost all the benefit!
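The observation that one indirection hop provides almost all the benefit can be illustrated with a small path-selection sketch: compare the direct path against every path through a single intermediate overlay node and keep the one with the lowest estimated loss. This is a sketch under stated assumptions, not the RON implementation: the loss table, the node names, and the assumption that losses on the two hops are independent are all hypothetical.

```python
def path_loss(*hop_losses):
    """Combined loss rate over several hops, assuming independent losses per hop."""
    delivered = 1.0
    for loss in hop_losses:
        delivered *= 1.0 - loss
    return 1.0 - delivered


def best_path(loss, src, dst):
    """Pick the lowest-loss path using at most one intermediate node.

    loss: dict mapping (a, b) to the measured loss rate of the direct
          Internet path from a to b (hypothetical measurements).
    Returns (path_as_list_of_nodes, estimated_loss_rate).
    """
    # Treat an unmeasured direct path as fully lossy.
    best = ([src, dst], loss.get((src, dst), 1.0))
    nodes = {a for a, _ in loss} | {b for _, b in loss}
    for mid in sorted(nodes - {src, dst}):
        if (src, mid) in loss and (mid, dst) in loss:
            candidate = path_loss(loss[(src, mid)], loss[(mid, dst)])
            if candidate < best[1]:
                best = ([src, mid, dst], candidate)
    return best


if __name__ == "__main__":
    # Hypothetical loss rates: the direct A -> B path is in a complete outage.
    measured = {
        ("A", "B"): 1.0,
        ("A", "C"): 0.01, ("C", "B"): 0.02,
        ("A", "D"): 0.05, ("D", "B"): 0.05,
    }
    print(best_path(measured, "A", "B"))
```

With these made-up measurements the direct path is unusable, and routing through node C gives an estimated loss of about 3%, lower than the roughly 10% estimated through node D.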
13. Resilience Against DoS Attacks
14. Conclusion
- Improved availability of Internet communication paths using small overlays
- Layered above scalable IP substrate
- RON provides a set of libraries and programs to facilitate this application-specific routing
- Experimental data suggest that this approach works
- Over 10X availability
- Outage detection and recovery in about 15 seconds
- Able to route around certain denial-of-service attacks
- Many interesting questions remain
http://nms.lcs.mit.edu/ron/
15. Policy Routing
- Today, wide-area policy expression is a sledgehammer
- Policy control is important
- From talking to some providers
- E.g., rate control policy, Internet2, etc.
- True, RONs could violate AUPs
- But, the RON approach enables more flexible policies
- More complex routing decisions, rate-based too
- Multiple routing tables
- Deeper packet inspection, etc.
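One way to picture "multiple routing tables" is a separate set of usable links per traffic class, each filtered by that class's policy. The sketch below is purely illustrative: the link tags, traffic class names, node names, and both function names are hypothetical, and the slides do not specify how RON's policy routing module is implemented.

```python
def allowed_links(links, forbidden_tags):
    """Keep only links that carry none of the forbidden tags."""
    return {
        pair: info
        for pair, info in links.items()
        if not (info["tags"] & forbidden_tags)
    }


def build_routing_tables(links, policies):
    """Build one table of usable links per traffic class.

    links:    dict mapping (a, b) to {"tags": set_of_strings} (hypothetical).
    policies: dict mapping traffic class to the set of tags it must avoid.
    """
    return {
        traffic_class: allowed_links(links, forbidden)
        for traffic_class, forbidden in policies.items()
    }


if __name__ == "__main__":
    # Hypothetical topology: the X-Y link crosses a research-only network.
    links = {
        ("X", "Y"): {"tags": {"internet2"}},
        ("X", "Z"): {"tags": set()},
        ("Z", "Y"): {"tags": set()},
    }
    # Commercial traffic may not use Internet2 links; research traffic may.
    policies = {"research": set(), "commercial": {"internet2"}}
    for traffic_class, table in build_routing_tables(links, policies).items():
        print(traffic_class, sorted(table))
```

In this made-up topology, research traffic may use all three links, while commercial traffic is limited to the two links that avoid the tagged network, which mirrors a no-Internet2-for-commercial-use style of policy.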
16. Example
17. Throughput Improvement