Title: Exterior Gateway Protocols: EGP, BGP4, CIDR: Brief Version
1Exterior Gateway Protocols: EGP, BGP-4, CIDR (Brief Version)
- Shivkumar Kalyanaraman
- Rensselaer Polytechnic Institute
- shivkuma_at_ecse.rpi.edu
- http://www.ecse.rpi.edu/Homepages/shivkuma
- Based in part upon slides of Tim Griffin (AT&T), Ion Stoica (UCB), J. Kurose (U Mass), Noel Chiappa (MIT)
2Overview
- Cores, Peers, and the limit of default routes
- Autonomous systems and EGP
- BGP4
- CIDR: reducing router table sizes
- Refs: Chap 10, 14, 15. Books: "Routing in the Internet" by Huitema, "Interconnections" by Perlman, "BGP4" by Stewart, "Internet Routing Architectures" by Sam Halabi and Danny McPherson
- Reading: Geoff Huston, "Commentary on Inter-domain Routing in the Internet"
- Reference: BGP-4 Standards Document (in TXT)
- Reading: Norton, "Internet Service Providers and Peering"
- Reading: Labovitz et al, "Delayed Internet Routing Convergence"
- Reference: Paxson, "End-to-End Routing Behavior in the Internet"
- Reading: Interdomain Routing: Additional Notes (in PDF, in MS Word)
- Reference Site: Griffin, "Interdomain Routing Links"
3Intra-AS and Inter-AS routing
- Gateways
- perform inter-AS routing amongst themselves
- perform intra-AS routing with other routers in their AS
[Figure: autonomous systems A, B, and C with their routers (labels a, b, d)]
4History of Inter-Domain Routing EGP
5History Default Routes limits
- Default routes => partial information
- Routers/hosts w/ default routes rely on other routers to complete the picture.
- In general, routing "signposts" should be:
- Consistent, i.e., if a packet is sent off in one direction then another direction should not be more optimal.
- Complete, i.e., should be able to reach all destinations
6Core
- A small set of routers that have consistent and complete information about all destinations.
- Non-core routers can have partial information provided they point default routes to the core
- Partial info allows site administrators to make local routing changes independently.
CORE
S1
S2
Sm
. . .
7Peer Backbones
- Initially NSFNET: only 1 link to ARPANET
- Addition of multiple links => multiple possible routes => need for dynamic routing
- Today there are over 30 backbones!
- Routing protocol at cores/peers: GGP -> EGP -> BGP-4
Peering Link
8Exterior Gateway Protocol (EGP)
- A mechanism that allows non-core routers to learn routes (external routes) from core routers so that they can choose optimal backbone routes
- A mechanism for non-core routers to inform core routers about hidden networks (internal routes)
- Each Autonomous System (AS) has the responsibility of advertising reachability info to other ASes.
9Purpose of EGP
AS2
EGP
AS1
A
border router
internal router
Share connectivity information across ASes
10EGP Operation
- Neighbor Acquisition: reliable 2-way handshake
- Neighbor Reachability
- Hellos: j out of m hellos OK => Neighbor UP
- k out of n hellos NOT OK => Neighbor DOWN
- Updates/Queries
- EGP is an incremental protocol. New info => send updates
- Each router can query neighbors as well
- Reachability is advertised; metrics are ignored
- Requires a tree topology of ASes to avoid loops (see next slide)
11Why EGP Requires a Tree Structure..
12EGP weaknesses
- EGP does not interpret the distance metrics in routing update messages => cannot compute the shorter of two routes
- As a result it restricts the topology to a tree structure, with the core as the root
- Rapid growth => many networks may be temporarily unreachable
- Only one path to destination => no load sharing
- Need new protocol => BGP-4
13The Current Stage for Inter-Domain Routing ASes
Policy Routing Scenarios
14Today's Big Picture
Large ISP
Large ISP
Stub
Small ISP
Dial-Up ISP
Access Network
Stub
Stub
Large number of diverse networks
15Autonomous Systems (ASes)
- An autonomous system is an autonomous routing domain that has been assigned an Autonomous System Number (ASN).
- All parts within an AS remain connected.
16Autonomous System(AS)
- An autonomous system (AS) is a network under a single administrative control
- An AS owns an IP prefix
- Every AS has a unique AS number
- ASes need to inter-network themselves to form a single virtual global network
- Need a common protocol for communication
- I.e., BGP-4
17IP Address Allocation and Assignment Internet
Registries
IANA www.iana.org
APNIC www.apnic.org
ARIN www.arin.org
RIPE www.ripe.org
Allocate to national and local registries and ISPs. Addresses assigned to customers by ISPs.
RFC 2050 - Internet Registry IP Allocation Guidelines
RFC 1918 - Address Allocation for Private Internets
RFC 1518 - An Architecture for IP Address Allocation with CIDR
18AS Numbers (ASNs)
ASNs are 16 bit values.
64512 through 65535 are private
Currently over 11,000 in use.
- Genuity: 1
- MIT: 3
- Harvard: 11
- UC San Diego: 7377
- AT&T: 7018, 6341, 5074, ...
- UUNET: 701, 702, 284, 12199, ...
- Sprint: 1239, 1240, 6211, 6242, ...
ASNs represent units of routing policy
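The numeric ranges on this slide can be checked with a few lines of Python. This is a minimal sketch that takes the slide's figures at face value (16-bit ASNs, private range 64512 through 65535); the function names are made up for illustration.

```python
# Ranges as stated on the slide; the function names are hypothetical.
PRIVATE_ASN_FIRST = 64512
PRIVATE_ASN_LAST = 65535


def is_valid_asn16(asn: int) -> bool:
    """True if the value fits in an unsigned 16-bit field."""
    return 0 <= asn <= 0xFFFF


def is_private_asn(asn: int) -> bool:
    """True if the ASN falls in the private range quoted on the slide."""
    if not is_valid_asn16(asn):
        raise ValueError("not a 16-bit ASN: %r" % asn)
    return PRIVATE_ASN_FIRST <= asn <= PRIVATE_ASN_LAST
```

For example, `is_private_asn(7018)` (one of the AT&T numbers listed above) is false, while `is_private_asn(64512)` is true.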
19Internet AS Map caida.org
20Which Routers do Inter-AS routing?
AS2
BGP
AS1
border router
internal router
- Two types of routers
- Border router (Edge), Internal router (Core)
- Two border routers of different ASes will have a BGP session
21Requirements for Inter-AS Routing
- Should scale for the size of the global Internet.
- Focus on reachability, not optimality
- Use address aggregation techniques to minimize core routing table sizes and associated control traffic
- At the same time, it should allow flexibility in topological structure (e.g., don't restrict to trees)
- Allow policy-based routing between autonomous systems
- Policy refers to arbitrary preference among a menu of available routes (based upon routes' attributes)
- Fully distributed routing (as opposed to a signaled approach) is the only possibility.
- Extensible to meet the demands for newer policies.
22Policy Routing Nontransit vs Transit ASes
Internet Service providers (ISPs) have transit
networks
ISP 2
ISP 1
NET A
Nontransit AS (Stub) might be a corporate or
campus network. Could be a content provider
Traffic NEVER flows from ISP 1 through NET A to
ISP 2
23Policy Routing Selective Transit
NET B
NET C
NET A provides transit between NET B and NET
C and between NET D and NET C
NET A
NET A DOES NOT provide transit Between NET D and
NET B
NET D
Most transit ASes allow only selective transit: a key impact of commercialization
24Policy Routing Customers Providers
provider
customer
Customer pays provider for access to the Internet
25Policy Routing Customer-Provider Hierarchy
IP traffic
provider
customer
26Policy Routing The Peering Relationship
Peers provide transit between their respective customers
Peers do not provide transit between peers
Peers (often) do not exchange money
Legend: traffic allowed / traffic NOT allowed
27Peering Wars
Peer
- Reduces upstream transit costs
- Can increase end-to-end performance
- May be the only way to connect your customers to some part of the Internet (Tier 1)
Don't Peer
- You would rather have customers
- Peers are usually your competition
- Peering relationships may require periodic renegotiation
Peering struggles are by far the most
contentious issues in the ISP world! Peering
agreements are often confidential.
28BGP-4 Design
29Recall Distributed Routing Techniques
Link State
- Topology information is flooded within the routing domain
- Best end-to-end paths are computed locally at each router.
- Best end-to-end paths determine next-hops.
- Based on minimizing some notion of distance
- Works only if policy is shared and uniform
- Examples: OSPF, IS-IS
Vectoring
- Each router knows little about network topology
- Only best next-hops are chosen by each router for each destination network.
- Best end-to-end paths result from composition of all next-hop choices
- Does not require any notion of distance
- Does not require uniform policies at all routers
- Examples: RIP, BGP
30BGP-4
- BGP: Border Gateway Protocol
- Is a policy-based routing protocol
- Is the de facto EGP of today's global Internet
- Relatively simple protocol, but configuration is complex and the entire world can see, and be impacted by, your mistakes.
- 1989: BGP-1, RFC 1105
- Replacement for EGP (1984, RFC 904)
- 1990: BGP-2, RFC 1163
- 1991: BGP-3, RFC 1267
- 1995: BGP-4, RFC 1771
- Support for Classless Interdomain Routing (CIDR)
31BGP Operations (Simplified)
Establish session on TCP port 179
AS1
BGP session
Exchange all active routes
AS2
While connection is ALIVE exchange route UPDATE
messages
Exchange incremental updates
32Four Types of BGP Messages
- Open: Establish a peering session.
- Keep Alive: Handshake at regular intervals.
- Notification: Shuts down a peering session.
- Update: Announcing new routes or withdrawing previously announced routes.
announcement = prefix + attribute values
33Border Gateway Protocol (BGP)
- Allows arbitrary AS topologies
- Uses a path-vector concept to help prevent routing loops in complex topologies
- For inter-domain routing, the shortest path may not be preferred for policy, security, or cost reasons.
- Different routers have different preferences (policy) => as a packet goes through the network it will encounter different policies
- => Bellman-Ford or Dijkstra don't work!
- Soln: BGP allows attributes for ASes and paths, which could include policies (policy-based routing).
34BGP (Contd)
- Consistency criterion: when a BGP speaker A advertises to its peer B that it has a path to IP prefix C, B can be certain that A is actively using that AS-path to reach that destination
- BGP uses TCP between 2 peers (reliability)
- Exchange entire BGP table first (50K routes!)
- Later exchanges only incremental updates
- Application (BGP)-level keepalive messages
- Interior and exterior peers need to exchange reachability information among interior peers before updating intra-AS forwarding table.
35Two Types of BGP Neighbor Relationships
- External Neighbor (eBGP): in a different Autonomous System
- Internal Neighbor (iBGP): in the same Autonomous System
AS1
iBGP is routed (using IGP!)
eBGP
iBGP
AS2
36I-BGP and E-BGP
IGP
A
E-BGP
AS2
37I-BGP vs IGP
- Why is IGP (OSPF, IS-IS) not used?
- In large ASes the full route table is very large (100K routes!)
- Rate of change of routes is frequent
- Tremendous amount of control traffic
- Not to mention Dijkstra computation being invoked for any change
- BGP policy information may be lost
- I-BGP: within an AS
- Same protocol/state machines as EBGP
- But different rules about advertising prefixes
38IBGP vs EBGP
- I-BGP nodes: typically ABRs, or other nodes where default routes terminate
- I-BGP peering sessions between every pair of routers within an AS: full mesh.
Physical link
A
IBGP session
D
C
B
AS1
39IBGP Peers Fully Meshed
- IBGP is needed to avoid routing loops within an AS
- Full mesh =>
- Independent of physical connectivity.
- Single link may see same update multiple times!
- IBGP neighbors do not announce routes received via iBGP to other iBGP neighbors.
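The scaling cost of the full mesh and the re-advertisement rule above can be sketched in Python. This is an illustrative model only; the function names and the "ibgp"/"ebgp" string labels are assumptions made for the example.

```python
def ibgp_full_mesh_sessions(n_routers: int) -> int:
    """Number of iBGP sessions when every pair of routers peers directly."""
    return n_routers * (n_routers - 1) // 2


def may_readvertise(learned_from: str, send_to: str) -> bool:
    """Apply the rule from the slide: a route learned via iBGP is not
    passed on to another iBGP neighbor. Every other combination is allowed
    in this simplified model."""
    return not (learned_from == "ibgp" and send_to == "ibgp")
```

With the four routers A, B, C, D of the earlier figure, `ibgp_full_mesh_sessions(4)` gives 6 sessions; the quadratic growth is what motivates route reflection and confederations.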
eBGP update
40IBGP Scaling Route Reflection
- Add hierarchy to I-BGP
- Route reflector: a router whose BGP implementation supports the re-advertisement of routes between I-BGP neighbors
- Route reflector client: a router which depends on a route reflector to re-advertise its routes to the entire AS and to learn routes from the route reflector
41Route Reflection
128.23.0.0/16
RR2
RR-C4
RR-C1
RR1
RR3
RR-C3
RR-C2
AS1
ER
EBGP
10.0.0.0/24
AS2
IBGP
42AS Confederations
- Divide and conquer Divides a large AS into
sub-ASs
Sub-AS
11
10
14
13
12
R1
AS-1
R2
43BGP-4 Support for Scaling and Address Management
44CIDR
- Shortage of class Bs => give out a set of class Cs instead of one class B address
- Problem: every class C n/w needs a routing entry!
- Solution: Classless Inter-domain Routing (CIDR).
- Also called supernetting
- Key: allocate addresses such that they can be summarized, i.e., contiguously.
- Share same higher order bits (i.e., prefix)
- Routing tables and protocols must be capable of carrying a subnet mask. Notation: 128.13.0/23
- When an IP address matches multiple entries (e.g., 194.0.22.1), choose the one which has the longest mask (longest-prefix match)
45Inter-domain Routing Without CIDR
204.71.0.0
204.71.0.0
Global Internet Routing Mesh
204.71.1.0
Service Provider
204.71.1.0
204.71.2.0
204.71.2.0
....
....
204.71.255.0
204.71.255.0
Inter-domain Routing With CIDR
204.71.0.0
Global Internet Routing Mesh
204.71.1.0
Service Provider
204.71.2.0
204.71.0.0/16
....
204.71.255.0
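The aggregation shown in the two figures can be reproduced with Python's standard `ipaddress` module. This is a small sketch, assuming each 204.71.x.0 network in the figure is a /24 (a former class C); the helper name `aggregate` is made up for the example.

```python
import ipaddress


def aggregate(prefixes):
    """Collapse contiguous prefixes into the fewest covering prefixes."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


# The 256 class-C-sized networks behind the service provider.
specifics = ["204.71.%d.0/24" % i for i in range(256)]

# With CIDR, the provider announces one route instead of 256.
announced = aggregate(specifics)
```

Here `announced` is the single prefix 204.71.0.0/16. If one of the /24s is missing, the block no longer collapses to one entry, which is why contiguous allocation matters.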
46RFC 1519 Classless Inter-Domain Routing (CIDR)
Pre-CIDR: Network ID ended on an 8-, 16-, or 24-bit boundary
CIDR: Network ID can end at any bit boundary
IP Address: 12.4.0.0, IP Mask: 255.254.0.0
[Figure: the address and mask in binary, split into the network prefix and the bits left for hosts]
Usually written as 12.4.0.0/15, a.k.a. supernetting
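The equivalence between the address/mask pair and the slash notation can be checked with Python's standard `ipaddress` module. A minimal sketch; the helper name `to_cidr` is hypothetical.

```python
import ipaddress


def to_cidr(address: str, mask: str) -> str:
    """Convert an address and dotted-decimal mask to prefix notation."""
    return ipaddress.ip_network(address + "/" + mask).with_prefixlen


block = ipaddress.ip_network(to_cidr("12.4.0.0", "255.254.0.0"))
# A /15 leaves 32 - 15 = 17 bits for hosts.
host_bits = block.max_prefixlen - block.prefixlen
```

The mask 255.254.0.0 has 15 leading one bits, so the block is 12.4.0.0/15 and spans 12.4.0.0 through 12.5.255.255.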
47Longest Prefix Match (Classless) Forwarding
[Figure: a packet with destination 12.5.9.16 is compared against forwarding-table prefixes of increasing length; the matches are labeled OK, better, even better, and best!]
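Longest-prefix-match forwarding for the destination on this slide can be sketched as follows. The forwarding table is hypothetical (the prefixes from the original figure are not available here), and a linear scan is used for clarity; real routers use tries or hardware lookup structures.

```python
import ipaddress

# Hypothetical table: prefixes and next-hop names are illustrative only.
TABLE = {
    "12.0.0.0/8": "link-A",
    "12.4.0.0/15": "link-B",
    "12.5.9.0/24": "link-C",
    "12.5.9.16/32": "link-D",
    "13.0.0.0/8": "link-E",
}


def longest_prefix_match(dest, table):
    """Return (prefix, next_hop) for the most specific matching entry,
    or None when no entry covers the destination."""
    addr = ipaddress.ip_address(dest)
    best = None
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return None if best is None else (str(best[0]), best[1])
```

For 12.5.9.16 all four 12.x entries match, and the /32 wins; a neighbor such as 12.5.9.17 falls back to the /24.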
48CIDR at Work, No load balancing
Table at ISP3
128.40/16
Link A
ISP1 128.32/11
AS1 128.40/16 140.127/16
ISP3
Link B
ISP2 140.64/10
140.127/16
49CIDR Subverted for Load Balancing
Table at ISP3
140.255.20/24, 128.40/16
Link A
ISP1 128.32/11
AS1 128.40/16 140.127/16
ISP3
Link B
ISP2 140.64/10
128.42.10/24, 140.127/16
50Deaggregation Multihoming
If AS 1 does not announce the more specific
prefix, then most traffic to AS 2 will go
through AS 3 because it is a longer match
12.2.0.0/16
12.2.0.0/16
12.0.0.0/8
AS 3
AS 1
provider
provider
customer
AS 2
12.2.0.0/16
AS 2 is punching a hole in the CIDR block of AS 1 => subverts CIDR
51Policy Routing in BGP-4
52What is Routing Policy
- Policy refers to arbitrary preference among a menu of available routes
- Public description of the relationship between external BGP peers
- Can also describe internal BGP peer relationship
- BGP hook: policy routing choice based upon routes' attributes
53How to do policy routing?
192.0.2.0/24 pick me!
192.0.2.0/24 pick me!
192.0.2.0/24 pick me!
Given multiple routes to the same prefix, a BGP speaker must pick at most one best route, based upon the routes' attributes
192.0.2.0/24 pick me!
54BGP Policy Knob Attributes
Value  Code                     Reference
-----  -----------------------  ---------
1      ORIGIN                   RFC1771
2      AS_PATH                  RFC1771
3      NEXT_HOP                 RFC1771
4      MULTI_EXIT_DISC          RFC1771
5      LOCAL_PREF               RFC1771
6      ATOMIC_AGGREGATE         RFC1771
7      AGGREGATOR               RFC1771
8      COMMUNITY                RFC1997
9      ORIGINATOR_ID            RFC2796
10     CLUSTER_LIST             RFC2796
11     DPA                      Chen
12     ADVERTISER               RFC1863
13     RCID_PATH / CLUSTER_ID   RFC1863
14     MP_REACH_NLRI            RFC2283
15     MP_UNREACH_NLRI          RFC2283
16     EXTENDED COMMUNITIES     Rosen
...
255    reserved for development
We will cover a subset of these attributes
Not all attributes need to be present in every announcement
From IANA: http://www.iana.org/assignments/bgp-parameters
55Import and Export Policies
- For inbound traffic
- Filter outbound routes
- Tweak attributes on outbound routes in the hope of influencing your neighbor's best route selection
- For outbound traffic
- Filter inbound routes
- Tweak attributes on inbound routes to influence best route selection
outbound routes
inbound traffic
inbound routes
outbound traffic
In general, an AS has more control over outbound
traffic
56BGP Route Processing
Apply Policy filter routes tweak attributes
Apply Policy filter routes tweak attributes
Receive BGP Updates
Best Routes
Transmit BGP Updates
Based on Attribute Values
Best Route Selection
Apply Import Policies
Best Route Table
Apply Export Policies
Install forwarding Entries for best Routes.
IP Forwarding Table
57Policy Implementation Flow
58Conceptual Model of BGP Operation
- RIB: Routing Information Base
- Adj-RIB-In: prefixes learned from neighbors. As many Adj-RIB-Ins as there are peers
- Loc-RIB: prefixes selected for local use after analyzing Adj-RIB-Ins. This RIB is advertised internally.
- Adj-RIB-Out: stores prefixes advertised to a particular neighbor. As many Adj-RIB-Outs as there are neighbors
59BGP-4 Messages and Route Attributes
60UPDATE message in BGP
- Primary message between two BGP speakers.
- Used to advertise/withdraw IP prefixes (NLRI)
- Path attributes field unique to BGP
- Apply to all prefixes specified in NLRI field
- Optional vs Well-known; Transitive vs Non-transitive
Withdrawn Routes Length (2 octets)
Withdrawn Routes (variable length)
Total Path Attributes Length (2 octets)
Path Attributes (variable length)
Network Layer Reachability Info. (NLRI; variable length)
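The field layout above can be walked with a short parser. This is a sketch under stated assumptions: it handles only the UPDATE body (the common BGP message header is assumed to be already stripped), it treats prefixes as IPv4 encoded as a one-octet bit length followed by the minimum number of prefix octets, and it returns the path attributes as raw bytes without decoding them. The function names are made up for the example.

```python
import struct
from typing import List, Tuple


def parse_prefixes(data: bytes) -> List[str]:
    """Decode a sequence of (length-in-bits, prefix octets) entries."""
    prefixes = []
    i = 0
    while i < len(data):
        bits = data[i]
        if bits > 32:
            raise ValueError("IPv4 prefix length cannot exceed 32 bits")
        nbytes = (bits + 7) // 8  # only the significant octets are sent
        raw = data[i + 1 : i + 1 + nbytes]
        if len(raw) != nbytes:
            raise ValueError("truncated prefix")
        padded = raw + bytes(4 - nbytes)
        prefixes.append(".".join(str(b) for b in padded) + "/" + str(bits))
        i += 1 + nbytes
    return prefixes


def parse_update_body(body: bytes) -> Tuple[List[str], bytes, List[str]]:
    """Split an UPDATE body into withdrawn routes, raw attributes, NLRI."""
    (withdrawn_len,) = struct.unpack_from("!H", body, 0)
    withdrawn_end = 2 + withdrawn_len
    withdrawn = parse_prefixes(body[2:withdrawn_end])
    (attrs_len,) = struct.unpack_from("!H", body, withdrawn_end)
    attrs_start = withdrawn_end + 2
    attrs_end = attrs_start + attrs_len
    nlri = parse_prefixes(body[attrs_end:])
    return withdrawn, body[attrs_start:attrs_end], nlri


# Example body: nothing withdrawn, one ORIGIN attribute (value IGP),
# and an announcement of 138.39.0.0/16.
origin_attr = bytes([0x40, 1, 1, 0])
example = (struct.pack("!H", 0) + struct.pack("!H", len(origin_attr))
           + origin_attr + bytes([16, 138, 39]))
```

Note that the NLRI field has no length of its own: it is whatever remains after the two length-prefixed fields.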
61Path Attributes ORIGIN
- ORIGIN
- Describes how a prefix came to BGP at the origin AS
- Prefixes are learned from a source and injected into BGP
- Directly connected interfaces, manually configured static routes, dynamic IGP or EGP
- Values
- IGP (EGP): prefix learnt from IGP (EGP)
- INCOMPLETE: static routes
62Path Attributes AS-PATH
- List of ASes through which the prefix announcement has passed. Each AS on the path adds its ASN to the AS-PATH
- Eg: 138.39.0.0/16 originates at AS1 and is advertised to AS3 via AS2.
- Eg: AS-SEQUENCE 100 200
- Used for loop detection and path selection
AS1 (100)
AS3 (15)
138.39.0.0/16
AS2 (200)
63Traffic Often Follows ASPATH
135.207.0.0/16 ASPATH 3 2 1
AS 4
AS 3
AS 1
AS 2
135.207.0.0/16
IP Packet Dest 135.207.44.66
64 But It Might Not
AS 2 filters all subnets with masks longer than
/24
135.207.0.0/16 ASPATH 1
135.207.0.0/16 ASPATH 3 2 1
135.207.44.0/25 ASPATH 5
AS 4
AS 3
AS 1
AS 2
135.207.0.0/16
IP Packet Dest 135.207.44.66
From AS 4, it may look like this packet will take
path 3 2 1, but it actually takes path 3 2 5
AS 5
135.207.44.0/25
65Shorter AS-PATH Doesn't Mean Shorter Hops
BGP says that path 4 1 is better
than path 3 2 1
Duh!
AS 4
AS 3
AS 2
AS 1
66ASPATH Padding Shed inbound traffic
AS 1
provider
192.0.2.0/24 ASPATH 2 2 2
192.0.2.0/24 ASPATH 2
Padding will (usually) force inbound traffic
from AS 1 to take primary link
backup
primary
customer
192.0.2.0/24
AS 2
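AS-PATH handling, including the padding trick on this slide, can be sketched in a few lines of Python. This is an illustrative model: paths are plain lists with the most recently added AS first (as in the "ASPATH 3 2 1" examples), and the function names and the `prepend_count` parameter are hypothetical.

```python
def export_as_path(as_path, local_asn, prepend_count=1):
    """Path to advertise to an external neighbor: the local ASN is put in
    front of the received path; prepend_count > 1 models padding."""
    return [local_asn] * prepend_count + list(as_path)


def accept_route(as_path, local_asn):
    """Loop detection: reject any path that already contains our own ASN."""
    return local_asn not in as_path


# AS 2 originates 192.0.2.0/24 and announces it on both links.
primary = export_as_path([], local_asn=2)
backup = export_as_path([], local_asn=2, prepend_count=3)
```

Here `primary` is `[2]` and `backup` is `[2, 2, 2]`. A neighbor that compares only AS-PATH length prefers the primary link, but a higher LOCAL-PREF on the neighbor's side would still override the padding, which is why the slide says "usually".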
67Load-Balancing Knobs in BGP
- LOCAL-PREF: outbound traffic, local preference (box-level knob)
- MED: inbound traffic, typically from the same ISP (link-level knob)
AS1
AS2
Local Preference
MED
68Path Attribute LOCAL-PREF
- Locally configured indication about which path is
preferred to exit the AS in order to reach a
certain network. Default value 100. Higher is
better.
69Hot Potato Routing Closest Egress Point
192.44.78.0/24
egress 2
egress 1
IGP distances
56
15
This router has two BGP routes to 192.44.78.0/24.
Hot potato: get traffic off of your network as soon as possible. Go for egress 1!
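Hot-potato selection amounts to picking the egress with the smallest IGP distance. A minimal sketch follows; the mapping of the two distances in the figure to the two egress points (15 to egress 1, 56 to egress 2) is inferred from the slide's conclusion and should be read as an assumption, as should the function name.

```python
def hot_potato_egress(igp_distance):
    """Return the egress point with the lowest IGP distance."""
    return min(igp_distance, key=igp_distance.get)


# Distances as read from the figure (assignment to egresses is assumed).
distances = {"egress 1": 15, "egress 2": 56}
chosen = hot_potato_egress(distances)
```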
70Getting Burned by the Hot Potato
2865
High bandwidth Provider backbone
17
SFF
NYC
Low b/w customer backbone
56
15
San Diego
Many customers want their provider to carry the
bits!
tiny http request
huge http reply
71Attributes MULTI-EXIT Discriminator
- Also called METRIC or MED attribute. Lower is better
- AS1: multihomed customer.
- AS2 (provider) includes MED to AS1
- AS1 chooses which link (NEXTHOP) to use
- Eg: traffic to AS3 can go thru Link1, and AS2 thru Link2
Link A
AS3
AS2
AS1
Link B
AS4
72MEDs Can Export Internal Instability
2865
17
FLAP
FLAP
192.44.78.0/24 MED 56 OR 10
192.44.78.0/24 MED 15
10
FLAP
FLAP FLAP
56
15
FLAP
192.44.78.0/24
73How Can Routes be Colored?BGP Communities
- Used within and between ASes
- The set of ASes must agree on how to interpret the community value
- Very powerful BECAUSE it has no (predefined) meaning
Community Attribute = a list of community values. (So one route can belong to multiple communities)
RFC 1997 (August 1996)
74Communities Example
Import
- 1:100: Customer routes
- 1:200: Peer routes
- 1:300: Provider routes
Export
- To Customers: 1:100, 1:200, 1:300
- To Peers: 1:100
- To Providers: 1:100
AS 1
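The import/export scheme on this slide can be sketched as tagging on import and filtering on export. The community values are read here as belonging to AS 1 with local values 100, 200, and 300; that reading, the tuple representation, and all names below are assumptions made for illustration.

```python
LOCAL_AS = 1

# Tag applied on import, keyed by the kind of neighbor the route came from.
TAG_BY_NEIGHBOR = {
    "customer": (LOCAL_AS, 100),
    "peer": (LOCAL_AS, 200),
    "provider": (LOCAL_AS, 300),
}

# Communities that may be exported to each kind of neighbor.
EXPORT_ALLOWED = {
    "customer": {(LOCAL_AS, 100), (LOCAL_AS, 200), (LOCAL_AS, 300)},
    "peer": {(LOCAL_AS, 100)},
    "provider": {(LOCAL_AS, 100)},
}


def tag_on_import(route, learned_from):
    """Return a copy of the route with the relationship community added."""
    tagged = dict(route)
    existing = set(route.get("communities", ()))
    tagged["communities"] = existing | {TAG_BY_NEIGHBOR[learned_from]}
    return tagged


def should_export(route, send_to):
    """Export only if the route carries a community allowed for send_to."""
    return bool(route["communities"] & EXPORT_ALLOWED[send_to])
```

The effect is the usual commercial policy: customers hear everything, while peers and providers hear only customer routes, so AS 1 never provides transit between two peers or providers.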
75BGP Route Selection Process
Series of tie-breaker decisions...
- If NEXTHOP is inaccessible, do not consider the route.
- Prefer largest LOCAL-PREF
- If same LOCAL-PREF, prefer the shortest AS-PATH.
- If all paths are external, prefer the lowest ORIGIN code (IGP < EGP < INCOMPLETE).
- If ORIGIN codes are the same, prefer the lowest MED.
- If MED is same, prefer min-cost NEXT-HOP
- If routes learned from EBGP or IBGP, prefer paths learnt from EBGP
- Final tie-break: prefer the route with the lowest I-BGP ID (IP address)
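The tie-breaker sequence above maps naturally onto a sort key. The following is a simplified sketch, not a complete implementation: it applies the steps in the order listed on this slide, ignores the "all paths external" precondition, and compares MED across all candidates (real implementations normally compare MED only between routes from the same neighbor AS). The `Route` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}


@dataclass
class Route:
    as_path: Tuple[int, ...]
    local_pref: int = 100          # default value quoted in the slides
    origin: str = "IGP"
    med: int = 0
    igp_cost: int = 0              # cost to reach the NEXT-HOP
    learned_via_ebgp: bool = True
    router_id: int = 0             # advertising peer's ID as an integer
    next_hop_reachable: bool = True


def preference_key(r: Route):
    """Smaller key means more preferred; fields follow the slide order."""
    return (
        -r.local_pref,                    # largest LOCAL-PREF first
        len(r.as_path),                   # then shortest AS-PATH
        ORIGIN_RANK[r.origin],            # then lowest ORIGIN code
        r.med,                            # then lowest MED
        r.igp_cost,                       # then min-cost NEXT-HOP
        0 if r.learned_via_ebgp else 1,   # then EBGP over IBGP
        r.router_id,                      # final tie-break
    )


def best_route(candidates: List[Route]) -> Optional[Route]:
    """Pick at most one best route among those with a reachable next hop."""
    usable = [r for r in candidates if r.next_hop_reachable]
    return min(usable, key=preference_key) if usable else None
```

Because LOCAL-PREF is compared first, a route with path 3 2 1 and LOCAL-PREF 200 beats a route with the shorter path 4 1 at the default LOCAL-PREF.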
76Route Selection Summary
Highest Local Preference
Enforce relationships
Shortest ASPATH
Lowest MED
traffic engineering
i-BGP < e-BGP
Lowest IGP cost to BGP egress
Throw up hands and break ties
Lowest router ID
77Caveat
- BGP is not guaranteed to converge on a stable routing. Policy interactions could lead to livelock protocol oscillations.
- See "Persistent Route Oscillations in Inter-domain Routing" by K. Varadhan, R. Govindan, and D. Estrin. ISI report, 1996
- Corollary: BGP is not guaranteed to recover from network failures.
78BGP Table Growth
Thanks Geoff Huston. http://www.telstra.net/ops/bgptable.html
79ASNs Growth
From Geoff Huston. http://www.telstra.net/ops
80BGP Updates Mostly Stable
Most prefixes are stable most of the time. On this day, about 83% of the prefixes were not updated.
Typically, 80% of the updates are for less than 5% of the prefixes.
Percent of BGP table prefixes
Thanks to Madanlal Musuvathi for this plot.
Data source RIPE NCC
81Route Flap Dampening
penalty for each flap 1000
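The penalty mechanism can be sketched as a counter that grows by a fixed amount per flap and decays exponentially in between. Only the per-flap penalty of 1000 comes from the slide; the half-life, suppress limit, and reuse limit below are assumed values chosen for illustration, and the class name is hypothetical.

```python
FLAP_PENALTY = 1000.0     # from the slide
HALF_LIFE_S = 900.0       # assumption: penalty halves every 15 minutes
SUPPRESS_LIMIT = 2000.0   # assumption: suppress at or above this penalty
REUSE_LIMIT = 750.0       # assumption: re-advertise below this penalty


class DampenedRoute:
    """Tracks the flap penalty for one prefix from one neighbor."""

    def __init__(self):
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        # Exponential decay of the accumulated penalty since last update.
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / HALF_LIFE_S)
        self.last_update = now

    def flap(self, now):
        """Record one flap (withdrawal or re-announcement) at time now."""
        self._decay(now)
        self.penalty += FLAP_PENALTY
        if self.penalty >= SUPPRESS_LIMIT:
            self.suppressed = True

    def is_suppressed(self, now):
        """True while the route is being held back from use."""
        self._decay(now)
        if self.suppressed and self.penalty < REUSE_LIMIT:
            self.suppressed = False
        return self.suppressed
```

With these assumed parameters, three flaps within 20 seconds push the penalty past the suppress limit, and the route becomes usable again once the penalty has decayed below the reuse limit.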
82BGP Convergence How Long Does BGP Take to Adapt
to Changes?
From Abha Ahuja and Craig Labovitz
83Summary
- BGP is a fairly simple protocol
- but it is not easy to configure
- BGP is running on more than 100K routers, making it one of the world's largest and most visible distributed systems
- Global dynamics and scaling principles are still not well understood
- Traffic engineering hacked in as an afterthought