Title: CSE 524: Lecture 9
Administrative
- Homework 3 due Wednesday (10/24)
- E-mail confirmation of research paper topic due next Monday (10/29)
- Midterm next Monday (10/29)
Network layer (so far)
- Network layer functions
- Network layer implementation (IP)
- Today
- Network layer devices (routers)
- Network processors
- Input/output port functions
- Forwarding functions
- Switching fabric
- Advanced network layer topics
- Routing problems
- Routing metric selection
- Overlay networks
NL Router Architecture Overview
- Key router functions
  - Run routing algorithms/protocols (RIP, OSPF, BGP) and construct the routing table
  - Switch/forward datagrams from incoming to outgoing link based on the route
NL Routing vs. Forwarding
- Routing: the process by which the forwarding table is built and maintained
  - One or more routing protocols
  - Procedures (algorithms) to convert routing info to a forwarding table
- Forwarding: the process of moving packets from input to output, using
  - The forwarding table
  - Information in the packet
NL What Does a Router Look Like?
- Network processor/controller
- Handles routing protocols, error conditions
- Line cards
- Network interface cards
- Forwarding engine
- Fast path routing (hardware vs. software)
- Backplane
- Switch or bus interconnect
NL Network Processor
- Runs routing protocol and downloads forwarding table to forwarding engines
  - Uses two forwarding tables per engine to allow easy switchover (double buffering)
- Typically performs slow-path processing
- ICMP error messages
- IP option processing
- IP fragmentation
- IP multicast packets
NL Fast-path router processing
- Packet arrives at inbound line card
- Header transferred to forwarding engine
- Forwarding engine determines output interface
- Forwarding engine signals result to line card
- Packet copied to outbound line card
NL Input Port Functions
- Physical layer: bit-level reception
- Data link layer: e.g., Ethernet (see Chapter 5)
- Decentralized switching
  - Given datagram destination, look up output port using routing table in input port memory
  - Goal: complete input port processing at line speed
  - Queuing if datagrams arrive faster than forwarding rate into switch fabric
NL Input Port Queuing
- Fabric slower than input ports combined => queuing may occur at input queues
- Head-of-the-Line (HOL) blocking: a queued datagram at the front of the queue prevents others in the queue from moving forward
- Queueing delay and loss due to input buffer overflow!
NL Input Port Queuing
- Possible solution: virtual output buffering
  - Maintain a per-output buffer at each input
  - Solves the head-of-line blocking problem
  - Each of the MxN input buffers places a bid for an output
- Crossbar connect
  - Challenge: map the bids to a schedule for the crossbar
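A minimal sketch of the per-output buffering idea in Python (class and method names are illustrative, not from the slides): each input port keeps one queue per output, so a packet stuck behind a busy output never blocks packets headed elsewhere, and the port's bids for the crossbar scheduler fall out directly.

```python
from collections import deque

class InputPort:
    """Virtual output queueing sketch: one FIFO per output port."""

    def __init__(self, num_outputs):
        self.voqs = [deque() for _ in range(num_outputs)]

    def enqueue(self, packet, output):
        # Packets are sorted by destination output on arrival,
        # so no head-of-line blocking across outputs.
        self.voqs[output].append(packet)

    def bids(self):
        """Outputs this input wants in the next crossbar schedule."""
        return [o for o, q in enumerate(self.voqs) if q]
```

A real scheduler would then match bids from all inputs to outputs (e.g., via iterative matching); this sketch only shows the queue structure that makes those bids independent.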
NL Forwarding Engine
- General-purpose processor + software
- Packet trains help route cache hit rate
  - Packet train: a sequence of packets for the same/similar flows
  - Similar to the idea behind IP switching (ATM/MPLS), where long-lived flows map onto a single label
- Example
  - Partridge et al., "A 50-Gb/s IP Router," IEEE/ACM Trans. on Networking, Vol. 6, No. 3, June 1998
  - 8KB L1 Icache: holds full forwarding code
  - 96KB L2 cache: forwarding table cache
  - 16MB L3 cache: full forwarding table x 2 (double buffered for updates)
NL Binary trie
- Route prefixes: A=0, B=01000, C=011, D=1, E=100, F=1100, G=1101, H=1110, I=1111
- [Figure: binary trie with one node per bit (0 = left branch, 1 = right branch); each prefix ends at a node labeled with its route]
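As a sketch of how longest-prefix match works over this trie (Python, with illustrative names; not code from the slides): walk the destination address bit by bit, remembering the last node that carried a route, and return that route when the walk falls off the trie.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # '0' or '1' -> TrieNode
        self.route = None   # route label if a prefix ends here

def insert(root, prefix, label):
    node = root
    for bit in prefix:
        node = node.children.setdefault(bit, TrieNode())
    node.route = label

def longest_prefix_match(root, addr_bits):
    """Walk the trie bit by bit, remembering the last route seen."""
    node, best = root, None
    for bit in addr_bits:
        if node.route is not None:
            best = node.route
        node = node.children.get(bit)
        if node is None:
            return best
    if node.route is not None:
        best = node.route
    return best

# The slide's route table
routes = {"0": "A", "01000": "B", "011": "C", "1": "D",
          "100": "E", "1100": "F", "1101": "G", "1110": "H", "1111": "I"}
root = TrieNode()
for p, label in routes.items():
    insert(root, p, label)
```

For example, an address starting 0110... matches both A (0) and C (011); the walk keeps the deeper match, C. The cost is one memory access per bit, which motivates the compressed and multi-bit variants on the next slides.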
NL Path-compressed binary trie
- Eliminate single-branch-point nodes
- Variants include PATRICIA and BSD tries
- Route prefixes: A=0, B=01000, C=011, D=1, E=100, F=1100, G=1101, H=1110, I=1111
- [Figure: path-compressed trie over the same prefixes; internal nodes record which bit to test (Bit 1 through Bit 4), skipping non-branching bits]
NL Patricia tries and variable prefix match
- Patricia tree
  - Arrange route entries into a series of bit tests
  - Worst case: 32 bit tests
  - Problem: memory speed is a bottleneck
- Used in older BSD Unix routing implementations
- [Figure: Patricia tree; each node stores the bit to test (0 = left child, 1 = right child) at positions 0, 10, 16, and 19; leaves hold default 0/0, 128.2/16, 128.32/16, 128.32.130/24, and 128.32.150/24]
NL Multi-bit tries
- Compare multiple bits at a time
  - Reduces memory accesses
  - Forces table expansion for prefixes falling in between strides
- Variable-length multi-bit tries
- Fixed-length multi-bit tries
  - Most route entries are Class C
  - Cut prefix tree at 16-bit depth
    - Many prefixes are 8, 16, or 24 bits in length
  - 64K bit mask
    - Bit = 1 if tree continues below the cut (root head)
    - Bit = 1 if leaf at depth 16 or less (genuine head)
    - Bit = 0 if part of a range covered by a leaf
NL Variable stride multi-bit trie
- Single level has variable stride lengths
- Route prefixes: A=0, B=01000, C=011, D=1, E=100, F=1100, G=1101, H=1110, I=1111
- [Figure: variable-stride trie over the same prefixes; a 2-bit root stride (00, 01 -> A; 10, 11 -> D) with deeper 1-bit and 2-bit strides reaching B, C, E, F, G, H, I]
NL Fixed stride multi-bit trie
- Single level has equal strides
- Route prefixes: A=0, B=01000, C=011, D=1, E=100, F=1100, G=1101, H=1110, I=1111
- [Figure: fixed-stride trie over the same prefixes; a 3-bit first level (000-111 -> A, A, A, C, E, D, D, D) followed by 2-bit second-level nodes for B, F/G, and H/I]
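A sketch of building and searching such a trie via controlled prefix expansion (Python, illustrative names; a simplification, not the slides' implementation): prefixes shorter than a level's stride are expanded to fill the matching table slots, longer ones spill into a child node, and lookup consumes one stride per memory access.

```python
def build_node(prefixes, strides):
    """Build one trie level; strides must cover the longest prefix."""
    if not strides:
        return None
    k = strides[0]
    size = 2 ** k
    labels = [None] * size
    child_prefixes = [dict() for _ in range(size)]
    # Shorter prefixes first, so longer ones overwrite (longest match wins).
    for p, label in sorted(prefixes.items(), key=lambda kv: len(kv[0])):
        if len(p) <= k:
            # Controlled prefix expansion: fill every slot p covers.
            lo = int(p.ljust(k, "0"), 2)
            hi = int(p.ljust(k, "1"), 2)
            for i in range(lo, hi + 1):
                labels[i] = label
        else:
            child_prefixes[int(p[:k], 2)][p[k:]] = label
    children = [build_node(cp, strides[1:]) if cp else None
                for cp in child_prefixes]
    return (labels, children, k)

def lookup(node, bits):
    """One table index per stride; remember the deepest label seen."""
    best = None
    while node is not None and bits:
        labels, children, k = node
        idx = int(bits[:k].ljust(k, "0"), 2)  # pad a short tail with zeros
        if labels[idx] is not None:
            best = labels[idx]
        node = children[idx]
        bits = bits[k:]
    return best

routes = {"0": "A", "01000": "B", "011": "C", "1": "D",
          "100": "E", "1100": "F", "1101": "G", "1110": "H", "1111": "I"}
trie = build_node(routes, [3, 2])  # strides as in the slide's figure
```

With strides [3, 2], the first level comes out exactly as in the figure (000-111 -> A, A, A, C, E, D, D, D), and a lookup touches at most two tables instead of up to five trie nodes.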
NL Other data structures
- Ruiz-Sanchez, Biersack, Dabbous, "Survey and Taxonomy of IP Address Lookup Algorithms," IEEE Network, Vol. 15, No. 2, March 2001
- LC trie
- Lulea trie
- Full expansion/compression
- Binary search on prefix lengths
- Binary range search
- Multiway range search
- Multiway range trees
- Binary search on hash tables (Waldvogel, SIGCOMM 97)
NL Prefix Match issues
- Scaling
- IPv6
- Stride choice
- Tuning stride to route table
- Bit shuffling
NL Speeding up Prefix Match - Alternatives
- Route caches
  - Temporal locality
  - Many packets to same destination
- Protocol acceleration
  - Add a clue (5 bits) to the IP header
  - Indicates where the IP lookup ended on the previous node (Bremler-Barr, SIGCOMM 99)
- Content addressable memory (CAM)
  - Hardware-based route lookup
  - Input: tag; output: value associated with that tag
  - Requires exact match with tag
    - Multiple cycles (1 per prefix length searched) with a single CAM
    - Multiple CAMs (1 per prefix length) searched in parallel
  - Ternary CAM
    - 0, 1, "don't care" values in tag match
    - Priority (i.e., longest prefix) by order of entries in CAM
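The ternary CAM behavior can be modeled in a few lines of Python (an illustrative software model; real TCAMs compare all entries in parallel in one cycle). Each entry is a (value, mask) pair where masked-out bits are the "don't care" bits, and entries are ordered longest-prefix-first so the first hit is the longest match.

```python
def tcam_lookup(entries, addr):
    """First matching entry wins; entries are ordered longest-prefix-first.
    A hardware TCAM evaluates every row in parallel and uses a priority
    encoder to pick the first match; this loop models that serially."""
    for value, mask, route in entries:
        if (addr & mask) == (value & mask):
            return route
    return None

# A few entries from the trie slides, as 32-bit value/mask pairs,
# sorted so longer prefixes come first.
entries = [
    (0b1101 << 28, 0xF0000000, "G"),  # 1101/4
    (0b011 << 29,  0xE0000000, "C"),  # 011/3
    (0b0 << 31,    0x80000000, "A"),  # 0/1
    (0b1 << 31,    0x80000000, "D"),  # 1/1
]
```

Note that correctness depends entirely on entry order: if A's row came before C's, every 011... address would stop at A. That ordering requirement is the "priority by order of entries" point above.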
NL Types of network switching fabrics
Memory
Crossbar interconnection
Multistage interconnection
Bus
NL Types of network switching fabrics
- Issues
  - Switch contention: packets arrive faster than the switching fabric can switch
  - Speed of switching fabric versus line card speed determines input queuing vs. output queuing
NL Switching Via Memory
- First generation routers
  - Packet copied by the system's (single) CPU
  - 2 bus crossings per datagram
  - Speed limited by memory bandwidth
- Modern routers
  - Input port processor performs lookup, copies into memory
  - e.g., Cisco Catalyst 8500
- [Figure: input port -> memory -> output port across the system bus]
NL Switching Via Bus
- Datagram moves from input port memory to output port memory via a shared bus
- Bus contention: switching speed limited by bus bandwidth
- 1 Gbps bus (Cisco 1900): sufficient speed for access and enterprise routers (not regional or backbone)
NL Switching Via An Interconnection Network
- Overcome bus bandwidth limitations
- Crossbar networks
  - Fully connected (n² elements)
  - All one-to-one, invertible permutations supported
NL Switching Via An Interconnection Network
- Crossbar with n² elements is hard to scale
- Multi-stage interconnection networks (Banyan)
  - Initially developed to connect processors in multiprocessors
  - Typically O(n log n) elements
- Datagram fragmented into fixed-length cells
  - Cells switched through the fabric
  - Cisco 12000: Gbps through an interconnection network
- Blocking possible (not all one-to-one, invertible permutations supported)
- [Figure: Banyan network connecting inputs A-D to outputs W-Z]
NL Output Ports
- Output contention
  - Datagrams arrive from the fabric faster than the output port's transmission rate
- Buffering required
- Scheduling discipline chooses among queued datagrams for transmission
NL Output port queueing
- Buffering when arrival rate via the switch exceeds output line speed
- Queueing (delay) and loss due to output port buffer overflow!
NL Advanced topics
- Routing synchronization
- Routing instability
- Routing metrics
- Overlay networks
NL Routing Update Synchronization
- Another interesting robustness issue to consider...
- Even apparently independent processes can eventually synchronize
  - The intuitive assumption that independent streams will not synchronize is not always valid
- Periodic routing protocol messages from different routers
- Abrupt transition from unsynchronized to synchronized system states
NL Examples/Sources of Synchronization
- TCP congestion windows
- Cyclical behavior shared by flows through gateway
- Periodic transmission by audio/video applications
- Periodic downloads
- Synchronized client restart
- After a catastrophic failure
- Periodic routing messages
- Manifests itself as periodic packet loss on pings
- Pendulum clocks on same wall
- Automobile traffic patterns
NL How Synchronization Occurs
- Weak coupling: A's behavior is triggered off of B's message arrival
- Weak coupling can result in eventual synchronization
- [Figure: A's timer period T drifting until it aligns with the arrival of B's messages]
NL Routing Source of Synchronization
- Router resets its timer after processing its own and incoming updates
  - Creates weak coupling among routers
- Solutions
  - Set the timer based on a clock event that is not a function of processing other routers' updates, or
  - Add randomization, or reset the timer before processing the update
- With increasing randomization, abrupt transition from predominantly synchronized to predominantly unsynchronized
- Most protocols now incorporate some form of randomization
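The randomization fix amounts to a one-liner. A sketch in Python (the 30-second base and 25% jitter are assumed illustrative values, not from the slides): instead of a fixed period, each router draws its next update delay from an interval, so two routers' timers cannot stay phase-locked.

```python
import random

BASE_INTERVAL = 30.0  # seconds between routing updates (assumed value)
JITTER = 0.25         # +/- 25% randomization (assumed value)

def next_update_delay():
    """Jittered timer interval: the random spread breaks the weak
    coupling that otherwise lets routers drift into lockstep."""
    return BASE_INTERVAL * (1 + random.uniform(-JITTER, JITTER))
```

As the slide notes, the transition is abrupt: a little jitter may not be enough, but past a threshold the system stays predominantly unsynchronized.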
NL Routing Instability
- References
  - C. Labovitz, R. Malan, F. Jahanian, "Internet Routing Instability," SIGCOMM 1997
- Record of BGP messages at major exchanges
- Discovered orders of magnitude more updates than expected
  - Bulk were duplicate withdrawals
  - Stateless implementation of BGP did not keep track of information passed to peers
  - Impact of a few implementations
- Strong frequency (30/60 sec) components
  - Interaction with other local routing/links, etc.
NL Route Flap Storm
- Overloaded routers fail to send Keep_Alive messages and are marked as down
- BGP peers find alternate paths
- Overloaded router re-establishes peering session
  - Must send large updates
- Increased load causes more routers to fail!
NL Route Flap Dampening
- Routers now give higher priority to BGP Keep_Alive messages to avoid the problem
- Associate a penalty with each route
  - Increase the penalty when the route flaps
  - Exponentially decay the penalty with time
  - When the penalty reaches a threshold, suppress the route
NL BGP Oscillations
- Can potentially explore every possible path through the network: (n-1)! combinations
- Limit between update messages (MinRouteAdver) reduces exploration
  - Forces router to process all outstanding messages
- Typical Internet failover times
  - New/shorter link: ~60 seconds
    - Results in simple replacement at nodes
  - Down link: ~180 seconds
    - Results in a search of possible options
  - Longer link: ~120 seconds
    - Results in replacement or search based on length
NL Routing Metrics
- Choice of link cost defines traffic load
  - Low cost => high probability the link belongs to the SPT and will attract traffic, which increases cost
- Main problem: convergence
  - Avoid oscillations
  - Achieve good network utilization
NL Metric Choices
- Static metrics (e.g., hop count)
- Good only if links are homogeneous
- Definitely not the case in the Internet
- Static metrics do not take into account
- Link delay
- Link capacity
- Link load (hard to measure)
NL Original ARPANET Metric
- Cost proportional to queue size
  - Instantaneous queue length as delay estimator
- Problems
  - Did not take into account link speed
  - Poor indicator of expected delay due to rapid fluctuations
  - Delay may be long even if queue size is small, due to contention for other resources
NL Metric 2 - Delay Shortest Path Tree
- Delay = (depart time - arrival time) + transmission time + link propagation delay
  - (Depart time - arrival time) captures queuing
  - Transmission time captures link capacity
  - Link propagation delay captures the physical length of the link
- Measurements averaged over 10 seconds
- Update sent if difference > threshold, or every 50 seconds
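The three terms of the delay formula translate directly into code. A sketch in Python (function and parameter names are illustrative): queuing is the time the packet sat in the router, transmission is packet size over link capacity, and propagation is a per-link constant.

```python
def link_delay(arrival_time, depart_time, packet_bits, link_bps, prop_delay_s):
    """Metric 2 delay estimate:
    queuing (depart - arrival) + transmission (size / capacity)
    + propagation (fixed physical-length term)."""
    queuing = depart_time - arrival_time
    transmission = packet_bits / link_bps
    return queuing + transmission + prop_delay_s
```

For a 1000-byte packet on a 56 Kbps link, the transmission term alone is 8000/56000 ≈ 0.143 s, which dwarfs typical propagation delay; on a fast link the balance flips, which is why the metric captures link capacity implicitly.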
NL Performance of Metric 2
- Works well for light to moderate load
  - Static values dominate
- Oscillates under heavy load
  - Queuing dominates
NL Specific Problems
- Range is too wide
  - A highly loaded 9.6 Kbps link can appear 127 times costlier than a lightly loaded 56 Kbps link
  - Can make a 127-hop path look better than a 1-hop path
- No limit to change between reports
- All nodes calculate routes simultaneously
  - Triggered by link update
NL Example
- [Figure: traffic between Net X and Net Y can go through router A or router B]
- After everyone re-calculates routes, all traffic shifts to the other path... oscillations!
NL Consequences
- Low network utilization (50% in the example)
- Congestion can spread elsewhere
- Routes could oscillate between short and long paths
- Large swings lead to frequent route updates
  - More messages
  - Frequent SPT re-calculation
NL Revised Link Metric
- Better metric: packet delay = f(queueing, transmission, propagation)
- When lightly loaded, transmission and propagation are good predictors
- When heavily loaded, queueing delay is dominant, so transmission and propagation are bad predictors
NL Normalized Metric
- If a loaded link looks very bad, then everyone will move off of it
  - Want some traffic to stay on, to load balance and avoid oscillations
  - It is still an OK path for some
- Hop-normalized metric diverts routes that have an alternate that is not too much longer
- Also limited the relative values and range of values advertised => gradual change
NL Revised Metric
- Limits on relative change
  - Measured link delay is taken over a 10-second period
  - Link utilization is computed as 0.5 x current sample + 0.5 x last average
  - Max change limited to slightly more than 1/2 hop
  - Min change limited to slightly less than 1/2 hop
  - Bounds oscillations
- Normalized according to link type
  - Satellite should look good when queueing on other links increases
NL Routing Metric vs. Link Utilization
- [Figure: new metric (routing units, y-axis 0-225) vs. link utilization (x-axis 0-100%) for four link types: 9.6 Kbps satellite, 9.6 Kbps terrestrial, 56 Kbps satellite, 56 Kbps terrestrial]
NL Observations
- Utilization effects
  - High load never increases cost more than 3x the cost of an idle link
  - Cost = f(link utilization) only at moderate to high loads
- Link types
  - Most expensive link is 7x the least expensive link
  - High-speed satellite link is more attractive than low-speed terrestrial link
- Allows routes to be gradually shed from a link
NL Idealized Network Response Maps
- Load of an average link as a function of that link's cost
- Created empirically
- [Figure: mean load on link (0.0-1.0) vs. link cost (0.5-4.0), one curve per level of increasing applied network load]
NL Equilibrium Calculation
- Combine the utilization-to-cost and cost-to-utilization maps
- Equilibrium points at intersections
- [Figure: mean load on link (0.0-1.0) vs. link cost (0.5-4.0); HN-SPF and D-SPF metric maps overlaid on network response curves for increasing applied network load]
NL Routing Dynamics
- Limiting maximum metric change bounds oscillation
- [Figure: utilization (0-1.0) vs. link reported cost (0.5-4.0); the metric map and network response curves intersect within a region of bounded oscillation]
NL Routing Dynamics
- [Figure: utilization (0-1.0) vs. reported cost (0.5-4.0); the metric map and network response show a new link being eased in gradually rather than attracting all traffic at once]
NL Overlay Routing
- Basic idea
  - Treat multiple hops through the IP network as one hop in an overlay network
  - Run a routing protocol on the overlay nodes
- Why?
  - For performance: can run a more clever protocol on the overlay
  - For efficiency: can make core routers very simple
  - For functionality: can provide new features such as multicast, active processing, IPv6
NL Overlay for Performance
- References
  - Savage et al., "The End-to-End Effects of Internet Path Selection," SIGCOMM 99
  - Andersen et al., "Resilient Overlay Networks," SOSP 2001
- Why would IP routing not give good performance?
  - Policy routing: limits selection/advertisement of routes
  - Early exit/hot-potato routing: local, not global, incentives
  - Lack of performance-based metrics: AS hop count is the wide-area metric
- How bad is it really?
  - Look at the performance gain an overlay provides
NL Quantifying Performance Loss
- Measure round trip time (RTT) and loss rate between pairs of hosts
  - ICMP rate limiting
- Alternate path characteristics
  - 30-55% of hosts had lower latency
  - 10% of alternate routes have 50% lower latency
  - 75-85% have lower loss rates
NL Bandwidth Estimation
- RTT + loss for a multi-hop path
  - RTT by addition
  - Loss: either the worst hop or a combination of hops - why?
    - Large number of flows: combination of probabilities
    - Small number of flows: worst hop
- Bandwidth calculation
  - TCP bandwidth is based primarily on loss and RTT
- 70-80% of paths have better bandwidth
  - 10-20% of paths have a 3x improvement
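The "bandwidth from loss and RTT" step typically uses the Mathis et al. steady-state TCP approximation, BW ≈ (MSS / RTT) × (C / √p) with C = √(3/2). A sketch in Python (the slides don't specify which model was used, so this is one standard choice):

```python
import math

def tcp_bandwidth_estimate(mss_bytes, rtt_s, loss_rate,
                           c=math.sqrt(3.0 / 2.0)):
    """Mathis et al. steady-state TCP throughput approximation:
    BW ~= (MSS / RTT) * (C / sqrt(p)).  Bytes per second."""
    return (mss_bytes / rtt_s) * (c / math.sqrt(loss_rate))
```

The formula makes the overlay result intuitive: halving RTT doubles the estimate, and halving the loss rate improves it by √2, so an alternate path with modestly better RTT and loss can easily yield the 3x bandwidth improvements reported above.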
NL Overlay for Efficiency
- Multi-path routing
  - More efficient use of links, or QOS
- Need to be able to direct packets based on more than just the destination address => can be computationally expensive
- What granularity? Per source? Per connection? Per packet?
  - Per packet => re-ordering
  - Per source, per flow => coarse grain vs. fine grain
- Take advantage of the relative duration of flows
  - Most bytes are on long flows
NL Overlay for Features
- How do we add new features to the network?
  - Does every router need to support the new feature?
- Choices
  - Reprogram all routers => active networks
  - Support the new feature within an overlay
- Basic technique: tunnel packets
- Tunnels
  - IP-in-IP encapsulation
  - Poor interaction with firewalls, multi-path routers, etc.
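A minimal model of IP-in-IP encapsulation (Python dicts standing in for real packet headers; illustrative, not a wire-format implementation): the original datagram becomes the payload of an outer datagram addressed to the tunnel endpoint, with outer protocol number 4 (IP-in-IP) so the endpoint knows to unwrap it.

```python
def encapsulate(inner_packet, tunnel_src, tunnel_dst):
    """Wrap the original datagram in an outer header addressed to the
    tunnel endpoint; protocol 4 marks the payload as IP-in-IP."""
    return {"src": tunnel_src, "dst": tunnel_dst,
            "proto": 4, "payload": inner_packet}

def decapsulate(outer_packet):
    """Tunnel endpoint strips the outer header and forwards the
    original datagram as if it had arrived normally."""
    assert outer_packet["proto"] == 4
    return outer_packet["payload"]
```

Routers between the tunnel endpoints forward on the outer header only, which is exactly why middleboxes that inspect the inner header (firewalls, multi-path routers hashing on flow fields) interact poorly with tunnels.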
NL Examples
- IPv6, IP multicast
  - Tunnels between routers supporting the feature
- Mobile IP
  - Home agent tunnels packets to the mobile host's location
  - http://www.rfc-editor.org/rfc/rfc2002.txt
- QOS
  - Needs some support from intermediate routers
NL Overlay Challenges
- How do you build an efficient overlay?
  - Probably don't want all N² links - which links to create?
  - Without direct knowledge of the underlying topology, how do you know what's nearby and what is efficient?
NL Future of Overlay
- Application-specific overlays
  - Why should overlay nodes only do routing?
- Caching
  - Intercept requests and create responses
- Transcoding
  - Changing the content of packets to match available bandwidth
- Peer-to-peer applications
NL Network layer summary
- Network layer functions
- Specific network layers (IPv4, IPv6)
- Specific network layer devices (routers)
- Advanced network layer topics
NL Network trace
- http://www.cse.ogi.edu/class/cse524/trace.txt
NL End of material for midterm
- Midterm next Monday (10/29/2001), covering:
  - Technical material in lectures
  - Chapters 1, 4, and 5
    - Chapter 1
    - Chapter 4.1-4.7
    - Chapter 5
  - Review questions at the end of chapters