Title: Rethinking Network Control
1 Rethinking Network Control & Management: The Case for a New 4D Architecture
- David A. Maltz
- Carnegie Mellon University
- Joint work with
- Albert Greenberg, Gisli Hjalmtysson
- Andy Myers, Jennifer Rexford, Geoffrey Xie,
- Hong Yan, Jibin Zhan, Hui Zhang
2 Is the Network Down Again?
- You sit at your home computer, trying to access a computer at work
- But no data is getting through
- Minutes or hours later, data flows again
- You never find out why
Network operators aren't much better at predicting outages
3 Outline
- What do networks look like today?
- New approach to predicting network behavior
- A new architecture for controlling networks
4 Many Kinds of Networks
- Each has different
- Size: generally 10-1000 routers each
- Owner: company, university, organization
- Topology: mesh, tree, ring
- Examples
- Enterprise/Campus networks
- Access networks: DSL, cable modems
- Metro networks: connect up businesses in cities
- Data center networks: disk arrays & servers
- Transit/Backbone networks
5 A Conventional View of a Network
[Figure: network graph with nodes A through J]
- Physical topology is a graph of nodes and links
- Run Dijkstra to find route to each node
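As a minimal sketch of the conventional view above, the following runs Dijkstra's algorithm over a small graph. The four-node topology and its link costs are made up for illustration and are not the network drawn on the slide.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path costs and predecessors from source.

    graph maps each node to a dict {neighbor: link_cost}.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry; a shorter path to u was already found
        for v, cost in graph[u].items():
            candidate = d + cost
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                prev[v] = u
                heapq.heappush(heap, (candidate, v))
    return dist, prev

def route(prev, source, target):
    """Rebuild the node sequence from source to target."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

# Hypothetical topology with made-up link costs.
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"C": 1},
}
dist, prev = dijkstra(graph, "A")
path_a_to_d = route(prev, "A", "D")
```

Here the cheapest route from A to D goes through B and C rather than over the direct but costlier A-C link.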
6 A Conventional View of a Network
[Figure: network graph with nodes A through J]
Knowing how the routers are connected says almost nothing about whether or not two hosts can communicate
- Physical topology is a graph of nodes and links
- Run Dijkstra to find route to each node
7 Network Equipment
Picture from Internet2 Abilene Network
- Boxes: router, switch
- Links: Ethernet, SONET, T1, ...
8 The Data Plane of a Network
[Figure labels: hosts/servers, router/switch, interfaces]
9 Packets
[Figure: a packet is meta-data (source address, destination address, port numbers, ...) followed by user data]
- For this talk, networks traffic in packets
- A sequence of bytes processed as a unit
10 The Data Plane of a Network
Destination NextHop
A left
B right
C left
- Forwarding Information Base (FIB)
- Basically a look-up table, each entry is a route
- Tests fields of packet and determines which
interface to send packet out
11 The Data Plane of a Network
[Figure: packet filter with rules Permit A->B, Drop C->B]
- Packet Filter
- Specific to a single interface
- Tests fields of packet and determines whether to permit or drop packet
- Finer granularity than FIB: can test more fields, even target specific applications
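The two data-plane mechanisms described so far, the FIB and the packet filter, can be sketched together. This is a simplified illustration, not a router implementation: the `Packet` type, the first-match rule order, the `"*"` wildcard, and the default-permit behavior are all assumptions. Only the table contents come from the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    src: str
    dst: str

# FIB from the slide: destination -> outgoing interface.
FIB = {"A": "left", "B": "right", "C": "left"}

# Filter rules from the slide as (action, src, dst); "*" is an assumed wildcard.
FILTER = [("permit", "A", "B"), ("drop", "C", "B")]

def filter_permits(rules, pkt):
    """First matching rule decides; packets matching no rule are permitted (assumption)."""
    for action, src, dst in rules:
        if src in ("*", pkt.src) and dst in ("*", pkt.dst):
            return action == "permit"
    return True

def forward(pkt, fib, rules):
    """Return the outgoing interface, or None if the packet is dropped."""
    if not filter_permits(rules, pkt):
        return None          # dropped by the packet filter
    return fib.get(pkt.dst)  # None when the FIB has no route

out_ab = forward(Packet("A", "B"), FIB, FILTER)
out_cb = forward(Packet("C", "B"), FIB, FILTER)
out_ca = forward(Packet("C", "A"), FIB, FILTER)
```

The FIB looks only at the destination, while the filter can distinguish A->B from C->B, which is the finer granularity the slide refers to.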
12 The Data Plane of a Network
- Many other mechanisms
- Queueing discipline
- Packet transformers (e.g., address translation)
13 The Control Plane of a Network
Destination NextHop
A left
B right
C left
- Where do FIB entries come from?
- A distributed system called the Control Plane
- Control plane failures responsible for many of
the longest, hardest to debug outages!
14 The Control Plane of a Network
[Figure: a router containing a Routing Process and a FIB]
- Routers run routing processes
15 The Control Plane of a Network
[Figure: three routers, each with a Routing Process and a FIB]
- Adjacent processes exchange routing information
- Information format defined by routing protocol
- Many routing protocols: BGP, OSPF, RIP, EIGRP
- Adjacent processes must use the same protocol
16 The Control Plane of a Network
[Figure: three routers, each with a Routing Process and a FIB; one FIB holds the entry Destination D, NextHop left]
- Routing protocols define logic for computing routes
- Combine all available information
- Pick best route for each destination
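One way to picture "combine all available information, pick best route" is a distance-vector style selection. The sketch below is an assumption for illustration: the slides do not say which protocol or metric is used, and the advertisement format and costs are made up.

```python
def select_routes(adverts, link_cost):
    """Pick the cheapest next hop per destination.

    adverts:   list of (neighbor_interface, destination, advertised_metric)
    link_cost: cost of the link to each neighbor interface
    Returns a FIB-like dict {destination: next_hop_interface}.
    """
    best = {}
    for neighbor, dest, metric in adverts:
        total = metric + link_cost[neighbor]
        if dest not in best or total < best[dest][0]:
            best[dest] = (total, neighbor)
    return {dest: hop for dest, (_, hop) in best.items()}

link_cost = {"left": 1, "right": 1}

# Both neighbors advertise D; the left path is cheaper.
fib_before = select_routes([("left", "D", 2), ("right", "D", 5)], link_cost)

# The left advertisement disappears (e.g. a failure); the process falls back.
fib_after = select_routes([("right", "D", 5)], link_cost)
```

Re-running the selection when an advertisement disappears is what lets the control plane switch D from left to right without operator involvement.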
17 Control Plane Creates Resiliency
[Figure: three routers with routing processes; every FIB entry for destination D points left]
18 Control Plane Creates Resiliency
[Figure: the same three routers; one FIB entry for destination D now points right, the other two still point left]
19 A Study of Operational Production Networks
- How complicated/simple are real control planes?
- What is the structure of the distributed system?
- Use reverse-engineering methodology
- There are few or no documents
- The ones that exist are out-of-date
- Anonymized configuration files for 31 active networks (>8,000 configuration files)
- 6 Tier-1 and Tier-2 Internet backbone networks
- 25 enterprise networks
- Sizes between 10 and 1,200 routers
- 4 enterprise networks significantly larger than
the backbone networks
20 Excerpts from a Router Configuration File
interface Ethernet0
 ip address 6.2.5.14 255.255.255.128
interface Serial1/0.5 point-to-point
 ip address 6.2.2.85 255.255.255.252
 ip access-group 143 in
 frame-relay interface-dlci 28
router ospf 64
 redistribute connected subnets
 redistribute bgp 64780 metric 1 subnets
 network 66.251.75.128 0.0.0.127 area 0
router bgp 64780
 redistribute ospf 64 match route-map 8aTzlvBrbaW
 neighbor 66.253.160.68 remote-as 12762
 neighbor 66.253.160.68 distribute-list 4 in
access-list 143 deny 1.1.0.0/16
access-list 143 permit any
route-map 8aTzlvBrbaW deny 10
 match ip address 4
route-map 8aTzlvBrbaW permit 20
 match ip address 7
ip route 10.2.2.1/16 10.2.1.7
21 Size of Configuration Files in One Network
[Figure: lines in config file (axis 0 to 2000) versus router ID sorted by file size (axis 0 to 881)]
22 Routing Processes Implement Policy
[Figure: routers R1, R2, R3, each with a Routing Process and a FIB; route advertisements labeled "A" and "A,B"]
- Extensive use of policy commands to filter routes
- Prevent some hosts from communicating: security policy
- Limit access to short-cut links: resource policy
23 Packet Filters Implement Policy
- Packet filters used extensively throughout networks
- Protect routers from attack
- Implement reachability matrix
- Define which hosts can communicate
- Localize traffic, particularly multicast
24 Mechanisms for Action at a Distance
[Figure: routers R1, R2, R3, each with a Routing Process and a FIB; route A carries tag 12 between routers, and a later router tests "Tag?"]
- Policy often implemented by tagging routes on one router
- And testing for tag at another router
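The tag-and-test pattern can be sketched as two independent steps that only work when they agree on the tag value. The route representation is hypothetical, and the choice that the testing router drops tagged routes (rather than accepting only tagged ones) is an assumption.

```python
def tag_routes(routes, prefix, tag):
    """At the tagging router: return copies of routes, adding `tag` to those for `prefix`."""
    return [
        {**r, "tags": r["tags"] | {tag}} if r["prefix"] == prefix else r
        for r in routes
    ]

def drop_tagged(routes, tag):
    """At the testing router: discard every route that carries `tag`."""
    return [r for r in routes if tag not in r["tags"]]

routes = [
    {"prefix": "A", "tags": frozenset()},
    {"prefix": "B", "tags": frozenset()},
]
tagged = tag_routes(routes, "A", 12)      # configured on one router
accepted = drop_tagged(tagged, 12)        # configured on another router
accepted_prefixes = [r["prefix"] for r in accepted]
```

Nothing ties the two configurations together except the number 12, so changing it on one router silently changes the policy enforced at the other.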
25 Multiple Interacting Routing Processes
[Figure labels: Client, Server]
26 The Routing Instance Graph of an 881-Router Network
27 Take Away Points
- Networks deal with both creating connectivity
- and preventing it
- Networks controlled by complex distributed systems
- Must understand system to understand behavior
- Focusing on individual protocols is not enough
- Composition of protocols is important and complex
- Developed abstractions to model routing design
- Routing Process Graph: accurately models design
- Routing Instance: abstracts away details
- Reverse-engineer routing design from configs
28 Outline
- What do networks look like today?
- New approach to predicting network behavior
- Frame the problem of reachability analysis
- Sketch algebra for predicting reachability
- A new architecture for controlling networks
29 Reachability
[Figure labels: hosts A and B, routers i and j]
- Can A send a packet to B?
- Depends on routing protocols, advertised routes, policies, packet filters, ...
- Predicting reachability is key to network survivability and security
30 Reachability
[Figure labels: hosts A and B, routers i and j]
- We focus on two types of policy
- Survivability: Certain packets should always be permitted, under all possible network states
- Security: Certain packets should never be permitted, under all possible network states
31 Reachability Example
[Figure: Chicago (chi) and New York (nyc) sites, each with a Data Center and a Front Office; routers R1 to R5]
- Two locations, each with data center & front office
- All routers exchange routes over all links
32 Reachability Example
[Figure: Chicago (chi) and New York (nyc) sites, each with a Data Center and a Front Office; routers R1 to R5; a reachability matrix with rows and columns chi-DC, chi-FO, nyc-DC, nyc-FO]
33 Reachability Example
[Figure: chi and nyc sites (Data Center, Front Office) with routers R1 to R5; packet filters "Drop nyc-FO ->, Permit" and "Drop chi-FO ->, Permit"]
34 Reachability Example
[Figure: chi and nyc sites (Data Center, Front Office) with routers R1 to R5; packet filters "Drop nyc-FO ->, Permit" and "Drop chi-FO ->, Permit"]
- A new short-cut link added between data centers
- Intended for backup traffic between centers
35 Reachability Example
[Figure: chi and nyc sites (Data Center, Front Office) with routers R1 to R5; packet filters "Drop nyc-FO ->, Permit" and "Drop chi-FO ->, Permit"]
- Oops: new link lets packets violate security policy!
- Routing changed, but
- Packet filters don't update automatically
36 Reachability Example
[Figure: chi and nyc sites (Data Center, Front Office) with routers R1 to R5; packet filters "Drop nyc-FO ->, Permit" and "Drop chi-FO ->, Permit"]
- Typical response: add more packet filters to plug the holes in security policy
37 Reachability Example
[Figure: chi and nyc sites (Data Center, Front Office) with routers R1 to R5; packet filters "Drop nyc-FO ->" and "Drop chi-FO ->"]
- Packet filters have surprising consequences
- Consider a link failure
- chi-FO and nyc-FO still connected
38 Reachability Example
[Figure: chi and nyc sites (Data Center, Front Office) with routers R1 to R5; packet filters "Drop nyc-FO ->" and "Drop chi-FO ->"]
- Network has less survivability than topology suggests
- chi-FO and nyc-FO still connected
- But packet filter means no data can flow!
- Probing the network won't predict this problem
39 State of the Art in Reachability Analysis
- Build the network, try sending packets
- ping, traceroute, monitoring tools
- Only checks paths currently selected by routing protocols
- Cannot be used for "what if" analysis
- Our goal: Static Reachability Analysis
- Predict reachability over multiple scenarios through analysis of router configuration files
40 Predicting Reachability
- How can we formalize the reachability provided by a network?
- The set of packets the network will carry from router i to router j
- A function of the forwarding state s
- s represents the contents of each FIB
- Ri,j(s) is the instantaneous reachability
[Figure: routers i and j, labeled with Ri,j(s)]
41 Computing Reachability
[Figure: routers R1 to R4 with per-link sets F1,2(s), F2,1(s), F2,3(s), F3,2(s), F3,4(s), F4,3(s); formula annotations: "Packets allowed along path p" and "The set of all paths from i to j"]
Fi,j(s): Set of packets permitted along link from node i to node j in network state s
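Read together, the two annotations suggest computing reachability as a union over all paths of the packets allowed along each path, where a path allows only the packets that every one of its links permits. The sketch below follows that reading. The packet labels, the contents of each per-link set, and the extra link (1, 3) are hypothetical; enumerating simple paths is only practical for small examples.

```python
def simple_paths(F, i, j, visited=frozenset()):
    """Yield each loop-free path from i to j as a tuple of links (u, v)."""
    if i == j:
        yield ()
        return
    visited = visited | {i}
    for (u, v) in F:
        if u == i and v not in visited:
            for rest in simple_paths(F, v, j, visited):
                yield ((u, v),) + rest

def reachability(F, i, j):
    """Union over paths of the intersection of per-link permitted sets."""
    result = set()
    for path in simple_paths(F, i, j):
        if not path:
            continue  # i == j: no links to traverse
        result |= set.intersection(*(F[link] for link in path))
    return result

# F[(u, v)] = packets permitted on the link u -> v in one state s.
# Link names follow the slide; set contents and link (1, 3) are made up.
F = {
    (1, 2): {"p1", "p2"}, (2, 1): set(),
    (2, 3): {"p2", "p3"}, (3, 2): set(),
    (3, 4): {"p1", "p2", "p3"}, (4, 3): set(),
    (1, 3): {"p3"},
}
r_1_4 = reachability(F, 1, 4)
```

Path 1-2-3-4 carries only p2 and path 1-3-4 carries only p3, so the network as a whole carries both from router 1 to router 4.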
42 Jointly Modeling the Effects of Packet Filters and Routing
- Key Problem
- Fi,j(s) affected by routing and packet filters
- Key Insight
- Treat routes as dynamic packet filters
[Figure: routers R1, R2, R3 and a FIB]
Dest NextHop
A R3
B R1
C R3
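Treating routes as dynamic packet filters means a FIB entry "Dest A, NextHop R3" acts like a filter on the link toward R3 that permits packets destined to A. A minimal sketch of that idea follows, using the FIB from the slide. The static filter contents are hypothetical, and attributing the FIB to a particular router is left open.

```python
def routes_as_filters(fib):
    """Group FIB destinations by next hop: the set each outgoing link 'permits'."""
    permitted = {}
    for dest, next_hop in fib.items():
        permitted.setdefault(next_hop, set()).add(dest)
    return permitted

def effective_link_sets(fib, dropped):
    """Combine routing with static packet filters.

    dropped maps next hop -> destinations a static filter drops on that link.
    """
    return {
        next_hop: dests - dropped.get(next_hop, set())
        for next_hop, dests in routes_as_filters(fib).items()
    }

fib = {"A": "R3", "B": "R1", "C": "R3"}      # from the slide
by_route = routes_as_filters(fib)
# Hypothetical static filter: drop traffic to C on the link toward R3.
combined = effective_link_sets(fib, {"R3": {"C"}})
```

With both effects expressed as sets, routing and filtering can be intersected per link in the same way.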
43 Bounding the Instantaneous Reachability
- Knowing the exact forwarding state s is impractical
- Knowing Ri,j(s) doesn't help much, anyway
- Want to predict behavior over a range of states
- Luckily, predicting behavior over set of all possible states is easier than predicting reachability for a single state
44 Reachability Bounds
- Lower bound on Reachability
- Packets in this set never prohibited by network
- Upper bound on Reachability
- Packets not in this set always prohibited by
network
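Given the definitions above, the lower bound corresponds to the packets permitted in every state and the upper bound to the packets permitted in at least one state. The sketch below computes both by brute force over an explicit list of states and checks the two policy types against them. The three states and their packet sets are invented for illustration, and real analyses would not enumerate states this way.

```python
def reachability_bounds(per_state):
    """per_state: non-empty list of sets, one R_ij(s) per forwarding state s."""
    lower = set.intersection(*per_state)  # never prohibited
    upper = set.union(*per_state)         # outside this set: always prohibited
    return lower, upper

# Hypothetical R_ij(s) for three states of the chi/nyc example.
states = [
    {"chi-DC->nyc-DC", "chi-FO->nyc-FO"},   # normal operation
    {"chi-DC->nyc-DC"},                     # after a link failure
    {"chi-DC->nyc-DC", "chi-FO->nyc-DC"},   # after adding a short-cut link
]
lower, upper = reachability_bounds(states)

must_always_flow = {"chi-FO->nyc-FO"}   # survivability objective
must_never_flow = {"chi-FO->nyc-DC"}    # security objective

survivability_gaps = must_always_flow - lower
security_holes = must_never_flow & upper
```

A survivability objective is met only if the packets lie inside the lower bound, and a security objective only if they lie outside the upper bound. In this made-up data both objectives are violated.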
45 Example Upper Bound Analysis
- Before short-cut link added
- After short-cut link added
46 Example Lower Bound Analysis
[Figure: chi and nyc sites with routers R1 to R5; packet filters "Drop nyc-FO ->, Permit" and "Drop chi-FO ->, Permit"]
- Before extra packet filters added
- After extra packet filters added
47 Take Away Points
- We have defined an algebra for modeling reachability
- Packet filters, routing protocols, NAT
- Griffin & Bush validated RFC 2547 VPNs
- Status
- Algebra works on test cases
- Currently experimenting with production networks
- Algebra's strength and weakness is static analysis
- Can validate that network meets static objectives
- Can have false positives
- Cannot design the network to meet objectives
- Cannot control network to obey dynamic objectives
48 Outline
- What do networks look like today?
- New approach to predicting network behavior
- A new architecture for controlling networks
- New principles for network control
- New architecture embodying those principles
- Experimental validation
49 Does Network Control Actually Matter?
- YES!
- Microsoft: All services fell off the network for 23 hours due to misconfiguration of routers in their network (2001)
- Major ISP: 50% of outages occur during planned maintenance (2005)
- IP networks have 2-3x the outages as circuit-switched networks (2005)
50 Three Principles for Network Control & Management
- Network-level Objectives
- Express goals explicitly
- Security policies, QoS, egress point selection
- Do not bury goals in box-specific configuration
[Figure: Management Logic, fed by a reachability matrix and traffic engineering rules]
51 Three Principles for Network Control & Management
- Network-wide Views
- Design network to provide timely, accurate info
- Topology, traffic, resource limitations
- Give logic the inputs it needs
[Figure: Management Logic, fed by a reachability matrix and traffic engineering rules; arrow "Read state info"]
52 Three Principles for Network Control & Management
- Direct Control
- Allow logic to directly set forwarding state
- FIB entries, packet filters, queuing parameters
- Logic computes desired network state, let it implement it
[Figure: Management Logic, fed by a reachability matrix and traffic engineering rules; arrows "Read state info" and "Write state"]
53 Overview of the 4D Architecture
[Figure: the four planes (Decision, Dissemination, Discovery, Data), annotated with network-level objectives, network-wide views, and direct control]
- Decision Plane
- All management logic implemented on centralized servers making all decisions
- Decision Elements use views to compute data plane state that meets objectives, then directly write this state to routers
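To make the Decision Plane idea concrete, here is a toy Decision Element that takes a network-wide view (the topology) and computes a FIB for every router centrally. It is a sketch under strong assumptions, not the prototype's logic: hop-count shortest paths via breadth-first search, an undirected topology, and no reachability matrix or other objectives. The three-router topology is made up.

```python
from collections import deque

def compute_fibs(topology):
    """topology: router -> set of neighboring routers (undirected).

    Returns router -> {destination: next_hop}, using a BFS outward from
    each destination so every router learns its first hop toward it.
    """
    fibs = {router: {} for router in topology}
    for dest in topology:
        seen = {dest}
        queue = deque([dest])
        while queue:
            u = queue.popleft()
            for v in topology[u]:
                if v not in seen:
                    seen.add(v)
                    fibs[v][dest] = u  # v reaches dest through u
                    queue.append(v)
    return fibs

# Network-wide view before and after the R1-R3 link fails (hypothetical).
view = {"R1": {"R2", "R3"}, "R2": {"R1", "R3"}, "R3": {"R1", "R2"}}
view_after_failure = {"R1": {"R2"}, "R2": {"R1", "R3"}, "R3": {"R2"}}

fibs = compute_fibs(view)
fibs_after = compute_fibs(view_after_failure)
```

Because one piece of logic computes every router's state from the same view, the resulting FIBs are mutually consistent by construction. Writing them to routers would be the Dissemination Plane's job and is not modeled here.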
54 Overview of the 4D Architecture
[Figure: the four planes (Decision, Dissemination, Discovery, Data), annotated with network-level objectives, network-wide views, and direct control]
- Dissemination Plane
- Provides a robust communication channel to each router
- May run over same links as user data, but logically separate and independently controlled
55 Overview of the 4D Architecture
[Figure: the four planes (Decision, Dissemination, Discovery, Data), annotated with network-level objectives, network-wide views, and direct control]
- Discovery Plane
- Each router discovers its own resources and its local environment
- E.g., the identity of its immediate neighbors
56 Overview of the 4D Architecture
[Figure: the four planes (Decision, Dissemination, Discovery, Data), annotated with network-level objectives, network-wide views, and direct control]
- Data Plane
- Spatially distributed routers/switches
- No need to change today's technology
57 Control & Management Today
[Figure labels: shell scripts, traffic engineering, planning tools, databases, config files, SNMP, netflow, OSPF, packet filters]
- Management Plane
- Figure out what is happening in network
- Decide how to change it
- Data Plane
- Distributed routers
- Forwarding, filtering, queueing
- Based on FIB or labels
58 Good Abstractions Reduce Complexity
[Figure: today's stack (Management Plane, Configs, Control Plane, FIBs and ACLs, Data Plane) beside the 4D stack (Decision Plane, Dissemination, FIBs and ACLs, Data Plane)]
- All decision making logic lifted out of control plane
- Eliminates duplicate logic in management plane
- Dissemination plane provides robust communication to/from data plane routers
59 Three Key Questions
- Could the 4D architecture ever be deployed?
- Is the 4D architecture feasible?
- Can the 4D architecture actually simplify network
control and management?
60 Deployment of the 4D Architecture
- Pre-existing industry trend towards separating router hardware from software
- IETF: FORCES, GSMP, GMPLS
- SoftRouter (Lakshman, HotNets '04)
- Incremental deployment path exists
- Individual networks can upgrade to 4D and gain benefits
- Small enterprise networks have most to gain
61 The Feasibility of the 4D Architecture
- We designed and built a prototype of the 4D
- Decision plane
- Contains logic to simultaneously compute routes and enforce reachability matrix
- Multiple Decision Elements per network, using simple election protocol to pick master
- Dissemination plane
- Uses source routes to direct control messages
- Extremely simple, but can route around failed data links
62 Performance of the 4D Prototype
- Evaluated using Emulab (www.emulab.net)
- Linux PCs used as routers (650-800 MHz)
- Tested on 9 enterprise network topologies (10-100 routers each)
- Recovers from single link failure in < 300 ms
- < 1 s response considered excellent
- Survives failure of master Decision Element
- New DE takes control within 1 s
- No disruption unless second fault occurs
- Gracefully handles complete network partitions
- Less than 1.5 s of outage
63 4D Makes Network Management & Control Error-proof
[Figure: chi and nyc sites (Data Center, Front Office) with routers R1 to R5; packet filters "Drop nyc-FO ->, Permit" and "Drop chi-FO ->, Permit"; a reachability matrix with rows and columns chi-DC, chi-FO, nyc-DC, nyc-FO]
64 Prohibiting Packets from chi-FO to nyc-DC
65 4D Makes Network Management & Control Error-proof
[Figure: chi and nyc sites (Data Center, Front Office) with routers R1 to R5; packet filters "Drop nyc-FO ->" and "Drop chi-FO ->"]
66 Allowing Packets from chi-FO to nyc-FO
67 Related Work
- Driving network operation from network-wide views
- Traffic Engineering
- Traffic Matrix computation
- Centralization of decision making logic
- Routing Control Point (Feamster)
- Path Computation Element (Farrel)
- Signaling System 7 (Ma Bell)
68 Take Aways
- No need for complicated distributed system in control plane: do away with it!
- 4D Architecture: a promising approach
- Power of solution comes from
- Colocating all decision making in one plane
- Providing that plane with network-wide views
- Directly express solution by writing forwarding state
- Benefits
- Coordinated state updates -> better reliability
- Separates network issues from distributed systems issues
69 Summary
- Networks must meet many different types of objectives
- Security, traffic engineering, robustness
- Today, objectives met using control plane mechanisms
- Results in complicated distributed system
- Ripe with opportunities to set time-bombs
- Predicting static properties is possible, but difficult
- Refactoring into a 4D Architecture very promising
- Separates network issues from reliability issues
- Eliminates duplicate logic and simplifies network
- Enables new capabilities, like joint control
70 Questions?
71 Backup Slides
72 Computing Reachability Bounds
- Problem reduced to estimating all routes potentially in routing table (FIB) of each router
- Much easier than predicting exactly which routes will be in FIB
73 How to Organize the Decision Plane?
- We have exposed the network control logic. Now what?
- Need a way to structure that logic
- Mutual optimization of multiple objectives
- Potentially mutually exclusive
- Each objective has different time constants
- Multiple objectives may affect the same bit of data-plane state
74 Future Directions
- 4D in different network contexts
- Ethernet networks
- Mixed networks: circuit- and packet-switched
- Include services in the 4D
- Domain Name Service
- HTTP Proxies and load balancers
75 Reverse-Engineering Overview
Configuration files
Find links
Construct Layer 3 Topology
Find adjacent routing processes
Construct Routing Process Graph
Condense adjacent routing processes
Construct Routing Instance Graph
[Figure labels: AS2, OSPF 1, OSPF 2, BGP AS1]
76 Reconstruct the Layer 3 Topology
[Figure: Router 1 and Router 2 linked to each other, with the Internet beyond]
Router 1 Config
interface Serial1/0.5
 ip address 1.1.1.1 255.255.255.252
 ...
Router 2 Config
interface Serial2/1.5
 ip address 1.1.1.2 255.255.255.252
 ...
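The two excerpts are linked because both addresses fall in the same /30 subnet. A minimal sketch of that inference follows, using Python's standard `ipaddress` module. The tuple format for interfaces is an assumption (parsing the config text is not shown), and the third interface is borrowed from the slide 20 excerpt and assigned to Router 1 purely for illustration.

```python
import ipaddress
from collections import defaultdict

def find_links(interfaces):
    """Group interfaces by subnet; a subnet with 2+ interfaces is a link.

    interfaces: list of (router, interface_name, ip, netmask).
    """
    by_subnet = defaultdict(list)
    for router, ifname, ip, mask in interfaces:
        network = ipaddress.ip_interface(f"{ip}/{mask}").network
        by_subnet[network].append((router, ifname))
    return {net: ends for net, ends in by_subnet.items() if len(ends) > 1}

interfaces = [
    ("Router 1", "Serial1/0.5", "1.1.1.1", "255.255.255.252"),
    ("Router 2", "Serial2/1.5", "1.1.1.2", "255.255.255.252"),
    # Illustrative extra interface with no peer in this data set.
    ("Router 1", "Ethernet0", "6.2.5.14", "255.255.255.128"),
]
links = find_links(interfaces)
```

Interfaces whose subnet appears only once, like Ethernet0 here, are left out because no second endpoint is visible in the configs.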
77 Abstract to a Routing Instance Graph
[Figure labels: AS2, Policy1, Policy2, OSPF 1, OSPF 2, BGP AS1]
- Pick an unassigned Routing Process
- Flood fill along process adjacencies, labeling processes
- Repeat until all processes assigned to an Instance
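The three steps above amount to finding connected components of the routing process graph. A minimal sketch follows, assuming processes are identified by hypothetical (router, process) pairs and adjacencies are given as a symmetric dict.

```python
def routing_instances(processes, adjacencies):
    """Assign each routing process to an instance by flood fill.

    processes:   list of process identifiers
    adjacencies: process -> set of adjacent processes (symmetric)
    Returns process -> instance number.
    """
    instance_of = {}
    next_instance = 0
    for start in processes:
        if start in instance_of:
            continue  # already labeled by an earlier flood fill
        instance_of[start] = next_instance
        stack = [start]
        while stack:
            current = stack.pop()
            for neighbor in adjacencies.get(current, ()):
                if neighbor not in instance_of:
                    instance_of[neighbor] = next_instance
                    stack.append(neighbor)
        next_instance += 1
    return instance_of

# Hypothetical processes: R2 runs both OSPF and BGP.
processes = [("R1", "ospf 1"), ("R2", "ospf 1"), ("R2", "bgp 1"), ("R3", "bgp 1")]
adjacencies = {
    ("R1", "ospf 1"): {("R2", "ospf 1")},
    ("R2", "ospf 1"): {("R1", "ospf 1")},
    ("R2", "bgp 1"): {("R3", "bgp 1")},
    ("R3", "bgp 1"): {("R2", "bgp 1")},
}
instances = routing_instances(processes, adjacencies)
```

The two processes on R2 land in different instances because they are not adjacent to each other. Any route exchange between them would be modeled separately, outside this function.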
78 Textbook Routing Design for Enterprise Networks
[Figure labels: EBGP (twice), AS2, OSPF, BGP AS 1, AS3]
- Border routers speak eBGP to external peers
- BGP selects a few key external routes to redistribute into OSPF
- 7 of 25 enterprise networks follow this pattern
79 Reality: A Diversity of Unusual Routing Designs
[Figure labels: Rest of the World, BGP AS 1 to BGP AS 5]
- Network broken up into compartments, each with only 1 to 4 routers
- Each compartment has its own AS number
- Hub and spoke logical topology
- Why? Lots of control over how spokes communicate
80 Reality: A Diversity of Unusual Routing Designs
[Figure labels: Rest of the World (twice), BGP AS 1 to BGP AS 4, EIGRP (three times)]
- Network broken up into many compartments, each running EIGRP, some with 400 routers
- BGP used to filter routes passed between compartments
- Compartments themselves pass information between BGP speakers
- Why? Little need for IBGP, since few routers speak BGP; lots of control over how packets move between compartments
81 Link Down
82 Reconvergence Time Under Single Link Failure
83 Reconvergence Time When Master DE Crashes
84 Reconvergence Time When Network Partitions
85 Reconvergence Time When Network Partitions
86 Slides in Progress or Looking for a Place to go
87 Separation of Issues
- The 4D Architecture separates issues
- Networking logic goes into decision plane
88 Dissemination Plane
- Make clear that dissemination paths can use same physical links, but different routing
- Discovery and dissemination packets can be independent of data-plane (e.g. IP)
- IP is very configuration intensive (addresses, etc.) so we avoid it whenever possible
89 Questions
- What if I want to take a bunch of hosts and stick them together into a small network? Haven't you made this common case terrifically hard?
- Today, I'd use static routes: it's neither common nor easy
- In the 4D model, what do I do?
- DE co-located on the host
- Doesn't talk to any other DEs or routers
90 Problems with State of the Art
- Today: Network behavior determined by multiple interacting distributed programs, written in assembly language
- No way to visualize or describe routing design
- Impossible to establish linkage between configurations and network objectives
- Only a few textbook routing designs are widely known