Title: IP Network Management
1IP Network Management
2Overview
- Introduction
- Abstractions
- IP network components
- IP network management protocols
- Pulling it all together
- An alternative approach
3Overview
- Introduction
- What's it all about then?
- Abstractions
- IP network components
- IP network management protocols
- Pulling it all together
- An alternative approach
4What is network management?
- One point of view: a large field full of acronyms
- EMS, TMN, NE, CMIP, CMISE, OSS, ASN.1, TL1, EML, FCAPS, ITU, ...
- (Don't ask me what all of those mean, I don't care!)
- From question.com
- In 1989, a random of the journalistic persuasion asked hacker Paul Boutin "What do you think will be the biggest problem in computing in the 90s?" Paul's straight-faced response: "There are only 17,000 three-letter acronyms."
- We will ignore most of them
5What is network management?
- Computer networks are considered to have three operating timescales
- Data: packet forwarding (µs, ms)
- Control: flows/connections (secs, mins)
- Management: aggregates, networks (hours, days)
- ...so we're concerned with the network rather than particular devices or protocols
- Standardization is key!
6Overview
- Introduction
- Abstractions
- ISO FCAPS, TMN EMS, ATM
- IP network components
- IP network management protocols
- Pulling it all together
- An alternative approach
7ISO FCAPS functional separation
- Fault
- Recognize, isolate, correct, log faults
- Configuration
- Collect, store, track configurations
- Accounting
- Collect statistics, bill users, enforce quotas
- Performance
- Monitor trends, set thresholds, trigger alarms
- Security
- Identify, secure, manage risks
8TMN EMS administrative separation
- Telecommunications Management Network
- Element Management System
- ...simple but elegant... (!)
- (my emphasis)
- (often the two go together)
- NEL: network elements (switches, transmission systems)
- EML: element management (devices, links)
- NML: network management (capacity, congestion)
- SML: service management (SLAs, time-to-market)
- BML: business management (RoI, market share, blah)
9The B-ISDN reference model
- Asynchronous Transfer Mode cube
- See IAP lectures, maybe
- Plane management
- The whole network
- vs layer management
- Specific layers
- Topology
- Configuration
- Fault
- Operations
- Accounting
- Performance
[Figure: the B-ISDN reference model cube, showing the management, user and control planes; plane management and layer management; and the higher layers, ATM adaptation layer, ATM layer and physical layer]
10Network management
- Models of general communication networks
- Tend to be quite abstract and exceedingly tedious!
- Many practitioners still seem excited about OO programming, WIMP interfaces, etc
- ...probably because implementation is hard due to so many excessively long and complex standards!
- My view: basic need-to-know requirements are
- What should be happening? (c)
- What is happening? (f, p, a)
- What shouldn't be happening? (f, s)
- What will be happening? (p, a)
11Network management
- We'll concentrate on IP networks
- Still acronym city: ICMP, SNMP, MIB, RFC
- Sample size: 10^2 routers, 10^5 hosts
- We'll concentrate on the network core
- Routers, not hosts
- We'll ignore service management
- DNS, AD, file stores, etc
12Overview
- Introduction
- Abstractions
- IP network components
- IP, networks, routers
- IP network management protocols
- Pulling it all together
- An alternative approach
13IP primer (you probably know all this)
- Destination-routed packets: no connections
- Time-to-live field: allows removal of looping packets
- Routers forward packets based on routeing tables
- Tables populated by routeing protocols
- Routers and protocols operate independently
- ...although protocols aim to build consistent state
- RFCs standards
- Often much looser semantics than e.g. ISO, ITU standards
- Compare for example OSPF (RFC2328) and IS-IS (RFC1142, RFC1195), two link-state routeing protocols
14So, how do you build an IP network?
- Buy (lease) routers (1m? 2m? for a new, populated, backbone router!)
- Buy (lease) fibre (Wayleaves: be a landowner!)
- Connect them all together (Correctly. For now.)
- Configure routers (Mwuhahaha.)
- Configure end-systems (Someone else's can of worms.)
15Multiple router flavours
A sample taxonomy
- Core
- OC-12 (622Mbps) and up (to OC-768, 40Gbps)
- Big, fat, fast, expensive
- E.g. Cisco HFR, Juniper T-640
- HFR: 1.2Tbps each, interconnect up to 72 giving 92Tbps, start at 450k
- Transit/Peering-facing
- OC-3 and up, good GigE density
- ACLs, full-on BGP, uRPF, accounting
- Customer-facing
- FR/ATM/...
- Feature set as above, plus fancy queues, etc
- Broadband aggregator
- High scalability: sessions, ports, reconnections
- Feature set as above
- Customer-premises (CPE)
- 100Mbps, maybe
- NAT, DHCP, firewall, wireless, VoIP, ...
- Low cost, low-end, perhaps just software on a PC
16Network design
- Whose network?
- ISPs, IXs, enterprise, campus
- POPs, DCs
- Many designs: flat, hierarchical, hybrids, multiple scales
- Many constraints
- Business
- Backwards compatibility. Who to connect. Peering.
- Technology
- Power: directly (24x7 operation) and indirectly (cooling)
- Port density vs. raw bandwidth
- Software reliability
- Hardware/software capability
- Addressing schemes for scalability, summarization
- Can't run feature X with feature Y on vendor C in network size N
- Connectivity/resiliency
- All core routers connect to at least 2 other core routers
- All edge routers connect to at least 2 core routers
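The two connectivity/resiliency rules above are easy to check mechanically once the topology is written down. The following is a minimal sketch, assuming the topology is held as an adjacency map; the router names, the `LINKS` map and the `check_resiliency` helper are all hypothetical and not part of the lecture.

```python
from typing import Dict, List, Set


def check_resiliency(links: Dict[str, Set[str]], core: Set[str]) -> List[str]:
    """Return the routers that have fewer than two core neighbours.

    For a core router this checks "at least 2 other core routers";
    for an edge router it checks "at least 2 core routers".
    """
    violations = []
    for router, neighbours in sorted(links.items()):
        # Count only neighbours that are core routers, excluding the router itself.
        core_neighbours = (neighbours & core) - {router}
        if len(core_neighbours) < 2:
            violations.append(router)
    return violations


# Hypothetical topology: three core routers in a triangle plus two edge routers.
CORE = {"c1", "c2", "c3"}
LINKS = {
    "c1": {"c2", "c3", "e1", "e2"},
    "c2": {"c1", "c3", "e1"},
    "c3": {"c1", "c2"},
    "e1": {"c1", "c2"},
    "e2": {"c1"},  # single uplink, so this edge router breaks the rule
}

print(check_resiliency(LINKS, CORE))
```

In this made-up topology only `e2` is reported, because it has a single core uplink.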
17Router configuration
- Initialization
- Name the router, setup boot options, setup authentication options
- Configure interfaces
- Loopback, ethernet, fibre, ATM
- Subnet/mask, filters, static routes
- Shutdown (or not), queueing options, full/half duplex
- Configure routeing protocols (OSPF, BGP, IS-IS, ...)
- Process number, addresses to accept routes from, networks to advertise
- Access lists, filters, ...
- Numeric id, permit/deny, subnet/mask, protocol, port
- Route-maps, matching routes rather than data traffic
- Other configuration aspects: traps, syslog, etc
- (Oh, and switch configuration is about as painful)
18Router configuration fragments
hostname FOOBAR
!
boot system flash slot0:a-boot-image.bin
boot system flash bootflash:
logging buffered 100000 debugging
logging console informational
aaa new-model
aaa authentication login default tacacs local
aaa authentication login consoleport none
aaa authentication ppp default if-needed tacacs
aaa authorization network tacacs
!
ip tftp source-interface Loopback0
no ip domain-lookup
ip name-server 10.34.56.78
!
ip multicast-routing
ip dvmrp route-limit 7000
ip cef distributed

interface Loopback0
 description router-1.network.corp.com
 ip address 10.65.21.43 255.255.255.255
!
interface FastEthernet0/0/0
 description Link to New York
 ip address 10.65.43.21 255.255.255.128
 ip access-group 175 in
 ip helper-address 10.65.12.34
 ip pim sparse-mode
 ip cgmp
 ip dvmrp accept-filter 98 neighbor-list 99
 full-duplex
!
interface FastEthernet4/0/0
 no ip address
 ip access-group 183 in
 ip pim sparse-mode
 ip cgmp
 shutdown
 full-duplex

router ospf 2
 log-adjacency-changes
 passive-interface FastEthernet0/0/0
 passive-interface FastEthernet0/1/0
 passive-interface FastEthernet1/0/0
 passive-interface FastEthernet1/1/0
 passive-interface FastEthernet2/0/0
 passive-interface FastEthernet2/1/0
 passive-interface FastEthernet3/0/0
 network 10.65.23.45 0.0.0.255 area 1.0.0.0
 network 10.65.34.56 0.0.0.255 area 1.0.0.0
 network 10.65.43.0 0.0.0.127 area 1.0.0.0

access-list 24 remark Mcast ACL
access-list 24 permit 239.255.255.254
access-list 24 permit 224.0.1.111
access-list 24 permit 239.192.0.0 0.3.255.255
access-list 24 permit 232.192.0.0 0.3.255.255
access-list 24 permit 224.0.0.0 0.0.0.255
access-list 1011 deny 0000.0000.0000 ffff.ffff.ffff ffff.ffff.ffff 0000.0000.0000 0xD1 2 eq 0x42
access-list 1011 permit 0000.0000.0000 ffff.ffff.ffff 0000.0000.0000 ffff.ffff.ffff

tftp-server slot1:some-other-image.bin
tacacs-server host 10.65.0.2
tacacs-server key xxxxxxxx
rmon event 1 trap Trap1 description "CPU Utilization>75" owner config
rmon event 2 trap Trap2 description "CPU Utilization>95" owner config
19Router configuration
- Lots of large, fragile text files
- 100s/1000s of routers, 100s/1000s of lines per config
- Errors are hard to find and have non-obvious results
- Router configuration also editable on-line
- Order matters!
- How to keep track of them all?
- Naming schemes, directory trees, CVS, ssh upload and atomic commit to router
- Perhaps even a proper database
- State of the art is pretty basic
- Few tools to check consistency, design goals
- Generally generate configurations from templates and have human-intensive process to control access to running configs
- Topic of current research: Feamster et al
This counts as advanced!
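Generating configurations from templates can be as simple as filling a text template from a per-router record. The sketch below uses Python's standard `string.Template`; the template text, the field names and the second interface are assumptions made for illustration, not the tooling any particular operator uses.

```python
from string import Template

# Hypothetical interface stanza; the $fields are filled in per interface.
INTERFACE_TEMPLATE = Template(
    "interface $name\n"
    " description $description\n"
    " ip address $address $mask\n"
    " $duplex\n"
    "!\n"
)


def render_interfaces(interfaces):
    """Render one stanza per interface record.

    Template.substitute raises KeyError when a field is missing, which
    catches one class of configuration error before it reaches a router.
    """
    return "".join(INTERFACE_TEMPLATE.substitute(record) for record in interfaces)


# The first record reuses values from the configuration fragment slide;
# the second is made up.
INTERFACES = [
    {"name": "FastEthernet0/0/0", "description": "Link to New York",
     "address": "10.65.43.21", "mask": "255.255.255.128", "duplex": "full-duplex"},
    {"name": "FastEthernet0/1/0", "description": "Spare (hypothetical)",
     "address": "10.65.44.1", "mask": "255.255.255.128", "duplex": "full-duplex"},
]

print(render_interfaces(INTERFACES))
```

Keeping the per-router records in version control or a database, and regenerating the text on every change, is one way to address the tracking problem listed above.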
20Overview
- Introduction
- Abstractions
- IP network components
- IP network management protocols
- ICMP, SNMP, NetFlow
- Pulling it all together
- An alternative approach
21ICMP
- Internet Control Message Protocol: RFC792
- IP protocol 1
- In-band control
- Variety of message types
- echo/echo reply: PING (packet internet groper)
- time exceeded: TRACEROUTE
- destination unreachable, redirect
- source quench
22Ping (Packet INternet Groper)
- Test for liveness
- also used to measure (round-trip) latency
- Send ICMP echo
- Valid IP host (RFC1122, RFC1123) must reply with ICMP echo response
- Subnet PING?
- Useful but often not available/deprecated
- ACK implosion could be a problem
- RFCs standards
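To make "send ICMP echo" concrete, the sketch below builds the bytes of an echo request as laid out in RFC 792: type 8, code 0, a 16-bit checksum, then an identifier and a sequence number. It only constructs the message; actually sending it needs a raw socket and usually elevated privileges, which is left out. The function names are hypothetical.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type for echo (RFC 792); echo reply is type 0


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement of the one's-complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Return an ICMP echo request message (no IP header)."""
    # The checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload


packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"ping")
print(packet.hex())
```

A receiver validates the message by running the same checksum over everything, including the checksum field, and expecting zero.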
23Traceroute
- Which route do my packets take to their destination?
- Send UDP packets with increasing time-to-live values
- Compliant IP host must respond with ICMP time exceeded
- Triggers each host along path to so respond
- Not quite that simple
- One router, many IP addresses: which source address?
- Router control processor, inbound or outbound interface
- Asymmetric routes (return path != outbound path)
- Routes change
- Do we want full-mesh host-host routes anyway?!
- Size of data set, amount of probe traffic
- This is topology, what about load on links?
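The increasing time-to-live trick can be seen without touching a real network. The following sketch simulates it over a fixed, made-up path: each router decrements the TTL and the one that reaches zero answers "time exceeded". It assumes the destination answers a UDP probe with "port unreachable", and it deliberately ignores the complications listed above (multiple addresses per router, asymmetric or changing routes).

```python
from typing import List, Tuple


def forward(path: List[str], ttl: int) -> Tuple[str, str]:
    """Simulate one probe with the given TTL along a fixed path.

    The last element of `path` is the destination host; every other
    element is a router that decrements the TTL before forwarding.
    """
    for hop in path[:-1]:
        ttl -= 1
        if ttl == 0:
            return hop, "time exceeded"
    return path[-1], "port unreachable"


def traceroute(path: List[str], max_ttl: int = 30) -> List[str]:
    """Return the responders seen when probing with TTL 1, 2, 3, ..."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, message = forward(path, ttl)
        hops.append(responder)
        if message == "port unreachable":
            break  # the probe reached the destination
    return hops


# Hypothetical path: three routers, then the destination host.
print(traceroute(["r1", "r2", "r3", "host"]))
```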
24SNMP
- Protocol to manage information tables at devices
- Provides get, set, trap, notify operations
- get, set: read, write values
- trap: signal a condition (e.g. threshold exceeded)
- notify: reliable trap
- Complexity mostly in the MIB design
- Some standard tables, but many vendor specific
- Non-critical, so often tables populated incorrectly
- Many tens of MIBs (thousands of lines) per device
- Different versions, different data, different semantics
- Yet another configuration tracking problem
- Inter-relationships between MIBs
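The get/set/trap model can be illustrated with an in-memory table keyed by object identifier. This is a toy sketch, not an SNMP implementation: there is no wire encoding, no community strings or security, and the `ToyAgent` class and the CPU-load OID are invented for illustration. Only `sysName.0` is a real MIB-II object.

```python
from typing import Callable, Dict, Tuple

Oid = Tuple[int, ...]

SYS_NAME = (1, 3, 6, 1, 2, 1, 1, 5, 0)      # sysName.0 in the MIB-II system group
CPU_LOAD = (1, 3, 6, 1, 4, 1, 99999, 1, 0)  # made-up vendor OID, illustration only


class ToyAgent:
    """In-memory stand-in for a device's management information tables."""

    def __init__(self, on_trap: Callable[[Oid, object], None]) -> None:
        self._mib: Dict[Oid, object] = {}
        self._thresholds: Dict[Oid, int] = {}
        self._on_trap = on_trap

    def get(self, oid: Oid) -> object:
        """Read a value; raises KeyError for an unknown OID."""
        return self._mib[oid]

    def set(self, oid: Oid, value: object) -> None:
        """Write a value and signal a trap if it exceeds its threshold."""
        self._mib[oid] = value
        limit = self._thresholds.get(oid)
        if limit is not None and value > limit:
            self._on_trap(oid, value)

    def set_threshold(self, oid: Oid, limit: int) -> None:
        self._thresholds[oid] = limit


traps = []
agent = ToyAgent(on_trap=lambda oid, value: traps.append((oid, value)))
agent.set(SYS_NAME, "FOOBAR")
agent.set_threshold(CPU_LOAD, 75)
agent.set(CPU_LOAD, 50)  # below threshold, no trap
agent.set(CPU_LOAD, 80)  # above threshold, trap signalled
print(agent.get(SYS_NAME), traps)
```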
25IPFIX
- IETF working group
- Export of flow based data out of IP network devices
- Developing suitable protocol from Cisco NetFlow v9
- RFC3954, RFC3955
- Statistics reporting
- Setup template
- Send data records matching template
- Many variables
- Packet/flow counters, rule matches, quite
flexible
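The "set up a template, then send data records matching it" idea is sketched below with Python's `struct` module. The template here is a plain list of (field name, struct code) pairs invented for illustration; it is not the NetFlow v9 or IPFIX wire format, which uses numbered field types and its own framing.

```python
import ipaddress
import struct
from typing import Dict, List, Tuple

# Hypothetical template: field name plus struct format code (I = 32-bit unsigned).
FlowTemplate = List[Tuple[str, str]]

TEMPLATE: FlowTemplate = [
    ("src_addr", "I"),
    ("dst_addr", "I"),
    ("packets", "I"),
    ("octets", "I"),
]


def record_format(template: FlowTemplate) -> str:
    """Build a network-byte-order struct format from the template."""
    return "!" + "".join(code for _, code in template)


def encode(template: FlowTemplate, record: Dict[str, int]) -> bytes:
    """Pack a record's fields in template order."""
    return struct.pack(record_format(template), *(record[name] for name, _ in template))


def decode(template: FlowTemplate, data: bytes) -> Dict[str, int]:
    """Unpack a data record using the template sent earlier."""
    values = struct.unpack(record_format(template), data)
    return dict(zip((name for name, _ in template), values))


flow = {
    "src_addr": int(ipaddress.IPv4Address("10.65.43.21")),
    "dst_addr": int(ipaddress.IPv4Address("10.65.12.34")),
    "packets": 12,
    "octets": 3456,
}
wire = encode(TEMPLATE, flow)
print(len(wire), decode(TEMPLATE, wire) == flow)
```

Because the receiver only needs the template to interpret the records, new variables can be exported by sending a new template rather than changing the protocol.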
26Overview
- Introduction
- Abstractions
- IP network components
- IP network management protocols
- Pulling it all together
- Network mapping, statistics gathering, control
- An alternative approach
27An hypothetical NMS
- GUI around ICMP (ping, traceroute), SNMP, etc
- Recursive host discovery
- Broadcast ping, ARP, default gateway: start somewhere
- Recursively SNMP query for known hosts/connected networks
- Ping known hosts to test liveness
- Iterate
- Display topology: allow drill-down to particular devices
- Configure and monitor known devices
- Trap, NetFlow, syslog message destinations
- Counter thresholds, CPU utilization threshold, fault reporting
- Particular faults or fault patterns
- Interface statistics and graphs (MRTG)
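The recursive discovery loop above is essentially a breadth-first search. In the sketch below, `is_alive` stands in for a ping and `neighbours_of` stands in for an SNMP query of a device's known hosts and connected networks; both are supplied by the caller, and the simulated network is made up.

```python
from collections import deque
from typing import Callable, Iterable, List


def discover(seed: str,
             neighbours_of: Callable[[str], Iterable[str]],
             is_alive: Callable[[str], bool]) -> List[str]:
    """Breadth-first discovery starting from a seed such as the default gateway."""
    found = []
    seen = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if not is_alive(node):  # "ping known hosts to test liveness"
            continue
        found.append(node)
        for neighbour in sorted(neighbours_of(node)):  # "recursively SNMP query"
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return found


# Simulated network standing in for real ping and SNMP responses.
NETWORK = {
    "gw": {"r1", "r2"},
    "r1": {"gw", "h1"},
    "r2": {"gw", "h2"},
    "h1": {"r1"},
    "h2": {"r2"},
}
DOWN = {"h2"}

print(discover("gw", lambda n: NETWORK[n], lambda n: n not in DOWN))
```

A real implementation would also pace its probes; an unthrottled version of this loop produces bursts of discovery traffic.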
28NOC, NOC. Calling AT&T
29What are they all looking at?
http://www.stat.ee.ethz.ch/mrtg/
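Traffic graphs of this kind are typically derived from interface octet counters polled at a fixed interval. The sketch below turns two successive counter readings into a link utilization, assuming a 32-bit counter that wraps at most once between polls; the function name and the sample numbers are hypothetical.

```python
def utilization(prev: int, curr: int, interval_s: float,
                link_bps: float, counter_bits: int = 32) -> float:
    """Fraction of link capacity used between two octet-counter readings."""
    # Modular subtraction handles a single counter wrap between polls.
    delta_octets = (curr - prev) % (1 << counter_bits)
    return delta_octets * 8 / interval_s / link_bps


# Made-up readings that straddle a 32-bit wrap: 1000 octets were sent.
print(utilization(prev=4294967000, curr=704, interval_s=10, link_bps=1000))
```

On fast links a 32-bit counter can wrap more than once per polling interval, in which case this calculation silently under-reports the load.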
30An hypothetical NMS
- All very straightforward? No, not really
- A lot of software engineering: corner cases, traceroute interpretation, NATs, etc
- Correctness
- MIBs may contain rubbish
- Can only view inside your network anyway
- Tunnelled, encrypted protocols becoming prevalent
- Efficiency
- Rate pacing discovery traffic: ping implosion/explosion
- SNMP overloading router CPUs
- Using NMSs also not straightforward
- How to setup correct thresholds?
- How to decide when something bad has happened?
- How to present (or even interpret) reams and reams of data?
31Overview
- Introduction
- Abstractions
- IP network components
- IP network management protocols
- Pulling it all together
- An alternative approach
- From the edges
32Anemone
- An endsystem network management platform
- Collect flow information from endsystems, and
- Combine with topology information from routeing protocols
- Endsystems have more information about their traffic than network devices
- No router support required
- A platform to support many applications
- Currently concentrating on managed networks
- E.g. governments, enterprises, etc
- High complexity, high value
- High degree of endsystem control
33Applications
- Real-time and historical analysis
- Current topology plus ingress, egress flows gives global picture of network behaviour
- Capacity planning, anomaly detection
- Modelling what-if scenarios
- Plug into a simulator back-end
- What happens to the network if we move all our Exchange servers to a single data centre?
- Automatic configuration
- Close the loop: enable network to meet dynamic SLAs
- Reconfigure network to track e.g. time-of-day load changes
34Challenges
- How do you instrument an entire network? Do you need to instrument all endsystems?
- What network information should be captured and stored?
- How do you access data stores distributed across a large network (ca. 300,000 nodes)?
1% coverage gives 99.999% of bytes and flows
Flow data augmented with application and user
context
Use distributed query system
35Summary
- Introduction
- What is network management?
- Abstractions
- ISO FCAPS, TMN EMS, ATM
- IP network components
- IP, networks, routers
- IP network management protocols
- ICMP, SNMP, etc
- Pulling it all together
- Outline of a network management system
- An alternative approach from the edges
36The end
- Questions
- Answers?
- http://www.cisco.com/
- http://www.routergod.com/
- http://www.ietf.org/
- http://ipmon.sprintlabs.com/pyrt/
- http://www.nanog.org/
37Backup slides
- Internet routeing
- OSPF
- BGP
38Internet routeing
- Q: how to get a packet from node to destination?
- A1: advertise all reachable destinations and apply a consistent cost function (distance vector)
- A2: learn network topology and compute consistent shortest paths (link state)
- Each node (1) discovers and advertises adjacencies (2) builds link state database (3) computes shortest paths
- A1, A2: forward to next-hop using longest-prefix-match
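Longest-prefix-match means that, of all routeing table entries containing the destination address, the most specific one (longest mask) wins. The sketch below does this with a linear scan using Python's standard `ipaddress` module; real routers use tries or hardware lookups, and the table contents here are hypothetical.

```python
import ipaddress
from typing import Dict, Optional


def longest_prefix_match(table: Dict[str, str], destination: str) -> Optional[str]:
    """Return the next hop of the most specific prefix containing destination."""
    address = ipaddress.ip_address(destination)
    best_network = None
    best_next_hop = None
    for prefix, next_hop in table.items():
        network = ipaddress.ip_network(prefix)
        if address in network:
            if best_network is None or network.prefixlen > best_network.prefixlen:
                best_network, best_next_hop = network, next_hop
    return best_next_hop


# Hypothetical routeing table: default route, an aggregate, and a specific subnet.
TABLE = {
    "0.0.0.0/0": "upstream",
    "10.65.0.0/16": "core-1",
    "10.65.43.0/25": "FastEthernet0/0/0",
}

for dst in ("10.65.43.21", "10.65.43.200", "192.0.2.1"):
    print(dst, "->", longest_prefix_match(TABLE, dst))
```

The second address falls outside the /25 (which covers .0 to .127), so it is matched by the /16 aggregate instead; the third matches only the default route.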
39OSPF (link state routeing)
- Q: how to route given packet from any node to destination?
- A: learn network topology; compute shortest paths
- For each node
- Discover adjacencies (immediate neighbours); advertise
- Build link state database (network topology)
- Compute shortest paths to all destination prefixes
- Forward to next-hop using longest-prefix-match (most specific route)
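Once every node holds the same link state database, the shortest-path step is Dijkstra's algorithm run from the node's own position. The sketch below computes, for each destination, the total cost and the first hop to use. It assumes positive link costs and works on node names rather than prefixes; the database contents are made up.

```python
import heapq
from typing import Dict, Optional, Tuple

LinkStateDb = Dict[str, Dict[str, int]]  # node -> {neighbour: link cost}


def shortest_paths(lsdb: LinkStateDb, source: str) -> Dict[str, Tuple[int, Optional[str]]]:
    """Return {destination: (cost, first hop)} using Dijkstra's algorithm."""
    best: Dict[str, Tuple[int, Optional[str]]] = {source: (0, None)}
    heap = [(0, source, None)]
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if cost > best[node][0]:
            continue  # stale heap entry; a cheaper path was already found
        for neighbour, link_cost in lsdb[node].items():
            new_cost = cost + link_cost
            # Leaving the source, the neighbour itself is the first hop.
            hop = neighbour if node == source else first_hop
            if neighbour not in best or new_cost < best[neighbour][0]:
                best[neighbour] = (new_cost, hop)
                heapq.heappush(heap, (new_cost, neighbour, hop))
    return best


# Hypothetical four-router link state database with symmetric costs.
LSDB = {
    "a": {"b": 1, "c": 5},
    "b": {"a": 1, "c": 1, "d": 4},
    "c": {"a": 5, "b": 1, "d": 1},
    "d": {"b": 4, "c": 1},
}

print(shortest_paths(LSDB, "a"))
```

From `a`, every destination is cheapest via `b`, even `c`, which is directly connected but over a cost-5 link.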
40BGP (path vector routeing)
- Q: how to route given packet from any node to destination?
- A: neighbours tell you destinations they can reach; pick cheapest option
- For each node
- Receive (destination, cost, next-hop) for all destinations known to neighbour
- Longest-prefix-match among next-hops for given destination
- Advertise selected (destination, cost?, next-hop') for all known destinations
- Selection process is complicated
- Routes can be modified/hidden at all three stages
- General mechanism for application of policy
42Anemone: where are we now?
- Three major components
- Flow collection
- Route collection
- Distributed database
- Building prototypes, simulating system
43Data collection
- Flow collection
- Hosts track active flows
- Using low overhead event posting infrastructure, ETW
- Built prototype device driver provider and user-space consumer
- Used packet traces for feasibility study on (client, server)
- Peaks at (165, 5667) live and (39, 567) active flows per sec
- Route collection
- OSPF is link-state: passively collect link state adverts
- Extension of my work at Sprint (for IS-IS and BGP); also been done at AT&T (NSDI04 paper)
44The distributed database
- Logically contains
- Traffic flow matrix (bandwidths), srcs x dsts
- ...each entry annotated with current route from src to dst
- N.B. src/dst might be e.g. (IP end-point, application)
- Large dynamic data set suggests aggregation
- Related work
- Distributed, continuous query, temporal databases
- Sensor networks
- Potential starting points: Astrolabe or SDIMS (SIGCOMM04)
- Where/what/how much to aggregate?
- Is data read- or write-dominated?
- Which is more dynamic, flow or topology data?
- Can the system successfully self-tune?
45The distributed database
- Construct traffic matrix from flow monitoring
- Hosts can supply flows they source and sink
- Only need a subset of this data to get complete traffic matrix
- Construct topology from route collection
- OSPF supplies topology -> routes
- Wish to be able to answer queries like
- Who are the top-10 traffic generators?
- Easy to aggregate, don't care about topology
- What is the load on link l?
- Can aggregate from hosts, but need to know routes
- What happens if we remove links lm?
- Interaction between traffic matrix, topology, even flow control
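The first two queries can be sketched over a small in-memory traffic matrix. The top-N query needs only the flows; the link-load query also needs the route for each (src, dst) pair, as noted above. Everything here is illustrative: the data layout, the function names and the numbers are assumptions, and a real system would answer these from a distributed store rather than one dictionary.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (src, dst)


def top_generators(flows: Dict[Pair, int], n: int = 10) -> List[Tuple[str, int]]:
    """Sum bandwidth per source; no topology needed."""
    totals = Counter()
    for (src, _dst), bandwidth in flows.items():
        totals[src] += bandwidth
    return totals.most_common(n)


def link_loads(flows: Dict[Pair, int], routes: Dict[Pair, List[str]]) -> Counter:
    """Add each flow's bandwidth to every directed link on its route."""
    loads = Counter()
    for pair, bandwidth in flows.items():
        path = routes[pair]
        for link in zip(path, path[1:]):
            loads[link] += bandwidth
    return loads


# Made-up traffic matrix entries and the routes annotating them.
FLOWS = {("h1", "h3"): 40, ("h2", "h3"): 10, ("h1", "h2"): 5}
ROUTES = {
    ("h1", "h3"): ["h1", "r1", "r2", "h3"],
    ("h2", "h3"): ["h2", "r1", "r2", "h3"],
    ("h1", "h2"): ["h1", "r1", "h2"],
}

print(top_generators(FLOWS))
print(link_loads(FLOWS, ROUTES)[("r1", "r2")])
```

The third query, removing links, would mean recomputing `ROUTES` on the reduced topology and calling `link_loads` again, which is where the interaction between traffic matrix and topology shows up.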
46The distributed database
- Building simulation model
- OSPF data gives topology, event list, routes
- Simple load model to start with (load subnets)
- Precedence matrix (from SPF) reduces flow-data query set
- Can we do as well/better than e.g. NetFlow?
- Accuracy/coverage trade-off
- How should we distribute the DB?
- Just OSPF data? Just flow data? A mixture?
- How many levels of aggregation?
- How many nodes do queries touch?
- What sort of API is suitable?
- Example queries for sample applications