Anemone: Edge-based network management

1
Anemone: Edge-based network management

Mort (Richard Mortier)
MSR-Cambridge
December 2004

2
Network management

is the process of monitoring and controlling a large, complex distributed system of dumb devices where failures are common and resources scarce
Enterprise networks are large but closely managed
Contrast with the Internet or university campus networks
No-one has the big picture!
Internet routeing uses distributed protocols
Current management tools all consider local info
Patchy SNMP support, configuration issues, sampling artefacts, tools generate CPU and network load

3
Anemone

Building an edge-based network management platform
Collect flow information from hosts, and
Combine with topology information from routeing protocols
Enable visualization, analysis, simulation, control
Avoid problems of not-quite-standard interfaces
Management support is typically non-critical (i.e. buggy?) and not extensively tested for inter-operability
Do the work where resources are plentiful
Hosts have lots of cycles and little traffic (relatively)
Protocol visibility: see into tunnels, IPsec, etc.

4
Problem context: Enterprise networks

Large
10^5 edge devices, 10^3 network devices
Geographically distributed
Multiple continents, 10^2 countries
Tightly controlled
IT department has (nearly) complete control over user desktops and network-connected equipment

5
Talk outline

System outline
What would it be good for?
In more detail
Research issues

6
System outline

[Diagram. Labels: Packets, Flows, Traffic matrix; Routeing protocol, Topology, Set of routes; Anemone platform; Simulator; Control; Visualize; Simulate]

7
Where is my traffic going today?

Pictures of current topology and traffic
Routes + flows + forwarding rules → BIG PICTURE
In fact, where did my traffic go yesterday?
Keep historical data for capacity planning, etc.
A platform for anomaly detection
Historical data suggests normality, live monitoring allows anomalies to be detected

8
Where might my traffic go tomorrow?

Plug into a simulator back-end
Discrete event simulator, flow allocation solver
Run multiple what-if scenarios: failures, reconfigurations, technology deployments
E.g. "What happens if we coalesce all the Exchange servers in one data-centre?"
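
A what-if failure scenario of the kind listed above can be sketched as follows. This is a minimal illustration, not the Anemone simulator: the graph layout, the demand table, and the helper names (`shortest_path`, `link_loads`, `without_link`) are all assumptions made for the example, and routing is modelled as plain single-path Dijkstra over link weights.

```python
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra over {node: {neighbour: weight}}; returns a node list or None."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def link_loads(graph, demands):
    """Sum each (src, dst) demand onto every link of its shortest path."""
    loads = {}
    for (src, dst), bw in demands.items():
        path = shortest_path(graph, src, dst)
        if path is None:
            continue  # destination unreachable in this scenario
        for u, v in zip(path, path[1:]):
            key = tuple(sorted((u, v)))  # treat links as undirected
            loads[key] = loads.get(key, 0) + bw
    return loads

def without_link(graph, a, b):
    """Copy of the graph with the undirected link a-b removed."""
    return {u: {v: w for v, w in nbrs.items() if {u, v} != {a, b}}
            for u, nbrs in graph.items()}

# Hypothetical three-router topology and traffic demands.
graph = {"A": {"B": 1, "C": 3}, "B": {"A": 1, "C": 1}, "C": {"A": 3, "B": 1}}
demands = {("A", "C"): 10, ("A", "B"): 5}

print(link_loads(graph, demands))
print(link_loads(without_link(graph, "B", "C"), demands))
```

Comparing the two printed load tables shows how traffic from A to C shifts onto the direct A-C link once B-C fails.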

9
Where should my traffic be going?

Close the loop: compute link weights to implement policy goals
Recompute on order of hours/days
Allows more dynamic policies
Modify network configuration to track e.g. time-of-day load changes
Make network more efficient (cheaper)?

10
Where are we now?

Three major components
Flow collection
Route collection
Anemone platform
Studying feasibility and building prototypes

11
Data collection: flows

Hosts track active flows
Using ETW (Event Tracing for Windows), a low-overhead event posting infrastructure
Built prototype device-driver provider and user-space consumer
Used 24h packet traces from (client, server) for feasibility study
Peaks at (165, 5667) live and (39, 567) active flows per sec

14
Data collection: routes

OSPF is link-state, so collect link state adverts
Similar to Sprint IS-IS collection
Was also done at AT&T (NSDI '04 paper)
Completely passive
Modulo configuration?
Process data to recover network events and topology
Data collected for (local, backbone) areas (20 days)
LSA DB size: (700, 1048) LSAs, (21, 34) kB
Event totals: (2526, 3238) events, (5.3, 6.7) evts/hr
Small, generally stable with bursts of activity
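
The quoted event rates follow from the event totals and the collection period. A quick check, assuming the period is exactly 20 days (the function name `events_per_hour` is just for this illustration):

```python
HOURS_PER_DAY = 24

def events_per_hour(total_events, days):
    """Mean event rate over a collection period of `days` whole days."""
    return total_events / (days * HOURS_PER_DAY)

# Totals quoted on the slide, in (local, backbone) order.
event_totals = {"local": 2526, "backbone": 3238}
for area, total in event_totals.items():
    print(area, round(events_per_hour(total, days=20), 1))
```

Over 480 hours this reproduces the (5.3, 6.7) events per hour given for the two areas.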

15
NB: spike to 100 from initial DB collection truncated for readability
17
complete dataset
steady state
35 mins: LSRefreshTime + CheckAge?
30 mins: LSRefreshTime?
10 mins: data ca. 25/Nov?
12 mins: RouterDeadInterval?
18
The Anemone platform

Distributed database, logically containing
Traffic flow matrix (bandwidths), srcs × dsts
Hosts can supply flows they source and sink
Only need a subset of this data to get complete traffic matrix
Each entry annotated with current route, src to dst
Note src/dst might be e.g. (IP end-point, application)
OSPF supplies topology → routes
Where/what/how much to distribute/aggregate?
Is data read- or write-dominated?
Which is more dynamic, flow or topology data?
Can the system successfully self-tune?
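
Because every flow is visible at both its source and its sink, a subset of hosts can be enough to recover the whole traffic matrix. The sketch below illustrates that point only; the report format and the name `build_traffic_matrix` are assumptions for the example, not part of the platform.

```python
def build_traffic_matrix(reports):
    """Merge per-host flow reports into one {(src, dst): bandwidth} matrix.

    reports maps a reporting host to the (src, dst, bandwidth) flows it
    sources or sinks. A flow seen at both ends is counted only once.
    """
    matrix = {}
    for host, flows in reports.items():
        for src, dst, bw in flows:
            if host not in (src, dst):
                continue  # a host can only vouch for its own flows
            matrix.setdefault((src, dst), bw)
    return matrix

# Hypothetical example: h2 is unmonitored, yet its flows are still
# recovered from the reports of its peers h1 and h3.
reports = {
    "h1": [("h1", "h2", 10), ("h3", "h1", 7)],
    "h3": [("h2", "h3", 5), ("h3", "h1", 7)],
}
print(build_traffic_matrix(reports))
```

Flows between two unmonitored hosts would still be missed, which is one source of the accuracy questions raised later in the talk.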

19
The Anemone platform

Wish to be able to answer queries like
Who are the top-10 traffic generators?
Easy to aggregate, don't care about topology
What is the load on link l?
Can aggregate from hosts, but need to know routes
What happens if we remove links l, m?
Interaction between traffic matrix, topology, even flow control
Related work
Distributed, continuous query, temporal databases
Sensor networks, Astrolabe, SDIMS, PHI
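
The first two queries can be sketched over a centralised copy of the data. This is only an illustration of why one query needs routes and the other does not: the flow tuples, the `routes` table, and both function names are hypothetical, and a real deployment would aggregate across a distributed database rather than a local list.

```python
from collections import Counter

def top_generators(flows, n=10):
    """Top-n sources by total bandwidth; needs no topology information."""
    totals = Counter()
    for src, _dst, bw in flows:
        totals[src] += bw
    return totals.most_common(n)

def load_on_link(flows, routes, link):
    """Total bandwidth of flows whose route crosses the undirected link."""
    wanted = set(link)
    total = 0
    for src, dst, bw in flows:
        path = routes.get((src, dst), [])
        if any({u, v} == wanted for u, v in zip(path, path[1:])):
            total += bw
    return total

# Hypothetical flows (src, dst, bandwidth) and their current routes.
flows = [("h1", "h3", 40), ("h2", "h3", 25), ("h1", "h2", 10)]
routes = {
    ("h1", "h3"): ["h1", "r1", "r2", "h3"],
    ("h2", "h3"): ["h2", "r2", "h3"],
    ("h1", "h2"): ["h1", "r1", "r2", "h2"],
}
print(top_generators(flows, n=2))
print(load_on_link(flows, routes, ("r1", "r2")))
```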

20
The Anemone platform

Building simulation model
OSPF data gives topology, event list, routes
Simple load model to start with (load subnets)
Predecessor matrix (from SPF) reduces flow-data query set
Can we do as well/better than e.g. NetFlow?
Accuracy/coverage trade-off
How should we distribute the data and by what protocols?
Just OSPF data? Just flow data? A mixture?
How many levels of aggregation?
How many nodes do queries touch?
What sort of API is suitable?
Example queries for sample applications
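
One reading of how a predecessor matrix narrows the flow-data query set: walking the predecessor entries back from each destination shows which (source, destination) pairs cross a given link, so only those pairs need to be asked about. The sketch below is an assumption-laden illustration of that reading; the matrix is written out by hand for a hypothetical star topology (B in the middle, A, C, D attached), and `pairs_using_link` is an invented name.

```python
def pairs_using_link(pred, link):
    """Return the (src, dst) pairs whose shortest path crosses the link.

    pred[src][node] is the node visited just before `node` on the
    shortest path from src, or None when node is src itself.
    """
    wanted = set(link)
    hits = set()
    for src, row in pred.items():
        for dst in row:
            node = dst
            while row[node] is not None:  # walk back towards src
                parent = row[node]
                if {parent, node} == wanted:
                    hits.add((src, dst))
                    break
                node = parent
    return hits

# Hand-written predecessor matrix for links A-B, B-C, B-D.
pred = {
    "A": {"A": None, "B": "A", "C": "B", "D": "B"},
    "B": {"A": "B", "B": None, "C": "B", "D": "B"},
    "C": {"A": "B", "B": "C", "C": None, "D": "B"},
    "D": {"A": "B", "B": "D", "C": "B", "D": None},
}
print(sorted(pairs_using_link(pred, ("B", "C"))))
```

In this example 6 of the 12 ordered pairs cross link B-C, so a load query for that link can skip the other half.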

21
Research issues

Corner cases
Scalability
Robustness, accuracy
Control systems

22
Research issues

Corner cases
Multi-homed hosts: how best to define a flow?
L4 routeing, NAT, proxy ARP, transparent proxies
(Solve using device config files, perhaps SNMP)
Scalability
Host measurement must not be intrusive (in terms of packet latency, CPU load, network bandwidth)
Aggregators must elect themselves in such a way that they do not implode under event load
What happens if the network radically alters? E.g.
Extensive use of multicast
Connection patterns shift due to e.g. P2P deployment

23
Research issues

Robustness
Network management had better still work as nodes fail or the network partitions!
Accuracy in the face of late, partial information
By accident: unmonitored hosts
By design: aggregation, more detail about local area
Inference of link contribution to cumulative metrics, e.g. RTT
Network control: modify link weights
How efficient is the current configuration anyway?
What are plausible timescales to reconfigure?

24
Summary

Aim to build a coherent edge-based network management platform using flow monitoring and standard routeing protocols
Applications include visualization, simulation, dynamic control
Research issues include
Scalability: want to manage a 300,000 node network
Robustness: must work as nodes fail or network partitions
Accuracy: will not be able to monitor 100% of traffic
Control systems: use the data to optimize the network in real-time, as well as just observe and simulate

25
Current status

Submitted Networking 2005 paper
Prototype ETW provider/consumer driver
Studied feasibility of flow monitoring
Prototype OSPF collector and topology reconstruction
Investigating distributed database via simulation
Query properties
System decomposition
Protocols for data distribution
Questions, comments?

26
Backup slides

SNMP
Internet routeing
OSPF
BGP
Security

27
SNMP

Protocol to manage information tables at devices
Provides get, set, trap, notify operations
get, set: read, write values
trap: signal a condition (e.g. threshold exceeded)
notify: reliable trap
Complexity mostly in the table design
Some standard tables, but many vendor-specific
Non-critical, so tables often populated incorrectly

28
Internet routeing

Q: how to get a packet from node to destination?
A1: advertise all reachable destinations and apply a consistent cost function (distance vector)
A2: learn network topology and compute consistent shortest paths (link state)
Each node (1) discovers and advertises adjacencies, (2) builds link state database, (3) computes shortest paths
A1, A2: forward to next-hop using longest-prefix-match
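
The forwarding step common to both answers, longest-prefix-match, can be sketched with Python's standard `ipaddress` module. The forwarding table, the next-hop names, and the function name are hypothetical; a real router would use a trie or hardware lookup rather than this linear scan.

```python
import ipaddress

def longest_prefix_match(table, address):
    """Return the next hop of the most specific prefix containing address.

    table maps CIDR prefix strings to next-hop names; None if nothing matches.
    """
    addr = ipaddress.ip_address(address)
    best = None  # (network, next_hop) of the longest match so far
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None

# Hypothetical forwarding table with a default route.
table = {"0.0.0.0/0": "gw", "10.0.0.0/8": "r1", "10.1.0.0/16": "r2"}
for dest in ("10.1.2.3", "10.2.0.1", "192.0.2.1"):
    print(dest, "->", longest_prefix_match(table, dest))
```

The address 10.1.2.3 matches all three prefixes, and the /16 entry wins because it is the most specific.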

29
OSPF (link state routeing)