Anemone: Edge-based network management

1
Anemone: Edge-based network management
  • Mort (Richard Mortier)
  • MSR-Cambridge
  • December 2004

2
Network management
  • is the process of monitoring and controlling a
    large complex distributed system of dumb devices
    where failures are common and resources scarce
  • Enterprise networks are large but closely managed
  • Contrast with the Internet or university campus
    networks
  • No-one has the big picture!
  • Internet routeing uses distributed protocols
  • Current management tools all consider local info
  • Patchy SNMP support, configuration issues,
    sampling artefacts, tools generate CPU and
    network load

3
Anemone
  • Building edge-based network management platform
  • Collect flow information from hosts, and
  • Combine with topology information from routeing
    protocols
  • Enable visualization, analysis, simulation,
    control
  • Avoid problems of not-quite-standard interfaces
  • Management support is typically non-critical
    (i.e. buggy) and not extensively tested for
    interoperability
  • Do the work where resources are plentiful
  • Hosts have lots of cycles and little traffic
    (relatively)
  • Protocol visibility: see into tunnels, IPSec, etc.

4
Problem context: Enterprise networks
  • Large
  • 10^5 edge devices, 10^3 network devices
  • Geographically distributed
  • Multiple continents, 10^2 countries
  • Tightly controlled
  • IT department has (nearly) complete control over
    user desktops and network connected equipment

5
Talk outline
  • System outline
  • What would it be good for?
  • In more detail
  • Research issues

6
System outline
[Diagram labels: Packets, Flows, Traffic matrix; Routeing protocol, Topology, Set of routes; Anemone platform; Simulator; Control; Visualize; Simulate]

7
Where is my traffic going today?
  • Pictures of current topology and traffic
  • Routes + flows + forwarding rules ⇒ BIG PICTURE
  • In fact, where did my traffic go yesterday?
  • Keep historical data for capacity planning, etc
  • A platform for anomaly detection
  • Historical data suggests normality, live
    monitoring allows anomalies to be detected
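
One simple way to turn historical data into a notion of normality is a mean and standard deviation baseline. The sketch below is an assumption for illustration only: the slides do not say which detector Anemone would use, and the function name and threshold are hypothetical.

```python
import statistics

def is_anomalous(history, observed, threshold=3.0):
    """Flag a reading that lies more than `threshold` standard
    deviations from the mean of the historical readings."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)   # sample standard deviation
    if stdev == 0:
        # Perfectly flat history: any different value is unusual.
        return observed != mean
    return abs(observed - mean) / stdev > threshold

# Hypothetical per-link load samples (arbitrary units).
history = [100, 102, 98, 101, 99]
print(is_anomalous(history, 101))
print(is_anomalous(history, 150))
```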

8
Where might my traffic go tomorrow?
  • Plug into a simulator back-end
  • Discrete event simulator, flow allocation solver
  • Run multiple what-if scenarios
  • failures
  • reconfigurations
  • technology deployments
  • E.g. What happens if we coalesce all the
    Exchange servers in one data-centre?

9
Where should my traffic be going?
  • Close the loop: compute link weights to implement
    policy goals
  • Recompute on order of hours/days
  • Allows more dynamic policies
  • Modify network configuration to track e.g. time
    of day load changes
  • Make network more efficient (cheaper)?

10
Where are we now?
  • Three major components
  • Flow collection
  • Route collection
  • Anemone platform
  • Studying feasibility and building prototypes

11
Data collection: flows
  • Hosts track active flows
  • Using ETW, low overhead event posting
    infrastructure
  • Built prototype device driver provider +
    user-space consumer
  • Used 24h packet traces from (client, server) for
    feasibility study
  • Peaks at (165, 5667) live and (39, 567) active
    flows per sec
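
A minimal sketch of what "hosts track active flows" could look like in user space. The slides do not define how a flow is keyed or when it ends, so the 5-tuple key, the idle timeout, and all names below are assumptions; this is not the ETW provider/consumer prototype itself.

```python
from collections import namedtuple

# Assumed flow identity: the usual 5-tuple.
FlowKey = namedtuple("FlowKey", "src dst sport dport proto")

class FlowTable:
    """Aggregate per-packet events into per-flow packet and byte counts."""

    def __init__(self, idle_timeout=60.0):
        self.idle_timeout = idle_timeout  # assumed: seconds of silence that end a flow
        self.flows = {}                   # FlowKey -> [first_seen, last_seen, packets, bytes]

    def on_packet(self, ts, key, size):
        rec = self.flows.get(key)
        if rec is None:
            self.flows[key] = [ts, ts, 1, size]
        else:
            rec[1] = ts
            rec[2] += 1
            rec[3] += size

    def expire(self, now):
        """Remove and return the flows that have been idle longer than the timeout."""
        done = {k: v for k, v in self.flows.items()
                if now - v[1] > self.idle_timeout}
        for k in done:
            del self.flows[k]
        return done
```

Only the expired (or periodically sampled) records would need to leave the host, which keeps reporting traffic small relative to the packets observed.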

14
Data collection: routes
  • OSPF is link-state, so collect link-state advertisements (LSAs)
  • Similar to Sprint IS-IS collection
  • Was also done at AT&T (NSDI '04 paper)
  • Completely passive
  • Modulo configuration
  • Process data to recover network events and
    topology
  • Data collected for (local, backbone) areas (20
    days)
  • LSA DB size: (700, 1048) LSAs, (21, 34) kB
  • Event totals: (2526, 3238) events, i.e. (5.3, 6.7)
    events/hr
  • Small, generally stable with bursts of activity
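
The hourly event rates follow directly from the event totals and the 20-day collection period. A short check, assuming the tuples are ordered (local, backbone) as in the bullet above them:

```python
HOURS = 20 * 24  # 20 days of collection, in hours

def events_per_hour(total_events, hours=HOURS):
    """Average event rate over the collection period."""
    return total_events / hours

for area, total in (("local", 2526), ("backbone", 3238)):
    print(f"{area}: {events_per_hour(total):.1f} events/hr")
# local: 5.3 events/hr
# backbone: 6.7 events/hr
```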

15
NB: spike to 100 from initial DB collection truncated for readability

17
[Figure annotations]
  • complete dataset
  • steady state
  • 35 mins: LSRefreshTime + CheckAge?
  • 30 mins: LSRefreshTime?
  • 10 mins: data ca. 25/Nov?
  • 12 mins: RouterDeadInterval?

18
The Anemone platform
  • Distributed database, logically containing
  • Traffic flow matrix (bandwidths), srcs × dsts
  • Hosts can supply flows they source and sink
  • Only need a subset of this data to get complete
    traffic matrix
  • each entry annotated with current route, src to
    dst
  • Note src/dst might be e.g. (IP end-point,
    application)
  • OSPF supplies topology ⇒ routes
  • Where/what/how much to distribute/aggregate?
  • Is data read- or write-dominated?
  • Which is more dynamic, flow or topology data?
  • Can the system successfully self-tune?

19
The Anemone platform
  • Wish to be able to answer queries like
  • Who are the top-10 traffic generators?
  • Easy to aggregate, don't care about topology
  • What is the load on link l?
  • Can aggregate from hosts, but need to know routes
  • What happens if we remove links l, m?
  • Interaction between traffic matrix, topology,
    even flow control
  • Related work
  • distributed, continuous query, temporal
    databases
  • Sensor networks, Astrolabe, SDIMS, PHI

20
The Anemone platform
  • Building simulation model
  • OSPF data gives topology, event list, routes
  • Simple load model to start with (load
    subnets)
  • Predecessor matrix (from SPF) reduces flow-data
    query set
  • Can we do as well/better than e.g. NetFlow?
  • Accuracy/coverage trade-off
  • How should we distribute the data and by what
    protocols?
  • Just OSPF data? Just flow data? A mixture?
  • How many levels of aggregation?
  • How many nodes do queries touch?
  • What sort of API is suitable?
  • Example queries for sample applications
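
A sketch of how predecessors from SPF could narrow the flow-data query set: run Dijkstra from each node, then walk the predecessor chains to find which (src, dst) pairs cross a given link, so only those sources need to be asked for flow data. The graph format and function names are hypothetical, and the sketch keeps a single shortest path per pair, ignoring equal-cost multipath.

```python
import heapq

def spf(graph, src):
    """Dijkstra. graph: node -> {neighbour: weight}.
    Returns a predecessor map for the shortest-path tree rooted at src."""
    dist = {src: 0}
    pred = {src: None}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                pred[v] = u
                heapq.heappush(heap, (nd, v))
    return pred

def pairs_using_link(graph, link):
    """(src, dst) pairs whose shortest path crosses the directed link (u, v)."""
    pairs = set()
    for src in graph:
        pred = spf(graph, src)
        for dst in pred:
            node = dst
            # Walk back from dst towards src along the tree.
            while pred[node] is not None:
                if (pred[node], node) == link:
                    pairs.add((src, dst))
                    break
                node = pred[node]
    return pairs

graph = {
    "a": {"b": 1, "c": 5},
    "b": {"a": 1, "c": 1},
    "c": {"b": 1, "a": 5},
}
print(sorted(pairs_using_link(graph, ("b", "c"))))  # [('a', 'c'), ('b', 'c')]
```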

21
Research issues
  • Corner cases
  • Scalability
  • Robustness, accuracy
  • Control systems

22
Research issues
  • Corner cases
  • Multi-homed hosts: how best to define a flow?
  • L4 routeing, NAT, proxy ARP, transparent proxies
  • (Solve using device config files, perhaps SNMP)
  • Scalability
  • Host measurement must not be intrusive (in terms
    of packet latency, CPU load, network bandwidth)
  • Aggregators must elect themselves in such a way
    that they do not implode under event load
  • What happens if network radically alters? E.g.
  • Extensive use of multicast
  • Connection patterns shift due to e.g. P2P
    deployment

23
Research issues
  • Robustness
  • Network management had better still work as nodes
    fail or the network partitions!
  • Accuracy in the face of late, partial information
  • By accident: unmonitored hosts
  • By design: aggregation, more detail about local
    area
  • Inference of link contribution to cumulative
    metrics, e.g. RTT
  • Network control: modify link weights
  • How efficient is the current configuration
    anyway?
  • What are plausible timescales to reconfigure?

24
Summary
  • Aim to build a coherent edge-based network
    management platform using flow monitoring and
    standard routeing protocols
  • Applications include visualization, simulation,
    dynamic control
  • Research issues include
  • Scalability: want to manage a 300,000 node
    network
  • Robustness: must work as nodes fail or network
    partitions
  • Accuracy: will not be able to monitor 100% of
    traffic
  • Control systems: use the data to optimize the
    network in real-time, as well as just observe and
    simulate

25
Current status
  • Submitted Networking 2005 paper
  • Prototype ETW provider/consumer driver
  • Studied feasibility of flow monitoring
  • Prototype OSPF collector + topology
    reconstruction
  • Investigating distributed database via
    simulation
  • Query properties
  • System decomposition
  • Protocols for data distribution
  • Questions, comments?

26
Backup slides
  • SNMP
  • Internet routeing
  • OSPF
  • BGP
  • Security

27
SNMP
  • Protocol to manage information tables at devices
  • Provides get, set, trap, notify operations
  • get, set: read, write values
  • trap: signal a condition (e.g. threshold
    exceeded)
  • notify: reliable (acknowledged) trap, i.e. an SNMP inform
  • Complexity mostly in the table design
  • Some standard tables, but many vendor specific
  • Non-critical, so often tables populated
    incorrectly

28
Internet routeing
  • Q: how to get a packet from node to destination?
  • A1: advertise all reachable destinations and
    apply a consistent cost function (distance
    vector)
  • A2: learn network topology and compute consistent
    shortest paths (link state)
  • Each node (1) discovers and advertises
    adjacencies, (2) builds link state database, (3)
    computes shortest paths
  • A1, A2: forward to next-hop using
    longest-prefix-match

29
OSPF (link state routeing)
  • Q: how to route given packet from any node to
    destination?
  • A: learn network topology, compute shortest paths
  • For each node
  • Discover adjacencies (immediate neighbours) and
    advertise them
  • Build link state database (network topology)
  • Compute shortest paths to all destination
    prefixes
  • Forward to next-hop using longest-prefix-match
    (most specific route)
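
The final forwarding step, longest-prefix-match, can be sketched with Python's standard `ipaddress` module. The table layout (prefix string to next-hop name) is an assumption, and the linear scan is for clarity only; real forwarding planes use tries or hardware lookups.

```python
import ipaddress

def longest_prefix_match(table, addr):
    """table: dict of prefix string -> next hop.
    Returns the next hop of the most specific matching prefix, or None."""
    ip = ipaddress.ip_address(addr)
    best = None  # (network, next_hop) of the best match so far
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None

table = {"10.0.0.0/8": "r1", "10.1.0.0/16": "r2", "0.0.0.0/0": "gw"}
print(longest_prefix_match(table, "10.1.2.3"))     # r2
print(longest_prefix_match(table, "192.168.1.1"))  # gw
```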

30
BGP (path vector routeing)
  • Q: how to route given packet from any node to
    destination?
  • A: neighbours tell you destinations they can
    reach; pick cheapest option
  • For each node
  • Receive (destination, cost, next-hop) for all
    destinations known to neighbour
  • Select among all possible next-hops for given
    destination
  • Advertise selected (destination, cost',
    next-hop') for all known destinations
  • Selection process is complicated
  • Routes can be modified/hidden at all three stages
  • General mechanism for application of policy

31
Security
  • Threat: malicious/compromised host
  • Authenticate participants
  • Must secure route collector as if a router
  • Threat: DoS on monitors
  • Difference between client under DoS and server?
  • Rate-pace output from monitors
  • Threat: eavesdropping
  • Standard IPSec/encryption solutions
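
One common way to rate-pace a monitor's output is a token bucket. The slides do not say which mechanism is intended, so the class below is an assumed sketch with hypothetical parameters; timestamps are passed in explicitly to keep it deterministic.

```python
class TokenBucket:
    """Allow at most `rate` reports per second on average, with bursts up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate      # tokens added per second
        self.burst = burst    # maximum tokens that can accumulate
        self.tokens = burst   # start with a full bucket
        self.last = 0.0       # time of the previous call

    def allow(self, now, cost=1):
        """Return True if a report of the given cost may be sent at time `now`."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A monitor under attack would then drop or aggregate reports that the bucket refuses, so its own output cannot amplify the load.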