MetaSimulation Design and Analysis for Large Scale Networks

1
Meta-Simulation Design and Analysis for Large
Scale Networks
  • David W Bauer Jr
  • Department of Computer Science
  • Rensselaer Polytechnic Institute

2
OUTLINE
  • Motivation
  • Contributions
  • Meta-simulation
  • ROSS.Net
  • BGP4-OSPFv2 Investigation
  • Simulation
  • Kernel Processes
  • Seven O'Clock Algorithm
  • Conclusion

3
High-Level Motivation
  • to gain varying degrees of qualitative and
    quantitative understanding of the behavior of
    the system-under-test
  • objective: a quest for general invariant
    relationships between network parameters and
    protocol dynamics
4
  • Meta-Simulation capabilities to extract and
    interpret meaningful performance data from the
    results of multiple simulations
  • Individual experiment cost is high
  • Developing useful interpretations
  • Protocol performance modeling

Experiment Design Goal: identify a minimum-cardinality
set of meta-metrics that maximally models the system
5
OUTLINE
  • Motivation
  • Contributions
  • Meta-simulation
  • ROSS.Net
  • BGP4-OSPFv2 Investigation
  • Simulation
  • Kernel Processes
  • Seven O'Clock Algorithm
  • Conclusion

6
Contributions Meta-Simulation OSPF
Problem: which meta-metrics are most important in
determining OSPF convergence?
Step 1: search the complete model space;
negligible metrics identified and isolated
Step 2: re-parameterize
Step 3: re-scale
Our approach comes within 7% of Full Factorial using 2
orders of magnitude fewer experiments
7
Contributions Meta-Simulation OSPF/BGP
Ability to model the BGP and OSPF control
planes. Problem: which meta-metrics are most
important in minimizing control plane dynamics
(i.e., updates)?
  • All updates belong to one of four categories
  • OO: OSPF-caused OSPF update
  • OB: OSPF-caused BGP update
  • BO: BGP-caused OSPF update
  • BB: BGP-caused BGP update

Minimizing total BO+OB is 15-25% better than other
metrics
Meta-Simulation perspective: complete view of
all domains
OB: ~50% of total updates; BO: ~0.1% of total
updates
  • Optimized with respect to various metrics --
    equivalent to a particular management approach.
  • Importance of parameters differs for each metric.
  • For minimal total updates:
  • Local perspectives are 20-25% worse than the
    global.
  • For minimal total interactions:
  • 15-25% worse can happen with other metrics
  • OB updates are more important than BO updates
    (i.e. 50% vs. 0.1%)

8
Contributions Simulation Kernel Process
  • Parallel Discrete Event Simulation

Optimistic Simulation: allow violations of
time-stamp order to occur, but detect them and
recover
Conservative Simulation: wait until it is safe to
process the next event, so that events are processed
in time-stamp order
  • Benefits of Optimistic Simulation
  • Not dependent on the network topology simulated
  • As-fast-as-possible forward execution of events

9
Contributions Simulation Kernel Process
Problem: parallelizing a simulation requires 1.5
to 2 times more memory than sequential execution, and
the additional memory requirement affects performance
and scalability
Scalability decreases as model size increases,
due to the increased memory required to
support the model
[Figure: scalability vs. increasing model size, 4 processors used]
Solution: Kernel Processes (KPs), a new data
structure that supports parallelism and increases
scalability
10
Contributions Simulation Seven O'Clock
Problem: distributing a simulation requires
efficient global synchronization
Inefficient solution: barrier synchronization
between all nodes while performing
computation. Efficient solution: pass messages
between nodes, and synchronize in the background to
the main simulation. Seven O'Clock Algorithm:
eliminate message passing → reduce cost from O(n)
or O(log n) to O(1)
11
OUTLINE
  • Motivation
  • Contributions
  • Meta-simulation
  • ROSS.Net
  • BGP4-OSPFv2 Investigation
  • Simulation
  • Kernel Processes
  • Seven O'Clock Algorithm
  • Conclusion

12
ROSS.Net Big Picture
Goal: an integrated simulation and experiment
design environment
ROSS.Net (simulation + meta-simulation)
Modeling
Protocol Models: OSPFv2, BGP4, TCP Reno, IPv4,
etc.
Measured topology data, traffic and router stats,
etc.
Measurement Data-sets (Rocketfuel)
13
ROSS.Net Big Picture
Meta-Simulation
  • Experiment design
  • Statistical analysis
  • Optimization heuristic search
  • Recursive Random Search
  • Sparse empirical modeling

ROSS.Net
Design of Experiments Tool (DOT)
Input Parameters
Output Metric(s)
Parallel Discrete Event Network Simulation
  • Optimistic parallel simulation
  • ROSS
  • Memory efficient network protocol models

Simulation
14
ROSS.Net Meta-Simulation Components
15
Meta-Simulation OSPF/BGP Interactions
  • Router topology from Rocketfuel trace data
  • took each ISP map as a single OSPF area
  • Created BGP domain between ISP maps
  • hierarchical mapping of routers

AT&T's US Router Network Topology
  • 8 levels of routers
  • Levels 0 and 1, 155Mb/s, 4ms delay
  • Levels 2 and 3, 45Mb/s, 4ms delay
  • Levels 4 and 5, 1.5Mb/s, 10ms delay
  • Levels 6 and 7, 0.5Mb/s, 10ms delay

16
Meta-Simulation OSPF/BGP Interactions
  • OSPF
  • Intra-domain, link-state routing
  • Path costs matter
  • Border Gateway Protocol (BGP)
  • Inter-domain, distance-vector, policy routing
  • Reachability matters
  • BGP decision-making steps
  • Highest LOCAL PREF
  • Lowest AS Path Length
  • Lowest origin type (0 = IGP, 1 = EGP, 2 =
    Incomplete)
  • Lowest MED
  • Lowest IGP cost
  • Lowest router ID

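The decision steps above can be sketched as a route-comparison function. This is an illustrative sketch, not code from the presentation; the struct fields are hypothetical simplifications (real AS paths and router IDs are not plain integers).

```c
/* Hypothetical, simplified BGP route attributes for illustration. */
typedef struct {
    int local_pref;   /* higher wins */
    int as_path_len;  /* lower wins */
    int origin;       /* 0 = IGP, 1 = EGP, 2 = Incomplete; lower wins */
    int med;          /* lower wins */
    int igp_cost;     /* lower wins */
    int router_id;    /* lower wins: the final tie-break */
} bgp_route;

/* Returns 1 if route a is preferred over route b, applying the
 * tie-breaking steps in the order listed on the slide. */
int bgp_prefer(const bgp_route *a, const bgp_route *b)
{
    if (a->local_pref != b->local_pref)
        return a->local_pref > b->local_pref;
    if (a->as_path_len != b->as_path_len)
        return a->as_path_len < b->as_path_len;
    if (a->origin != b->origin)
        return a->origin < b->origin;
    if (a->med != b->med)
        return a->med < b->med;
    if (a->igp_cost != b->igp_cost)
        return a->igp_cost < b->igp_cost;
    return a->router_id < b->router_id;
}
```

Each comparison only fires when the preceding attributes tie, which is what makes the list on the slide a strict priority order.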
[Figure: example topology showing an OSPF domain with eBGP and iBGP connectivity]
17
Meta-Simulation OSPF/BGP Interactions
  • Intra-domain routing decisions can affect
    inter-domain behavior, and vice versa.
  • All updates belong to one of four categories
  • OSPF-caused OSPF (OO) update
  • OSPF-caused BGP (OB) update interaction
  • BGP-caused OSPF (BO) update interaction
  • BGP-caused BGP (BB) update

[Figure: OB update example -- a link failure or cost increase (e.g. maintenance) changes intra-domain path costs toward the destination]
18
Meta-Simulation OSPF/BGP Interactions
  • Intra-domain routing decisions can affect
    inter-domain behavior, and vice versa.
  • Identified four categories of updates
  • OO OSPF-caused OSPF update
  • BB BGP-caused BGP update
  • OB OSPF-caused BGP update interaction
  • BO BGP-caused OSPF update interaction

[Figure: BO update example -- eBGP connectivity toward the destination becomes available]
These interactions cause route changes to
thousands of IP prefixes, i.e. huge traffic
shifts!
19
Meta-Simulation OSPF/BGP Interactions
  • Three classes of protocol parameters
  • OSPF timers, BGP timers, BGP decision
  • Maximum search space size 14,348,907.
  • RRS was allowed 200 trials to optimize (minimize)
    response surface
  • OO, OB, BO, BB, OBBO, ALL updates
  • Applied multiple linear regression analysis on
    the results

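As a rough illustration of budgeted search over a response surface, the sketch below uses plain uniform random sampling with a fixed trial budget. It is not the actual Recursive Random Search algorithm (which repeatedly shrinks and re-aligns its sample region around promising points); the `sphere_response` surface is a hypothetical stand-in for a simulated metric such as total updates.

```c
#include <stdlib.h>

/* Illustrative budgeted search: sample `trials` random points in
 * [lo, hi]^dims and keep the best (minimum) response observed. */
double random_search(double (*response)(const double *x, int dims),
                     int dims, double lo, double hi,
                     int trials, double *best_x)
{
    double best = 0.0;
    double x[16]; /* assumes dims <= 16 */
    for (int t = 0; t < trials; t++) {
        for (int d = 0; d < dims; d++)
            x[d] = lo + (hi - lo) * ((double)rand() / RAND_MAX);
        double y = response(x, dims);
        if (t == 0 || y < best) {
            best = y;
            for (int d = 0; d < dims; d++)
                best_x[d] = x[d];
        }
    }
    return best;
}

/* Hypothetical sample response surface: a simple quadratic bowl. */
double sphere_response(const double *x, int dims)
{
    double s = 0.0;
    for (int d = 0; d < dims; d++)
        s += x[d] * x[d];
    return s;
}
```

With a 200-trial budget, as above, only 200 of the millions of possible parameter combinations are ever simulated.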
20
Meta-Simulation OSPF/BGP Interactions
  • Optimized with respect to the OB+BO response surface.
  • BGP timers play the major role, i.e. a 15%
    improvement in the optimal response.
  • The BGP KeepAlive timer seems to be the dominant
    parameter, in contrast to the expectation of MRAI!
  • OSPF timers affect little, i.e. at most 5%.
  • low time-scale OSPF updates do not affect BGP.

21
Meta-Simulation OSPF/BGP Interactions
Minimizing total BO+OB is 15-25% better than other
metrics
  • Varied response surfaces -- each equivalent to a
    particular management approach.
  • Importance of parameters differs for each metric.
  • For minimal total updates:
  • Local perspectives are 20-25% worse than the
    global.
  • For minimal total interactions:
  • 15-25% worse can happen with other metrics
  • OB updates are more important than BO updates
    (i.e. 50% vs. 0.1%)

OB: ~50% of total updates; BO: ~0.1% of total
updates
22
Meta-Simulation
  • Conclusions
  • The number of experiments was reduced by an order of
    magnitude in comparison to Full Factorial.
  • Experiment design and statistical analysis
    enabled rapid elimination of insignificant
    parameters.
  • Several qualitative statements and system
    characterizations could be obtained with few
    experiments.

23
OUTLINE
  • Problem Statement
  • Contributions
  • Meta-simulation
  • ROSS.Net
  • BGP4-OSPFv2 Investigation
  • Simulation
  • Kernel Processes
  • Seven O'Clock Algorithm
  • Conclusion

24
Simulation Overview
  • Parallel Discrete Event Simulation
  • A Logical Process (LP) for each relatively
    parallelizable simulation model, e.g. a router, a
    TCP host
  • Local Causality Constraint: events within each LP
    must be processed in time-stamp order
  • Observation: adherence to the LCC is sufficient to
    ensure that a parallel simulation will produce the
    same result as a sequential simulation
  • Conservative Simulation
  • Avoid violating the local causality constraint
    (wait until it's safe)
  • Null Message deadlock avoidance (Chandy/Misra/Bryant)
  • Time-stamp of next event
  • Optimistic Simulation
  • Allow violations of local causality to occur, but
    detect them and recover using a rollback
    mechanism
  • Time Warp Protocol (Jefferson, 1985)
  • Limiting amount of opt. execution

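A minimal sequential sketch of the time-stamp-ordered processing that the Local Causality Constraint requires. This is illustrative only; a real simulator's pending-event set is a far more elaborate priority queue.

```c
#include <stdlib.h>

/* Toy pending-event set: a singly linked list kept sorted by timestamp. */
typedef struct event {
    double ts;            /* virtual-time stamp */
    struct event *next;
} event;

/* Insert an event, keeping the list in nondecreasing ts order. */
event *enqueue(event *head, event *e)
{
    if (head == NULL || e->ts < head->ts) {
        e->next = head;
        return e;
    }
    event *c = head;
    while (c->next != NULL && c->next->ts <= e->ts)
        c = c->next;
    e->next = c->next;
    c->next = e;
    return head;
}

/* Consume events strictly in timestamp order -- exactly what the LCC
 * demands per LP. Returns the count; records the last (largest) ts. */
int run(event *head, double *last_ts)
{
    int n = 0;
    while (head != NULL) {
        *last_ts = head->ts;   /* nondecreasing across iterations */
        event *done = head;
        head = head->next;
        free(done);
        n++;
    }
    return n;
}
```

A conservative simulator only pops the head when it is provably safe; an optimistic one pops eagerly and rolls back on a violation.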
25
ROSS Rensselaer's Optimistic Simulation System
ROSS
  • Example Accesses
  • GTW: top-down hierarchy
  • lp_ptr = GState[LPi].Map.lplist[LPNumi]
  • ROSS: bottom-up hierarchy
  • lp_ptr = event->src_lp
  • or
  • pe_ptr = event->src_lp->pe
  • Key advantages of the bottom-up approach
  • reduces access overheads
  • improves locality and processor cache performance

[Figure: ROSS data structures, tw_event and tw_pe]
Memory usage only 1% more than sequential, and
independent of LP count.
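The bottom-up access pattern can be sketched with simplified stand-ins for the tw_* structures. The field layouts here are illustrative, not ROSS's actual definitions:

```c
/* Simplified stand-ins: an event points directly at its source LP, and
 * the LP points at its processing element, so no top-down lookup
 * through a global state table (as in GTW) is needed. */
struct tw_pe {
    int id;                /* one PE per CPU utilized */
};

struct tw_lp {
    struct tw_pe *pe;      /* owning processing element */
    int id;
};

struct tw_event {
    struct tw_lp *src_lp;  /* bottom-up link: event -> LP */
    double ts;
};

/* pe_ptr = event->src_lp->pe, as on the slide: two pointer hops,
 * no table indexing. */
struct tw_pe *event_pe(struct tw_event *e)
{
    return e->src_lp->pe;
}
```

Because the event carries its own links, the hot path touches only memory the event already brought into cache, which is the locality advantage claimed above.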
26
On-the-Fly Fossil Collection
OTFFC works by only allocating events from the
free list whose timestamps are less than GVT. As events
are processed they are immediately placed at the end
of the free list.
Key Observation: rollbacks cause the free list to
become UNSORTED in virtual time. Result: event
buffers that could be allocated are not; the user
must over-allocate the free list.
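The behavior described above can be sketched as follows (a toy model, not ROSS's actual free list). Note how a rollback-induced out-of-order entry at the head blocks reuse of fossil buffers behind it:

```c
#include <stddef.h>

/* Toy on-the-fly fossil collection: freed event buffers are appended at
 * the tail; allocation takes from the head only if that buffer's
 * timestamp has already fallen below GVT. */
typedef struct buf {
    double ts;
    struct buf *next;
} buf;

typedef struct {
    buf *head, *tail;
} freelist;

/* A processed event goes straight to the end of the free list. */
void fl_put(freelist *f, buf *b)
{
    b->next = NULL;
    if (f->tail != NULL)
        f->tail->next = b;
    else
        f->head = b;
    f->tail = b;
}

/* Reuse the head buffer only when it is fossil (ts < GVT). If rollbacks
 * left the list unsorted, fossil buffers deeper in the list are missed:
 * the reason the user must over-allocate. */
buf *fl_get(freelist *f, double gvt)
{
    buf *b = f->head;
    if (b == NULL || b->ts >= gvt)
        return NULL;
    f->head = b->next;
    if (f->head == NULL)
        f->tail = NULL;
    return b;
}
```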
27
Contributions Simulation Kernel Process
[Figure: fossil collection and rollback of events across LPs within a PE (one Processing Element per CPU utilized)]
28
ROSS Kernel Processes
  • Advantages
  • significantly lowers fossil collection overheads
  • lowers memory usage by aggregation of LP
    statistics into KP statistics
  • retains the ability to process events on an LP-by-LP
    basis in the forward computation.
  • Disadvantages
  • potential for false rollbacks
  • care must be taken when deciding on how to map
    LPs to KPs

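The mapping trade-off can be sketched as follows (the sizes and the block mapping are illustrative, not ROSS defaults): aggregating LPs under a KP means fossil collection touches one processed-event list per KP instead of one per LP, but rolling back a KP drags along every LP mapped to it.

```c
/* Hypothetical sizes for illustration. */
#define NLP 8   /* logical processes */
#define NKP 2   /* kernel processes */

/* Contiguous block mapping of LPs onto KPs: neighboring LPs share a KP.
 * This is the kind of mapping care the slide calls for -- scattering
 * tightly coupled LPs across KPs would invite extra rollbacks. */
int lp_to_kp(int lp_id)
{
    return lp_id / (NLP / NKP);
}

/* When a KP rolls back, every LP it owns rolls back with it, even if
 * only one LP was actually affected: a "false rollback". */
int lps_rolled_back(int kp_id)
{
    int n = 0;
    for (int lp = 0; lp < NLP; lp++)
        if (lp_to_kp(lp) == kp_id)
            n++;
    return n;
}
```

Fossil collection now walks NKP lists rather than NLP lists, which is where the overhead reduction comes from.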
29
ROSS KP Efficiency
Small trade-off: longer rollbacks vs. faster fossil
collection, visible when there is not enough work in the system
30
ROSS KP Performance Impact
KPs do not negatively impact performance
31
ROSS Performance vs GTW
ROSS outperforms GTW 2-to-1 at best in parallel
ROSS outperforms GTW 2-to-1 in sequential
32
Simulation Seven O'Clock GVT
  • Optimistic approach
  • Relies on global virtual time (GVT) algorithm to
    perform fossil collection at regular intervals
  • Events with timestamp less than GVT
  • Will not be rolled back
  • Can be freed
  • GVT calculation
  • Synchronous algorithms: LPs stop event processing
    during the GVT calculation
  • Cost of synchronization may be higher than the
    positive work done per interval
  • Processes waste time waiting
  • Asynchronous algorithms: LPs continue processing
    events while the GVT calculation continues in the
    background
  • Goal: create a consistent cut among LPs that
    divides the events into past and future at a point
    in wall-clock time

Two problems (i) Transient Message Problem, (ii)
Simultaneous Reporting Problem
33
Simulation Mattern's GVT
  • Construct cut via message-passing

Cost: O(log n) if a tree, O(n) if a ring
  • With a large number of processors, the free pool is
    exhausted waiting for GVT to complete

34
Simulation Fujimoto's GVT
  • Construct cut using shared memory flag

Cost: O(1)
A sequentially consistent memory model ensures
proper causal order
  • Limited to shared memory architecture

35
Simulation Memory Model
  • Sequentially consistent does not mean
    instantaneous
  • Memory events are only guaranteed to be causally
    ordered

Is there a method to achieve sequentially
consistent shared memory in a loosely
coordinated, distributed environment?
36
Simulation Seven O'Clock GVT
  • Key observations
  • An operation can occur atomically within a
    network of processors if all processors observe
    that the event occurred at the same time.
  • CPU clock time scale (ns) is significantly
    smaller than network time-scale (ms).
  • Network Atomic Operations (NAOs)
  • an agreed-upon frequency in wall-clock time at
    which some event is logically observed to have
    happened across a distributed system.
  • a subset of the possible operations provided by a
    complete sequentially consistent memory model.

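The NAO idea can be sketched as a pure function of wall-clock time: because every processor's clock error is far smaller than the NAO period, each one computes the same next cut instant locally, without exchanging any messages. Names and units below are illustrative.

```c
/* Wall-clock time in microseconds (illustrative unit). */
typedef long long usec_t;

/* Next agreed cut instant at or after `now`, for a fixed NAO period.
 * Every processor evaluates this locally -- O(1), zero messages. */
usec_t next_nao(usec_t now, usec_t period)
{
    return ((now + period - 1) / period) * period;
}
```

Two processors whose clocks disagree by much less than one period still round up to the same boundary, which is what lets them all observe the cut "at the same time".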
[Figure: NAO timeline -- at each agreed instant in wall-clock time, every processor updates its tables and computes GVT]
37
[Figure: consistent cut for GVT across processors A-E]
38
Simulation Seven O'Clock GVT
  • Δtmax is not necessary when a message-passing
    system with acks is available.
  • Transient Message Problem
  • Since Δtmax is known, senders account for
    messages sent in the time interval [NAO - Δtmax,
    NAO].
  • Since no message can take longer than Δtmax to
    transfer over the network, there cannot be any
    transient message.
  • Simultaneous Reporting Problem
  • Prevented since all processors see the cut at the
    exact same instant in wall-clock time.
  • If there is a clock synch error, any
    message sent in the error time period will be
    accounted for, since the clock synch error is far
    less than Δtmax.

39
Simulation Seven O'Clock GVT
  • Itanium-2 Cluster
  • r-PHOLD
  • 1,000,000 LPs
  • 10% remote events
  • 16 start events
  • 4 machines
  • 1-4 CPUs
  • 1.3 GHz
  • Round-robin LP to PE mapping

Linear Performance
40
Simulation Seven O'Clock GVT
  • Netfinity Cluster
  • r-PHOLD
  • 1,000,000 LPs
  • 10%, 25% remote events
  • 16 start events
  • 4 machines
  • 2 CPUs, 36 nodes
  • 800 MHz

41
Simulation Seven O'Clock GVT TCP
  • Itanium-2 Cluster
  • 1,000,000 LPs
  • each modeling a TCP host (i.e. one end of a TCP
    connection).
  • 2 or 4 machines
  • 1-4 CPUs on each
  • 1.3 GHz
  • Poorly mapped LP/KP/PE

Linear Performance
42
Simulation Seven O'Clock GVT TCP
  • Netfinity Cluster
  • 1,000,000 LPs
  • each modeling a TCP host (i.e. one end of a TCP
    connection).
  • 4-36 machines
  • 1-2 CPUs on each
  • Pentium III
  • 800MHz

43
Simulation Seven O'Clock GVT TCP
  • Sith Itanium-2 cluster
  • 1,000,000 LPs
  • each modeling a TCP host (i.e. one end of a TCP
    connection).
  • 4-36 machines
  • 1-2 CPUs on each
  • 900MHz

44
Simulation Seven O'Clock GVT
  • Summary
  • Seven O'Clock Algorithm
  • Clock-based algorithm for distributed processors
  • creates a sequentially consistent view of
    distributed memory
  • Zero-Cost Consistent Cut
  • Highly scalable and independent of event memory
    limits

45
Summary Contributions
  • Meta-simulation
  • ROSS.Net platform for large-scale network
    simulation, experiment design and analysis
  • OSPFv2 protocol performance analysis
  • BGP4/OSPFv2 protocol interactions
  • Simulation
  • kernel processes
  • memory efficient, large-scale simulation
  • Seven O'Clock GVT Algorithm
  • zero-cost consistent cut
  • high performance distributed execution

46
Summary Future Work
  • Meta-simulation
  • ROSS.Net platform for large-scale network
  • incorporate more realistic measurement data,
    protocol models
  • CAIDA, Multi-cast, UDP, other TCP variants
  • more complex experiment designs → better
    qualitative analysis
  • Simulation
  • Seven O'Clock GVT Algorithm
  • compute FFT and analyze power of different
    models
  • attempt to eliminate GVT algorithm by
    determining max rollback length