Title: Scalability and Accuracy in a Large-Scale Network Emulator
1. Scalability and Accuracy in a Large-Scale Network Emulator
Amin Vahdat, Ken Yocum, Kevin Walsh, Priya Mahadevan, Dejan Kostic, Jeff Chase, and David Becker (Duke University)
Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI 2002)
CPSC 538A - Paper Presentation
Roman Holenstein, February 23, 2005
2. Introduction
- Evaluate Internet-scale distributed systems
  - E.g. peer-to-peer, overlay, wide-area replication
- Realistic scenarios: the real world
  - Difficult to deploy and administer
  - Results not reproducible, or not necessarily representative of future behaviour
- Simulations, e.g. NS
  - More control
  - May miss important system interactions
- Emulation
  - Run unmodified code on target platforms
  - More control: can subject system traffic to constraints (bandwidth, latency, loss rate, topology, ...)
  - Thus far limited to small and static systems
- → ModelNet
3. Goal of ModelNet
- Environment should support
  - Unmodified applications
  - Reproducible results
  - Experimentation under a broad range of network topologies and dynamically changing network characteristics
  - Large-scale experiments with a large number of nodes and high traffic
4. ModelNet Architecture
- Scalable Internet-emulation environment
- Based on dummynet, extended to improve accuracy and include multi-hop and multi-core emulation
- Edge nodes run a user-specified OS and applications
  - Each instance is a virtual edge node (VN) with a unique IP in the emulated topology
  - Traffic is routed through core routers
- Core nodes emulate the behaviour of the configured target network
  - Captures effects of congestion and cross-traffic
  - Uses emulated links, or pipes
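The pipes above are dummynet-style emulated links. A minimal sketch of one in Python, assuming a simple model; the class, field names, and queue/loss handling are hypothetical, not ModelNet's actual code:

```python
import random

class Pipe:
    """Sketch of an emulated link. A packet entering the pipe is delayed
    by its transmission time (size / bandwidth) plus the link's
    propagation latency, and may be dropped by the configured loss rate
    or a full queue."""

    def __init__(self, bandwidth_bps, latency_s, loss_rate=0.0, queue_limit=100):
        self.bandwidth_bps = bandwidth_bps
        self.latency_s = latency_s
        self.loss_rate = loss_rate
        self.queue_limit = queue_limit
        self.queue = []     # exit deadlines of queued packets
                            # (a full emulator would drain this as packets leave)
        self.free_at = 0.0  # time the link finishes its current packet

    def enqueue(self, now, packet_size_bytes):
        """Return the packet's exit deadline, or None if it is dropped."""
        if len(self.queue) >= self.queue_limit or random.random() < self.loss_rate:
            return None  # drop: queue overflow or random loss
        start = max(now, self.free_at)          # wait for the link to free up
        tx_time = packet_size_bytes * 8 / self.bandwidth_bps
        self.free_at = start + tx_time
        deadline = self.free_at + self.latency_s
        self.queue.append(deadline)
        return deadline
```

Enqueueing a 1500-byte packet on an idle 2 Mb/s, 10 ms pipe gives 0.006 s of transmission time plus 0.010 s of latency, i.e. a deadline of 0.016 s.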
5. ModelNet Phases
- CREATE
  - Generate network topology → GML graph (*)
  - Can use Internet traces, BGP dumps, synthetic topology generators
  - User can annotate the graph to specify packet loss rates, failure distribution, etc.
(*) GML: Graph Modeling Language
6. ModelNet Phases
- DISTILL
  - Transform the GML graph into a pipe topology that models the target network
  - Simplify the network
    - Trade accuracy for reduced emulation cost
7. ModelNet Phases
- ASSIGN
  - Map the distilled topology to core nodes, balancing load
  - Ideal assignment is an NP-complete problem
    - Mapping pipes to cores depends on routing, link properties, and traffic load
  - Use a simple greedy k-clusters assignment
    - Randomly pick one node in the topology for each core node; cores then greedily select from connected nodes in round-robin order
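The greedy k-clusters step can be sketched as follows. The function name, tie-breaking, and the handling of disconnected leftovers are assumptions; the slide only specifies random seed nodes plus round-robin greedy growth:

```python
import random
from collections import deque

def k_clusters(adjacency, k, seed=None):
    """Sketch: assign every topology node to one of k cores.
    adjacency: dict mapping node -> list of neighbouring nodes.
    Pick one random seed node per core, then let cores take turns
    (round-robin) claiming an unassigned neighbour of their cluster.
    Returns a dict mapping node -> core index."""
    rng = random.Random(seed)
    nodes = list(adjacency)
    seeds = rng.sample(nodes, k)
    owner = {n: i for i, n in enumerate(seeds)}
    frontiers = [deque([s]) for s in seeds]

    claimed = k
    while claimed < len(nodes):
        progress = False
        for core in range(k):              # round-robin over cores
            frontier = frontiers[core]
            while frontier:
                node = frontier[0]
                nxt = next((m for m in adjacency[node] if m not in owner), None)
                if nxt is not None:        # claim one neighbour this turn
                    owner[nxt] = core
                    frontier.append(nxt)
                    claimed += 1
                    progress = True
                    break
                frontier.popleft()         # node has no free neighbours left
        if not progress:                   # disconnected remainder: reseed
            free = next(n for n in nodes if n not in owner)
            core = claimed % k
            owner[free] = core
            frontiers[core].append(free)
            claimed += 1
    return owner
```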
8. ModelNet Phases
- BIND
  - Assign VNs to edge nodes
    - Can have multiple VNs per physical edge node
  - Bind each physical node to a single core
  - Install the sets of pipes in the distilled topology, and routing tables with shortest paths between VN pairs
  - Configure edge nodes with IP addresses for each VN
9. ModelNet Phases
- RUN
  - Execute target applications on edge nodes
10. The Core
- Principal tasks (in steady state)
  - Receive packets from the network interface
  - Move packets
    - Pipe to pipe
    - Pipe to final destination
- Moving packets has strictly higher priority than receiving packets
  - Preferentially emulate packets already in the core
  - → core CPU saturation results in packets dropped at the physical level rather than inside the emulation
11. The Core
- Traffic routing
  - Emulate links as pipes
  - Pre-computed shortest paths for all VN pairs → requires O(n²) space
  - A route is an ordered list of pipes
  - Move packets through pipes by reference (packet descriptor)
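Precomputing a route, i.e. an ordered list of pipes, for every VN pair might look like the sketch below (BFS on an unweighted topology; names are hypothetical). Storing one route per ordered pair is what drives the O(n²) space cost:

```python
from collections import deque

def build_route_table(adjacency):
    """Sketch: for every ordered pair of nodes, precompute the route as
    an ordered list of pipes, where a pipe is represented as the
    directed edge (u, v). adjacency: dict node -> list of neighbours."""
    routes = {}
    for src in adjacency:
        # BFS from src, remembering each node's predecessor
        prev = {src: None}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adjacency[u]:
                if v not in prev:
                    prev[v] = u
                    q.append(v)
        # walk predecessors back to recover the pipe list per destination
        for dst in prev:
            if dst == src:
                continue
            pipes = []
            node = dst
            while prev[node] is not None:
                pipes.append((prev[node], node))
                node = prev[node]
            routes[(src, dst)] = pipes[::-1]
    return routes
```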
12. The Core
- Packet scheduling
  - Heap of pipes sorted by earliest deadline (exit time of the first packet in the pipe's queue)
  - Scheduler executes once per clock tick (10 kHz) and runs at the kernel's highest priority
  - Finds pipes whose deadlines have passed (deadline no later than the current time)
  - Moves their packets to the next destination (tail of the next pipe, or the VN)
  - Calculates new deadlines and reinserts the pipes into the heap
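One scheduler tick over the pipe heap can be sketched like this; the data layout is an assumption (each heap entry carries a deadline and a pipe whose queue holds (deadline, packet) pairs in FIFO order), not ModelNet's kernel structures:

```python
import heapq
from itertools import count

_tiebreak = count()  # avoids comparing pipe objects when deadlines tie

def push_pipe(pipe_heap, deadline, pipe):
    heapq.heappush(pipe_heap, (deadline, next(_tiebreak), pipe))

def run_scheduler_tick(pipe_heap, now, deliver):
    """Service every pipe whose earliest deadline has passed.
    deliver(packet) stands in for "enqueue on the next pipe or hand
    the packet to the destination VN"."""
    while pipe_heap and pipe_heap[0][0] <= now:
        _, _, pipe = heapq.heappop(pipe_heap)
        # drain every packet in this pipe whose exit time has arrived
        while pipe.queue and pipe.queue[0][0] <= now:
            _, packet = pipe.queue.pop(0)
            deliver(packet)
        # reinsert with the deadline of the new head packet, if any
        if pipe.queue:
            push_pipe(pipe_heap, pipe.queue[0][0], pipe)
```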
13. The Core
- Multi-core configuration
  - The next pipe may be on a different core node
  - Transfer the packet descriptor to the next node
  - Packet contents are buffered at the entry core node and forwarded to the destination upon delivery of the packet
14. Scalability Issues
- Bandwidth limitation
  - Traffic through the ModelNet core is limited by the cluster's physical internal bandwidth
- Memory requirement
  - ModelNet must buffer up to the full bandwidth-delay product of the target network
- Routing protocol
  - Assumes a perfect routing protocol: shortest path between all pairs of hosts
  - Instantaneous discovery of the new shortest path upon node or link failure
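The buffering bound follows from summing each emulated link's bandwidth-delay product. A quick illustrative calculation; the helper and the example link figures are assumptions, not numbers from the paper:

```python
def buffer_bound_bytes(links):
    """Sketch: upper bound on core buffering as the sum of each
    emulated link's bandwidth-delay product.
    links: iterable of (bandwidth_bps, delay_s) pairs."""
    return sum(bw / 8 * delay for bw, delay in links)

# e.g. 400 hypothetical access links of 2 Mb/s with 10 ms delay each:
# 400 * (2e6 / 8 * 0.01) = 1,000,000 bytes, i.e. about 1 MB of buffering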
15. Setup for Experiments
- Core routers
  - 1.4 GHz Pentium IIIs with 1 GB memory
  - FreeBSD-4.5-STABLE
  - Connected via a 1 Gb/s switch
- Edge nodes
  - 1 GHz Pentium IIIs with 256 MB memory
  - Linux 2.4.17
  - Connected via 100 Mb/s Ethernet
16. Baseline Accuracy
- Accurately emulate target packet characteristics on a hop-by-hop basis
- Use kernel logging to track performance and accuracy
- Run the ModelNet scheduler at the highest kernel priority
- Results
  - Each hop accurately emulated to the granularity of the hardware timer (100 µs)
  - Maintains accuracy up to 100% CPU utilization
- Future improvement
  - In subsequent hops, use packet debt handling to correct for emulation errors
17. Capacity
- Quantify capacity as a function of load and of hops
- Single core
  - 1 Gb/s link
  - 1-5 edge nodes, each with up to 24 netperf senders (24 VNs) and 24 receivers
  - 1 Gb/s Ethernet connection
- For 1 hop
  - At 120 flows, the CPU is 50% utilized
  - The network link is the bottleneck
- For > 4 hops
  - The CPU is the bottleneck
18. Additional Cores
- Deliver higher throughput
- Increasing probability of a packet's path crossing node boundaries → cross-core traffic
  - Introduces communication overhead
- Ability to scale depends on
  - Application communication characteristics
  - Partitioning of the topology (to minimize cross-core traffic)
19. VN Multiplexing
- Mapping of VNs to physical edge nodes
- Enables larger-scale emulations
- Affects emulation accuracy and scalability
  - Context switch overhead
  - Scheduling behaviour
  - Resource contention at edge nodes
20. Tradeoff: Accuracy vs. Scalability
- Impractical to model every packet and link for a large portion of the Internet
- Create a controlled, Internet-like execution context for applications
- Reduce overhead by making approximations that minimally impact application behaviour
- Ideally, automate the tradeoff to satisfy resource conditions and report the degree of inaccuracy to the user
21. Distillation
- Hop-by-hop emulation
  - Distilled topology isomorphic to the target network
  - Accurate, but highest per-packet cost
- End-to-end emulation
  - Collapse each path to a single pipe → full mesh
  - Lowest overhead
  - Can capture raw network latency, bandwidth, and loss rate
  - Cannot emulate link contention among competing flows
22. Distillation
- Walk-in
  - Preserve the first walk-in links of each path, replace the interior by a full mesh
  - Breadth-first traversal finds successive frontier sets (the first frontier set is the set of all VNs)
  - Each packet traverses at most (2 × walk-in) + 1 pipes
  - Cannot model contention in the interior
- Walk-out
  - Models an under-provisioned core
  - Extends the walk-in algorithm to preserve the inner core
    - Find the topological centre by generating successive frontiers until one of size one or zero is found
  - Collapse paths between walk-in and walk-out
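The successive frontier sets used by walk-in distillation can be sketched as a breadth-first sweep starting from the set of all VNs; the helper below and its name are hypothetical:

```python
def frontier_sets(adjacency, vns):
    """Sketch: compute successive frontier sets. The first frontier is
    the set of all VNs; each following frontier holds the not-yet-seen
    neighbours of the previous one. Returns the list of frontiers."""
    frontiers = [set(vns)]
    visited = set(vns)
    while frontiers[-1]:
        nxt = set()
        for node in frontiers[-1]:
            for nb in adjacency[node]:
                if nb not in visited:
                    visited.add(nb)
                    nxt.add(nb)
        if not nxt:
            break          # ran out of unvisited nodes
        frontiers.append(nxt)
    return frontiers
```

On a star of two VNs behind two routers sharing a centre node, this yields the frontiers {VNs}, {routers}, {centre}, matching the walk-out rule of stopping when a frontier of size one or zero appears.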
23. Distillation
- Ring topology
  - 20 routers, interconnected at 20 Mb/s
  - 20 VNs connected to each router by 2 Mb/s links
  - VNs partitioned into generator and receiver sets
  - Each generator sends to a random receiver
- Hop-by-hop: 419 pipes
- End-to-end: 79,800 pipes
- Last-mile: only 400 edge links and 190 interior links
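The end-to-end and last-mile mesh sizes are simple unordered-pair tallies over the 400 VNs and the 20 routers respectively; a quick check:

```python
from math import comb

vns = 20 * 20                  # 20 VNs on each of 20 routers
end_to_end = comb(vns, 2)      # full mesh over all VN pairs
interior = comb(20, 2)         # last-mile: mesh over the 20 routers
print(end_to_end, interior)    # 79800 190
```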
24. Changing Network Characteristics
- Evaluation of adaptive Internet systems
- User can
  - directly incorporate generators for competing traffic
    - accurate emulation of background cross traffic
    - consumes resources at edge nodes and bandwidth at the core
  - modify pipe parameters during emulation to inject cross traffic dynamically
    - low overhead; scales independently of traffic rate
    - does not capture all details of Internet packet dynamics (e.g. slow start, bursty traffic)
    - not responsive to congestion
      - → emulation error grows with link utilization level
- Fault injection
25. Case Studies
- Network of Gnutella clients
  - 10,000 nodes (100 VNs on each of 100 edge nodes)
- Support for emulation of ad hoc wireless environments
  - Implemented, but not presented in this paper
- CFS (1)
  - Able to reproduce results from the CFS implementation running on the RON (2) testbed (published by another group)
- Replicated web services
  - Replay of a trace to IBM's main website
  - Able to show that one additional replica improves latency; a third replica is only marginally beneficial
  - Ability to emulate contention on interior links was crucial for obtaining these results
- Adaptive overlays
  - ACDC, an overlay that adapts to changing network conditions
  - Similar experimental results obtained with ModelNet and ns-2
(1) CFS: Cooperative File System
(2) RON: Resilient Overlay Network (MIT)
26. Related Work
- Many other efforts on emulation
  - Mostly focus on specific, static, and small-scale systems
- Netbed (Emulab)
  - Similar to ModelNet, except that ModelNet focuses on scalable emulation of large-scale networks
  - ModelNet efforts will be integrated into Netbed
- Competing research by the WASP (1) project
  - Emulates network characteristics at the end host
  - Requires emulation software on all edge nodes
  - Cannot capture congestion of multiple flows on a single pipe
(1) WASP: Wide Area Server Performance (J.-Y. Pan, H. Bhanoo, E. Nahum, M. Rosu, C. Faloutsos, and S. Seshan)
27. Summary
- ModelNet is designed to support
  - Unmodified applications
  - Reproducible results
  - A broad range of network topologies and dynamically changing characteristics
  - Large-scale experiments
- Provides a means of balancing accuracy and cost
- Case studies presented to show the generality of the approach