1
A TCP Tuning Daemon
  • SC2002
  • November 19, 2002

Tom Dunigan (thd@ornl.gov), Matt Mathis (mathis@psc.edu), Brian Tierney (bltierney@lbl.gov)
2
Roadmap
  • Motivation
  • Net100 project
  • Web100
  • network probes and sensors
  • protocol analysis
  • A TCP tuning daemon
  • Tuning experiments

www.net100.org
  • and now a word from our sponsors
  • DOE-funded project (Office of Science)
  • $1M/yr, 3 yrs beginning 9/01
  • LBL, ORNL, PSC, NCAR
  • Net100 project objectives (network-aware
    operating systems)
  • measure, understand, and improve end-to-end
    network/application performance
  • tune network protocols and applications (grid
    and bulk transfer)
  • first-year emphasis: TCP bulk transfer over high
    delay/bandwidth nets

3
Motivation
  • Poor network application performance
  • High bandwidth paths, but apps slow
  • Is it application? OS? network? Yes
  • Often need a network wizard
  • Changing bandwidths
  • 9.6 Kb/s → 1.5 Mb/s → 45 → 100 → 1000 Mb/s → ? Gb/s
  • Unchanging TCP
  • speed of light (RTT)
  • MTU (still 1500 bytes)
  • TCP congestion avoidance
  • TCP is lossy by design!
  • 2x overshoot at startup, sawtooth
  • recovery after a loss can be very slow on today's
    high delay/bandwidth links
  • recovery rate proportional to MSS/RTT²

[Plot: ORNL to NERSC ftp, GigE/OC12, 80 ms RTT. Instantaneous vs. average bandwidth; early startup losses, then linear recovery at only 0.5 Mb/s toward 8 Mb/s.]
4
TCP tuning
  • set optimal (?) buffer size
  • need buffer = bandwidth × RTT
    ORNL/NERSC (80 ms, OC12) needs ~6 MB (see the worked example below)
  • avoid losses
  • modified slow-start
  • reduce bursts
  • anticipate loss (ECN, Vegas?)
  • reorder threshold
  • speed recovery
  • bigger MTU or virtual MSS
  • modified AIMD (standard is 0.5, 1)
  • delayed ACKs and initial window
  • avoid congestion collapse
  • be fair (?): intranets, QoS
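As a quick sanity check of the buffer = bandwidth × RTT rule for the ORNL/NERSC path above (taking OC12 as roughly 622 Mb/s):

    622 Mb/s × 0.08 s ≈ 50 Mb ≈ 6 MB

which matches the ~6 MB figure quoted above.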


[ns simulation: 500 Mb/s link, 80 ms RTT. Packet loss early in slow-start; standard TCP with delayed ACK takes 10 minutes to recover!]
5
Net100 components for tuning
  • TCP protocol analysis
  • simulation/emulation
  • kernel tuning extensions
  • Web100 Linux kernel (NSF)
    www.web100.org
  • instrumented TCP stack (IETF MIB draft)
  • 100 variables per flow (/proc/web100)
  • socket open/close event notification
  • API and tools for tracing and tuning, e.g., bw
    tester http://firebird.ccs.ornl.gov:7123 (see the sketch after this list)
  • Path characterization
  • Network Tuning and Analysis Framework (NTAF)
  • both active and passive measurement
  • iperf, pipechar
  • schedule probes and distribute/archive results
  • database of measurements
  • NTAF/Net100 hosts at PSC, NCAR, LBL, ORNL,
    NERSC, CERN, UT, SLAC
  • TCP tuning daemon
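A minimal sketch of how the per-flow instrumentation can be found, assuming (as on Web100 kernels) that each instrumented connection appears as a numerically named directory under /proc/web100; reading the ~100 per-flow variables themselves would go through the Web100 library/API and is only noted in a comment:

    /* List Web100-instrumented connections by scanning /proc/web100.
     * Illustrative only: assumes each connection appears as a directory
     * whose name is the numeric connection id (cid). */
    #include <ctype.h>
    #include <dirent.h>
    #include <stdio.h>

    int main(void)
    {
        DIR *d = opendir("/proc/web100");
        if (!d) { perror("opendir /proc/web100"); return 1; }

        struct dirent *e;
        while ((e = readdir(d)) != NULL) {
            if (!isdigit((unsigned char)e->d_name[0]))
                continue;          /* skip ".", "..", non-connection entries */
            printf("instrumented flow, cid = %s\n", e->d_name);
            /* the ~100 per-flow variables (CurCwnd, PktsRetrans, ...)
             * would be read here via the Web100 library/API */
        }
        closedir(d);
        return 0;
    }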

6
TCP Tuning Daemon
WAD config file (flow entry "bob"):

    bob
        src_addr   0.0.0.0
        src_port   0
        dst_addr   10.5.128.74
        dst_port   0
        mode       1
        sndbuf     2000000
        rcvbuf     100000
        wadai      6
        wadmd      0.3
        maxssth    100
        divide     1
        reorder    9
        sendstall  0
        delack     0
        floyd      1
  • Work-around Daemon (WAD)
  • tune unknowing sender/receiver at startup and/or
    during flow
  • Web100 kernel extensions
  • pre-set windowscale to allow dynamic tuning
  • uses netlink to alert daemon of socket open/close
    (or poll)
  • besides existing Web100 buffer tuning, new tuning
    options using WAD_ variables
  • knobs to disable Linux 2.4 caching, burst mgt.,
    and sendstall
  • config file with static tuning data (see the daemon sketch below)
  • mode specifies dynamic tuning (Floyd AIMD, NTAF
    buffer size, concurrent streams)
  • daemon periodically polls NTAF for fresh tuning
    data
  • written in C (also a Python version)
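A hypothetical skeleton of the control loop described above, not the actual WAD source; the helpers are stubs standing in for the real netlink socket-open notification and the Web100 variable read/write interfaces:

    /* Hypothetical skeleton of a WAD-style tuning loop (not the actual
     * WAD source).  The helpers are stubs standing in for the real
     * Web100 netlink notification and variable read/write interfaces. */
    #include <stdio.h>
    #include <unistd.h>

    struct wad_entry {                /* one config-file entry, e.g. "bob" */
        const char *dst_addr;
        int  mode;                    /* dynamic-tuning mode from the config */
        long sndbuf, rcvbuf;
        int  wadai;   double wadmd;   /* AIMD tuning knobs */
        int  maxssth, reorder, floyd;
    };

    /* Stub: would block on the netlink socket (or poll /proc/web100) and
     * return the connection id of a newly opened flow, or -1 if none. */
    static int next_flow(void) { return -1; }

    /* Stub: would write sndbuf/rcvbuf and the WAD_* variables for cid. */
    static void apply_entry(int cid, const struct wad_entry *e)
    { printf("tuning cid %d for %s\n", cid, e->dst_addr); }

    /* Stub: would fetch fresh buffer-size estimates from the NTAF. */
    static void refresh_from_ntaf(struct wad_entry *e) { (void)e; }

    int main(void)
    {
        struct wad_entry bob = { "10.5.128.74", 1, 2000000, 100000,
                                 6, 0.3, 100, 9, 1 };
        for (;;) {
            int cid;
            while ((cid = next_flow()) >= 0)  /* socket-open events */
                apply_entry(cid, &bob);
            if (bob.mode)
                refresh_from_ntaf(&bob);      /* periodic NTAF poll */
            sleep(1);
        }
    }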

7
Experimental results
  • Evaluating the tuning daemon in the wild
  • emphasis: bulk transfers over high
    delay/bandwidth nets (Internet2, ESnet)
  • tests over 10GigE, OC48, OC12, OC3, ATM/VBR,
    GigE, FDDI, 100/10T, cable, ISDN, wireless
    (802.11b), dialup
  • tests over NistNET 100T testbed
  • Various TCP tuning options
  • buffer tuning
  • AIMD mods (including Floyd, both in-kernel and in
    WAD)
  • slow-start mods
  • parallel vs single
  • Results are anecdotal
  • more systematic testing is on-going
  • Your mileage may vary.

Network professionals on a closed course.
Do not attempt this at home.
8
WAD tuning results
  • Classic buffer tuning
  • ORNL to PSC, OC12, 80 ms RTT
  • network-challenged app. gets 10 Mb/s
  • same app., WAD/NTAF-tuned buffer gets 143 Mb/s
  • Virtual MSS
  • tune TCP's additive increase (WAD_AI)
  • add k segments per RTT during recovery (see the sketch below)
  • k = 6 is like a GigE jumbo frame, but
  • interrupt rate is not reduced
  • doesn't do k segments for the initial window
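A sketch of what the WAD_AI knob changes (illustrative, not the Net100 kernel code): standard congestion avoidance opens cwnd by roughly MSS²/cwnd bytes per ACK, about one segment per RTT; scaling that increment by k opens k segments per RTT, mimicking a k-times-larger MSS without changing the interrupt rate.

    /* Illustrative congestion-avoidance increment with a "virtual MSS".
     * wad_ai = 1 is standard TCP; wad_ai = 6 grows cwnd as if the path
     * used 6x larger segments, without changing the interrupt rate. */
    #include <stdio.h>

    static double ca_bytes_per_ack(double cwnd_bytes, double mss, int wad_ai)
    {
        return wad_ai * mss * mss / cwnd_bytes;   /* ~wad_ai segments per RTT */
    }

    int main(void)
    {
        double cwnd = 1.0e6, mss = 1500.0;        /* example values */
        printf("standard: %.1f bytes/ACK, WAD_AI=6: %.1f bytes/ACK\n",
               ca_bytes_per_ack(cwnd, mss, 1),
               ca_bytes_per_ack(cwnd, mss, 6));
        return 0;
    }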

9
Tuning around Linux (2.4) TCP
Amsterdam-Chicago GigE via 10GigE, 100 ms RTT
  • Tunable ssthresh caching
  • Tunable sendstall (TXQUEUELEN)

Floyd AIMD: as cwnd grows, increase AI and decrease MD; do the reverse when cwnd shrinks. Added to the Net100 kernel and to the WAD (WAD tunable; sketched below).
[Plot: ~600 Mb/s; Floyd AIMD vs. standard AIMD traces, with sendstalls and a UDP event marked.]
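A minimal sketch of the idea, with made-up breakpoints (not the actual schedule from Floyd's HighSpeed TCP proposal that the Net100 kernel implements): the additive increase grows and the multiplicative decrease shrinks as cwnd gets larger.

    /* Illustrative Floyd-style AIMD schedule: larger cwnd -> larger
     * additive increase (segments/RTT) and gentler multiplicative
     * decrease.  Breakpoints are invented for illustration only. */
    #include <stdio.h>

    struct aimd { double cwnd_segs, ai, md; };

    static const struct aimd schedule[] = {
        {     0,  1.0, 0.50 },   /* standard TCP region */
        {   100,  2.0, 0.45 },
        {  1000,  8.0, 0.30 },
        { 10000, 32.0, 0.15 },
    };

    static struct aimd lookup(double cwnd_segs)
    {
        struct aimd a = schedule[0];
        for (unsigned i = 0; i < sizeof schedule / sizeof schedule[0]; i++)
            if (cwnd_segs >= schedule[i].cwnd_segs)
                a = schedule[i];
        return a;
    }

    int main(void)
    {
        double cwnds[] = { 50, 500, 5000, 50000 };
        for (int i = 0; i < 4; i++) {
            struct aimd a = lookup(cwnds[i]);
            printf("cwnd %.0f segs -> AI %.1f, MD %.2f\n", cwnds[i], a.ai, a.md);
        }
        return 0;
    }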
10
WAD tuning
  • Modified slow-start and AI
  • ORNL to NERSC, OC12, 80 ms RTT
  • often losses in slow-start
  • WAD-tuned Floyd slow-start and fixed AI (6)
  • WAD-tuned AIMD and slow-start
  • ORNL to CERN, OC12, 150 ms RTT
  • k parallel streams act like AIMD (1/(2k), k) (see the arithmetic below)
  • WAD-tuned single stream (0.125, 4)
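Where the (0.125, 4) tuning comes from: k parallel standard-TCP streams collectively add about k segments per RTT, and a single loss halves only one of the k congestion windows, shrinking the aggregate by roughly 1/(2k). A single stream tuned to AI = k and MD = 1/(2k) therefore mimics k streams; with k = 4 that gives AI = 4 and MD = 1/8 = 0.125, the values used above.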

11
GridFTP tuning
Can a tuned single stream compete with parallel streams?
Mostly not with equivalence tuning, but sometimes: parallel streams have a slow-start advantage. The WAD can divide the buffer among concurrent flows; fairer/faster? Tests are inconclusive so far, and testing on the real Internet is problematic.
Is there a congestion metric? Per unit of time?
    Flow       Mb/s   congestion   re-xmits
    untuned      28        4           30
    tuned        74        5          295
    parallel     52       30          401
    untuned      25        7           25
    tuned        67        2          420
    parallel     88       17          440

Buffers: 64K I/O, 4 MB TCP (untuned: 64K TCP, 8 Mb/s, 200 s)
Data/plots from Web100 tracer
12
Future TCP tuning
  • Reorder threshold
  • seeing more out of order packets
  • WAD tunes a bigger reorder threshold for the path
  • 40x improvement!
  • Linux 2.4 does a good job already
  • adjusts and caches reorder threshold
  • undo congestion avoidance

LBL to ORNL (using our TCP-over-UDP)
dup3 case had 289 retransmits, but all were
unneeded!
  • Delayed ACKs
  • WAD could turn off delayed ACKs -- 2x
    improvement in recovery rate and slow-start
  • Linux 2.4 already turns off delayed ACKs for
    initial slow-start

[ns simulation: 500 Mb/s link, 80 ms RTT. Packet loss early in slow-start; standard TCP with delayed ACK takes 10 minutes to recover! Note: aggressive static AIMD (Floyd pre-tune).]
13
Futures
  • Net100
  • analyze effectiveness/fairness of current tuning
    options
  • simulation
  • emulation
  • on the net (systematic tests)
  • NTAF probes -- characterizing a path to tune a
    flow
  • router data (passive)
  • monitoring applications with Web100
  • additional tuning algorithms
  • Vegas, ECN
  • non-TCP
  • identify non-congestive loss?
  • parallel/multipath selection/tuning
  • WAD-to-WAD tuning
  • jumbo frame experiments: the quest for bigger
    and bigger MTUs
  • more user-friendly
  • Web100 extensions
  • refine user interface and API
  • port to other OSs

14
Summary
www.net100.org
  • Novel approaches
  • non-invasive dynamic tuning of legacy
    applications
  • using TCP to tune TCP (Web100)
  • tuning on a per-flow/per-destination basis
  • Effective evaluation framework
  • protocol analysis and tuning; net/app/OS
    debugging
  • out-of-kernel tuning
  • Beneficial interactions
  • TCP protocols (Floyd, Wu Feng (DRS), Web100,
    parallel/non-TCP)
  • Path characterization research (SciDAC, CAIDA,
    Pinger)
  • Scientific application and Data grids (SciDAC,
    CERN)
  • Performance improvements