Lessons Learned from Real Life - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Lessons Learned from Real Life
  • Jeremy Elson
  • National Institutes of Health
  • November 11, 1998

USC!
2
quick bio
  • Hi, I'm Jeremy. Nice to meet you.
  • 1996: BS, Johns Hopkins, Comp Sci
  • Sep 96 - Sep 98: Worked at NIH full-time
  • Led software development effort on a small team
    developing an ATM-based telemedicine system
    called the Radiology Consultation WorkStation
    (RCWS)
  • Sep 98: Decided to return to school full-time
  • Nov 98: Gave a talk to dgroup about interesting
    lessons learned during development of the RCWS

3
my talk
  • Very quick description of the RCWS
  • In future dgroups, I can give a talk about the
    RCWS, or about ATM, if there is interest
  • Some pitfalls and fallacies in networking I
    discovered while developing the RCWS
  • Techniques for network problem solving

4
Radiology Consultation Workstation Network
5
RCWS Block Diagram
6
an unintended test
  • Initial configuration: 2 Sparc 20s w/50MHz CPUs,
    Solaris 2.5.1, Efficient Networks ATM NICs
    @ 155 Mbps, LattisCell 10114-SM
  • TTCP memory-to-memory: 60 Mbps
  • Upgrade to 75MHz chips, otherwise identical
  • TTCP now reports 90 Mbps!
  • 50% upgrade in CPU speed led to exactly 50%
    increase in network throughput
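
A rough way to read these numbers, offered as a sanity check and not as a performance model: if the transfer is CPU-bound, throughput per MHz of CPU clock should stay about constant across the upgrade. The figures are the ones on the slide; the function name is only illustrative.

```python
def mbps_per_mhz(throughput_mbps: float, cpu_mhz: float) -> float:
    """Network throughput obtained per MHz of CPU clock."""
    return throughput_mbps / cpu_mhz


before = mbps_per_mhz(60, 50)   # original 50 MHz CPUs, 60 Mbps
after = mbps_per_mhz(90, 75)    # upgraded 75 MHz CPUs, 90 Mbps

cpu_gain = 75 / 50 - 1          # fractional CPU speed-up
net_gain = 90 / 60 - 1          # fractional throughput gain

print(f"{before:.2f} vs {after:.2f} Mbps per MHz")
print(f"CPU +{cpu_gain:.0%}, throughput +{net_gain:.0%}")
```

Both ratios come out the same, which is what you would expect when the CPU, not the link, is the limit.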

7
pitfall: infinite CPU
  • In many systems, the network is the bottleneck;
    we have infinite CPU in comparison. We try to
    use CPU to save network bandwidth:
  • Compression
  • Multicast
  • Caching (sort of)
  • Micronet design
  • Pitfall: assuming this is always true. In our
    ATM app, compression might slow it down!

8
a surprising outcome
  • There are various ways of doing IP over ATM
  • Classical IP (CLIP): MTU 9K
  • LANE: MTU 1500 bytes (for Ethernet bridging)
  • Which would you expect to have better bulk TCP
    performance, and by how much?
  • Classical IP did better -- by a factor of 5! I
    didn't believe it at first.
  • Turned out that both were sending roughly the
    same packets/sec; CLIP just carried more
    bytes/packet

9
pitfall: networks run out of bandwidth first
  • The number of bytes per second is only one
    metric; consider packets per second also. This
    is sometimes the wall you hit first.
  • Fixed packet processing cost appears to far
    outweigh the incremental cost to transmit more
    bytes as part of the same packet
  • This fits nicely with the previous observation:
    CPU is only fast enough for n packets/sec
  • This is old news to Cisco, backbone ISPs, etc.
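
A minimal sketch of the packets-per-second argument: if the hosts can only process a fixed number of packets per second, throughput scales with packet size. The packet rate below is a made-up placeholder, and reading the slide's "9K" as the 9180-byte default Classical IP MTU is an assumption; headers and ATM cell overhead are ignored.

```python
def throughput_mbps(packets_per_sec: float, bytes_per_packet: int) -> float:
    """Throughput in Mbps for a given packet rate and packet size."""
    return packets_per_sec * bytes_per_packet * 8 / 1e6


PPS = 1500        # hypothetical CPU-limited packet rate, same for both
LANE_MTU = 1500   # bytes, from the slide
CLIP_MTU = 9180   # bytes, assumed meaning of "9K"

lane = throughput_mbps(PPS, LANE_MTU)
clip = throughput_mbps(PPS, CLIP_MTU)
ratio = clip / lane

print(f"LANE {lane:.1f} Mbps, CLIP {clip:.1f} Mbps, ratio {ratio:.2f}")
```

The ratio depends only on the two MTUs, not on the placeholder packet rate, and lands near the factor of 5 that was actually observed.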

10
pathological networks
  • We built an on-campus ATM network and bought
    access to a MAN (ATDnet), but the only WAN
    available was the ACTS satellite
  • Our network was very long and very fat: OC3 (155
    Mb/sec) over satellite (500ms RTT).
  • We were expecting standard LFN-related problems;
    the solutions are fairly well-known (window
    scaling, PAWS, SACK, etc.)
  • What surprised me was something else: interactive
    performance!

11
To perform actions such as screen updates,
requests must go through a server. Therefore the
user response time will be at least one RTT.
[Diagram: Request and Reply relayed between Earth
and a geostationary satellite]
1/8 of a second from Earth to a geostationary
satellite, so RTT is about 1/2 second (plus ground
switching delay and queuing delay)
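
The 1/8 second and 1/2 second figures can be checked with a few lines of arithmetic. This is a sketch under simplifying assumptions: the ground stations sit directly beneath the satellite (real slant paths are longer), and switching and queuing delays are ignored. The constant and function names are illustrative.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in vacuum
GEO_ALTITUDE_KM = 35_786            # geostationary altitude above the equator


def one_hop_delay_s(distance_km: float = GEO_ALTITUDE_KM) -> float:
    """Propagation delay for one ground-to-satellite (or reverse) hop."""
    return distance_km / SPEED_OF_LIGHT_KM_S


hop = one_hop_delay_s()
# A request goes up and down, and the reply goes up and down: four hops.
rtt = 4 * hop

print(f"one hop: {hop:.3f} s, round trip: {rtt:.3f} s")
```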
12
the best laid plans
  • Requests are small messages (<100 bytes)
    transmitted using TCP over ATM
  • Everything seemed to work fine on-campus
  • Over the satellite, we were expecting to see
    delays of 1/2 sec in command execution
  • Instead we saw >1 second delays: much more than
    we were expecting, and hard to use. Uh oh.
  • My job (with 2 hours of satellite time ticking
    away): figure out why this was happening

13
the answer: tcpdump
  • tcpdump is a packet-sniffer written by Steve
    McCanne, Craig Leres, and Van Jacobson at LBL
  • Monitors a LAN in real time; prints info about
    each packet (source/dest, sequence numbers,
    flags, acknowledgements, options)
  • Runs on most UNIX variants
  • The most spectacularly fantastically wonderful
    network debugging tool on planet Earth; my
    knee-jerk reaction whenever there is any problem
    is to fire this up first

14
tick, tock, tick, tock...
At the application layer, messages are 70 bytes
long.
[Diagram: segments exchanged between Client and
Server]
15
the nagle finagle
  • Each application-layer message is split into 2
    segments. Why?
  • Because the app was calling write() twice
  • For some reason, the second half isn't sent until
    the first half is ACKed! Why?
  • The Nagle Algorithm, which says: don't send a
    tinygram if there is an outstanding tinygram.
  • Users had to wait 3 RTTs instead of 1
  • Short term fix: turn off the Nagle Algorithm
    (setsockopt TCP_NODELAY in Solaris)
  • Long term fix: rewrite the message-passing
    library to use writev() instead of write().
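
Both fixes can be sketched in a few lines. This is an illustration in Python, not the RCWS message-passing library (which is not shown in the talk): `disable_nagle` and `send_message` are hypothetical helper names, and `sendmsg()` stands in for `writev()` as the gather write, so it assumes a Unix-like system.

```python
import socket


def disable_nagle(sock: socket.socket) -> None:
    """Short-term fix: set TCP_NODELAY so small segments go out at once."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def send_message(sock: socket.socket, header: bytes, body: bytes) -> None:
    """Long-term fix: pass both parts to the kernel in one gather write.

    Two separate write() calls can become two tinygrams, and Nagle holds
    the second until the first is ACKed. One call avoids creating the
    second tinygram in the first place.
    """
    total = len(header) + len(body)
    sent = sock.sendmsg([header, body])   # single system call, like writev()
    if sent < total:                      # rare partial write: send the rest
        sock.sendall((header + body)[sent:])
```

For a 70-byte message, a caller might use `send_message(sock, header, body)` with, say, a 10-byte header and a 60-byte body; that split is invented here for illustration. Over a 1/2 second satellite RTT, getting from 3 RTTs back to 1 is the difference between roughly 1.5 seconds and 0.5 seconds per command.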

16
pitfall: not caring how TCP and the app get along
  • It's easy to think of TCP as a generic way of
    getting things from Here to There; sometimes, if
    we look deeper, we find problems
  • Good example: the study of HTTP interactions
    with TCP by Touch, Heidemann, and Obraczka
  • Of course, different TCP implementations react
    differently. (Maybe some TCPs wait before
    launching and would have hidden this.)

17
the big mystery
  • Remember: 90 Mbps, Sparc 20 to Sparc 20
  • Scenario: two machines doing FTP (to /dev/null)
  • Machine A: Sun Ultra-1 running Solaris 2.5.1, 155
    Mbps fiber ATM NIC
  • Machine B: fast Pentium-II running Windows NT
    4.0, 25 Mbps UTP ATM NIC
  • Using LANE, 1500 byte MTU
  • Transmitting from A to B: 23 Mbps
  • Transmitting from B to A: 8 Mbps!! Why?

18
tcpdump to the rescue
[Diagram: packet trace between A and B. Labels:
"more segments (not shown)" and "long quiet time -
no activity"]
19
observations about our mystery
  • Sending A to B (the 23 Mbps case), the sending
    machine generated only MSS-sized segments; B to A
    did not. (Could account for some slowdown.)
  • The ACKs from A all came at very regular
    intervals (50ms)
  • Data came quickly (say, all in about 20ms)
    followed by long quiet time (say, 30ms)
  • What's going on?

20
deferred ACKs
  • When we receive data, we wait a certain interval
    before sending an ACK
  • This attempts to reduce traffic generated by
    interactive (keystroke) activity by hoping a new
    window and/or data will be ready, too
  • We don't want to do this with bulk data (defined
    as 3 MSSs in a row)

21
keystrokes: the worst case
Assume both sides are initially advertising Win
100
[Diagram: User and Server exchange plotted against
Time]
22
keystrokes: what we want
Assume both sides are initially advertising Win
100
[Diagram: User and Server exchange plotted against
Time]
23
another look at the trace
[Diagram: packet trace between A and B. Labels:
"more segments (not shown)" and "long quiet time -
no activity"]
24
the mystery unmasked
  • Only observable because all of the following were
    true (take out any one and the problem vanishes):
  • Receiver using deferred ACKs
  • Sender not sending all MSS-sized segments
  • Bandwidth high enough and window small enough so
    that the window can be filled before the deferred
    ACK interval expires (rare at 10 Mbps)
  • When I turned off the deferred ACKs on the
    receiver, bandwidth jumped to 23 Mbps. (Under
    Solaris this can be done with ndd)
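
The trace timings line up with this explanation. A back-of-the-envelope sketch, using the approximate figures quoted earlier (ACKs every 50 ms, data arriving in about 20 ms of each interval, 8 Mbps overall); the variable names are illustrative, and treating each burst as roughly one window of data is an assumption.

```python
ACK_INTERVAL_S = 0.050   # ACKs from A arrived about every 50 ms
BURST_S = 0.020          # data arrived in about 20 ms of each interval
OBSERVED_MBPS = 8.0      # measured B-to-A throughput

# Data delivered per ACK interval at the observed average rate.
bytes_per_cycle = OBSERVED_MBPS * 1e6 * ACK_INTERVAL_S / 8

# Rate while data was actually flowing, i.e. without the quiet time.
burst_rate_mbps = bytes_per_cycle * 8 / BURST_S / 1e6

# Fraction of each interval the sender spent waiting for the ACK.
idle_fraction = 1 - BURST_S / ACK_INTERVAL_S

print(f"{bytes_per_cycle:.0f} bytes per cycle")
print(f"{burst_rate_mbps:.0f} Mbps during bursts, idle {idle_fraction:.0%}")
```

The burst rate works out to about 20 Mbps, in the same range as the 23 Mbps seen once deferred ACKs were turned off, which suggests the link itself was fine and the sender was simply stalled for most of each interval.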

25
tcpdump: our best friend
  • Virtually impossible to figure out problems like
    the previous one by just puzzling it out
  • Reading about how protocols work is a good
    starting point; implementing them gives you even
    more. But...
  • Nothing gave me more intimate knowledge of TCP
    than seeing it come alive. Not looking at high
    level behavior, but actually watching packets fly
    across the wire
  • Different stacks have different personalities
  • TCP/IP Illustrated, Vol. 1 is great for learning
    how
26
other uses of tcpdump
  • Keeping my ISDN router from dialing
  • Widespread teardrop attack on NIH (I patched
    tcpdump to make this easier)
  • Netscape SYN bug
  • Samba hitting DNS
  • Inoculan directed broadcasts
  • Diagnosing dead and/or segmented networks
  • Even rough performance measurement
  • The network people thought I was a magician!

27
summary: lessons learned
  • I. Thou shalt not assume that thy CPU is
    infinite in power, for thy network may indeed
    be more plentiful.
  • II. Thou shalt take mind of the number
    of packets thou sendeth to thy network
    for, yea, a multitude thereof may wreak
    havoc thereupon.

28
summary: lessons learned (continued)
  • III. Thou shalt read the Word of Stevens in
    TCP/IP Illustrated, and become learned in the
    ways of tcpdump, so that thy days of network
    debugging shall be pleasant and brief.
  • IV. Thou shalt watch carefully the packets
    that thy applications create, so that TCP may be
    thy servant and not thy taskmaster.

29
that's all, folks!