1
Latency Limits to Communication (drawn from
Thekkath and Levy)
2
OS designers' dilemma
  • Processors are fast
  • Memories are fast
  • Networks are fast
  • Why is the OS slow?
  • What are the bottlenecks?
  • Can we blame the hardware?
  • Should we blame ourselves?
  • Should we blame Canada?!

3
Focus on communication latency
  • High-speed networks
  • At the time of Thekkath and Levy (1993)
  • 10 Mb/s Ethernet
  • 100 Mb/s FDDI
  • 140 Mb/s ATM
  • Today (circa 2003 onwards)
  • Gigabit Ethernet is the norm
  • Communication paradigm in distributed systems
  • RPC
  • What is the disparity between RPC performance and
    network latency?

4
Takeaway
  • Latency is more important than bandwidth
  • High-bandwidth networks do not imply low-latency
    communication
  • OS design has improved to shave off overheads and
    get close to hardware speeds
  • Enhancements in hardware controllers are needed to
    get close to network speeds

5
Overview of networks
  • Ethernet
  • Logically a bus; the protocol (CSMA/CD) assumes a
    physical bus as well
  • Fiber Distributed Data Interface (FDDI)
  • Logically a bus, physically a ring (either fiber
    optic or copper); a timed token-ring protocol
    guarantees access to each station on the wire
  • Asynchronous Transfer Mode (ATM)
  • A mesh of switches forms a packet-switched network

6
Overview of controllers
  • Ethernet
  • Two possibilities
  • Controller has its own buffers and DMAs into/out
    of host memory (SparcStation)
  • Controller's buffers mapped into host memory for
    direct placement of send/receive packets; no DMA
    facility in the controller (DECstation)
  • FDDI
  • Send: no DMA (controller buffer memory-mapped into
    host memory)
  • Receive: DMA from the network directly into host
    memory
  • ATM
  • One FIFO each for transmission and reception, mapped
    into host memory; no DMA; the host reads/writes the
    FIFOs to send/receive packets (see the sketch after
    this list)
  • Scatter/gather of a message into ATM cells
    (packets) done by host software
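A minimal C sketch of the FIFO-style host interface described above. The register layout and the names nic_regs, tx_fifo, and tx_status are invented for illustration, not a real device's interface:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical memory-mapped controller registers (illustrative only). */
struct nic_regs {
    volatile uint32_t tx_fifo;    /* write one 32-bit word per store  */
    volatile uint32_t tx_status;  /* bit 0 set while the FIFO is full */
};

/* Programmed I/O send: the host CPU copies the packet into the
 * controller's transmit FIFO word by word. There is no DMA engine,
 * so splitting the message into ATM cells is also done by host
 * software before this call. */
static void pio_send(struct nic_regs *nic, const uint32_t *pkt,
                     size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        while (nic->tx_status & 1u)   /* spin until there is room   */
            ;
        nic->tx_fifo = pkt[i];        /* uncached store to the FIFO */
    }
}
```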

7
Latency for basic message transfer
  • Component costs (a toy model follows this list)
  • Time on the wire?
  • Controller latency?
  • Control/data transfer?
  • Vectoring the interrupt?
  • Interrupt service?
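As a back-of-the-envelope illustration of how these components add up, a toy C model; every number is a placeholder to be replaced with the per-network values from the hardware-latency table, not a measurement from the paper:

```c
#include <stdio.h>

int main(void)
{
    double bits          = 100 * 8;  /* a 100-byte message                */
    double wire_mbps     = 10.0;     /* e.g. 10 Mb/s Ethernet             */
    double wire_us       = bits / wire_mbps; /* time on the wire (usec)   */
    double controller_us = 100.0;    /* controller latency (assumed)      */
    double xfer_us       = 50.0;     /* control/data transfer (assumed)   */
    double vector_us     = 10.0;     /* vectoring the interrupt (assumed) */
    double isr_us        = 30.0;     /* interrupt service (assumed)       */

    printf("one-way hardware latency: %.1f usec\n",
           wire_us + controller_us + xfer_us + vector_us + isr_us);
    return 0;
}
```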

8
Hardware-level latency (usec)
9
How to design a low-overhead RPC?
  • Fundamental sources of overhead
  • Marshaling and data copying
  • Control transfer
  • Protocol processing

10
Marshaling and data copying
  • Complete the packet before handing it to the
    controller
  • Network and protocol headers (OS) plus the
    user-level message
  • Choices for assembly depend on the controller
  • Scatter/gather DMA
  • Header in the kernel
  • Packet from user space (but this can be expensive,
    since it may have to be mapped into the kernel)
  • Typically three copies are involved with a DMA-based
    controller without scatter/gather (see the sketch
    after this list)
  • First, to prepare the message for transmission by
    the user
  • Second, to copy the user buffer into a kernel buffer
  • Third, to move the message into the controller
    buffer
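A sketch of those three copies in C; the buffer names, sizes, and the header-building step are all illustrative:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define HDR_LEN 64                       /* network + protocol headers */
static uint8_t controller_tx_buf[1600];  /* stand-in for NIC memory    */

static void build_headers(uint8_t *buf) { memset(buf, 0, HDR_LEN); }

/* Assumes len <= 1500; each memcpy below is one of the three copies. */
void send_three_copies(const void *args, size_t len)
{
    uint8_t user_msg[1500];          /* copy 1: user marshals the call */
    memcpy(user_msg, args, len);     /* arguments into one message     */

    uint8_t kernel_buf[1600];        /* copy 2: user buffer into a     */
    build_headers(kernel_buf);       /* kernel buffer, behind headers  */
    memcpy(kernel_buf + HDR_LEN, user_msg, len);

    /* copy 3: completed packet into the controller's buffer (done by
     * the kernel, or by the controller's DMA engine) */
    memcpy(controller_tx_buf, kernel_buf, HDR_LEN + len);
}
```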

11
  • How to reduce the number of copies?
  • Marshaling in the kernel
  • A synthesized procedure installed in the kernel for
    each client call
  • Directly marshal into the kernel buffer
  • Fixed kernel entry point with shared descriptors
    between user and kernel (see the sketch after this
    list)
  • The kernel builds the transmission buffer using the
    descriptors
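A sketch of the shared-descriptor idea, assuming a hypothetical descriptor format; a real kernel would also validate each user address before touching it:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* One (address, length) pair shared between user and kernel; the user
 * fills in a vector of these and traps to a fixed kernel entry point. */
struct gather_desc {
    const void *addr;   /* start of one argument in user space */
    size_t      len;    /* its length in bytes                 */
};

/* The kernel gathers the pieces directly into the transmission buffer,
 * skipping the separate user-level marshaling copy. Returns the number
 * of bytes ready to transmit. */
size_t kernel_gather(uint8_t *tx_buf, size_t hdr_len,
                     const struct gather_desc *desc, int ndesc)
{
    size_t off = hdr_len;            /* headers already built by kernel */
    for (int i = 0; i < ndesc; i++) {
        memcpy(tx_buf + off, desc[i].addr, desc[i].len);
        off += desc[i].len;
    }
    return off;
}
```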

12
Control Transfer
  • How many context switches for an RPC call/return?
  • Are all of these on the critical path of round-trip
    latency?
  • The client switch can be overlapped with message
    transmission
  • How can we cheat?
  • Do we need to switch out the client?

13
Other sources of control transfer overhead
  • Protocol layering
  • Layer-by-layer queuing and thread switching is a bad
    idea
  • Dispatch from the lowest layer (in the interrupt
    handler) to the destination process, as sketched
    below
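Schematically, in C, with every function and type a placeholder:

```c
struct packet;                                 /* opaque for the sketch */
struct process;

struct process *demux(struct packet *pkt);            /* placeholder */
void deliver(struct process *p, struct packet *pkt);  /* placeholder */
void wakeup(struct process *p);                       /* placeholder */

/* Dispatch from the lowest layer: the interrupt handler demultiplexes
 * once and hands the packet straight to the destination process,
 * instead of queuing it through each protocol layer with a thread
 * switch per layer. */
void rx_interrupt(struct packet *pkt)
{
    struct process *dst = demux(pkt);
    deliver(dst, pkt);
    wakeup(dst);
}
```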

14
Protocol processing
  • What should be the transport protocol underlying
    RPC?
  • TCP/IP?
  • UDP/IP?
  • Something else?
  • What can go wrong in transmission?
  • How do we decide which issues are important in the
    context of a LAN?

15
  • Example choices to reduce latency (see the sketch
    after this list)
  • No need for a reliable transport, since call/return
    can serve as high-level ACKs
  • Rely on the hardware checksum (if available) rather
    than computing checksums in software for bit errors
  • Client side: no need for extra buffering against
    packet loss, since the client is blocked
  • Server side: buffering can be overlapped with
    transmission of the reply
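A minimal sketch of the call/return-as-ACK idea: the blocked client still owns its request buffer, so a lost packet is handled by resending from that buffer rather than from a separate retransmission queue. send_packet and recv_reply are placeholders for the transport below the RPC layer:

```c
#include <stdbool.h>
#include <stddef.h>

void send_packet(const void *msg, size_t len);            /* placeholder */
bool recv_reply(void *reply, size_t cap, int timeout_ms); /* placeholder */

/* The reply itself acknowledges the call, so no transport-level ACKs
 * and no extra client-side copy for retransmission are needed. */
bool rpc_call(const void *msg, size_t len, void *reply, size_t cap)
{
    for (int attempt = 0; attempt < 5; attempt++) {
        send_packet(msg, len);                /* resend from msg itself   */
        if (recv_reply(reply, cap, 100 /* ms */))
            return true;                      /* reply doubles as the ACK */
    }
    return false;                             /* give up after 5 tries */
}
```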

16
Components of RPC latency
  • Client call
  • Controller latency
  • Time on wire for call
  • Interrupt handling on server
  • Server call receipt
  • Server reply
  • Client reply receipt

17
RPC performance
18
Takeaway from RPC measurements
  • Possible to shave software overheads to the point
    where hardware overhead matters!
  • Differences in software overheads across the
    different network gear
  • Code-path variance
  • Details of setting up DMA
  • Details of the high-level dispatch to the server
    process
  • Keys to improving RPC performance
  • Use preallocated buffers
  • Overlap kernel meta-operations (scheduling,
    buffering) with network transmission
  • Optimize for the common case using controller
    features
  • Speed of the host processor

19
Implications for controller design
  • Data transfer
  • Transfer mode
  • Scatter/gather DMA, DMA, PIO
  • Implications for the number of copies?
  • Is PIO always better?

20
  • Interaction with the memory hierarchy
  • Cache/TLB flush if the processor does not have
    support for memory coherence
  • Can be avoided by allocating buffers in uncached
    memory (not always possible)
  • The raw times in the table are not the whole story
    of the performance loss; why?

21
  • Host/controller memory interface
  • Packet buffers in host memory with shared
    descriptors enabling DMA
  • A simple FIFO each for transmit and receive using
    PIO
  • Which may be better? (a toy break-even model follows
    this list)
  • Key is the interrupt servicing cost
  • Small data size?
  • Large data size?
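A toy break-even model for this choice; all costs are invented placeholders (DMA pays a fixed descriptor-setup plus interrupt-service cost, PIO pays per-byte CPU time):

```c
#include <stdio.h>

int main(void)
{
    double dma_setup_us = 20.0; /* descriptor setup, cache flush (assumed) */
    double dma_intr_us  = 30.0; /* interrupt servicing (assumed)           */
    double pio_us_per_b = 0.05; /* one uncached store per word (assumed)   */

    for (int size = 64; size <= 4096; size *= 2) {
        double dma = dma_setup_us + dma_intr_us;  /* roughly size-independent */
        double pio = size * pio_us_per_b;         /* grows with message size  */
        printf("%5d bytes: PIO %6.1f us, DMA %5.1f us -> %s\n",
               size, pio, dma, pio < dma ? "PIO wins" : "DMA wins");
    }
    return 0;
}
```

With these placeholder costs, PIO wins below roughly 1 KB and DMA above it, matching the slide's small-vs-large intuition.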

22
Suggestions for the controller
  • A controller design that supports both
  • Descriptor-based host interface
  • FIFO-based host interface
  • Protection granularity is an issue here
  • Address mapping irrespective of DMA/PIO
  • Make the granularity of buffer allocation in
    controller memory independent of the VM page size
    (see Figure 1)

24
Implications for networks
  • High bandwidth does not translate to low latency
  • The network protocol is important
  • Lightly loaded FDDI has high latency vs. Ethernet
  • Fragmentation/reassembly of ATM worsens latency
  • Today: mostly Gigabit Ethernet; FDDI and ATM have
    been largely phased out except for legacy
    deployments