Understanding Venus Performance Tentative Update 20031104 - PowerPoint PPT Presentation

1
Understanding Venus Performance (Tentative Update 2003-11-04)
  • Shih-Hao Hung
  • Performance and Availability Eng.
  • Sun Microsystems Inc.
  • Sun Confidential/Proprietary: Internal Use Only

2
Overview
  • Venus is a PCI card that provides the following
    functionality for Solaris/SPARC platforms:
    • High-performance Gigabit Ethernet interface
    • High-speed cryptographic engine
    • SSL acceleration
    • IPsec acceleration
  • Our goals are to make sure that:
    • Software (apps, OS, and drivers) utilizes Venus as
      efficiently as possible.
    • Venus performs well under mixed workloads.

3
Venus Gigabit Ethernet
  • Venus uses the Cassini chip that is also used by
    other Sun Gigabit Ethernet cards such as BGCC,
    Kuheen, etc.
  • One major difference between Venus and other
    Cassini-based cards: Venus can interrupt only
    ONE host processor, due to a limitation of the
    Intel bridge chip on the Venus card.

4
Venus Gigabit Ethernet TCP Performance Measurement
  • The peak throughput of Venus is on par with the
    other Cassini-based cards (without MDT).

5
New Gigabit Ethernet Features Proposed for Venus 1.1
  • Hardware checksumming: should help reduce CPU
    consumption, but has a bug at this moment
    (2003-06-05).
  • Jumbo Frames (JF): data show that jumbo frames
    improve IPsec acceleration by 3X with
    SysKonnect cards. Support for jumbo frames may be
    put in Venus 1.1.
  • Multi-Data Transfer (MDT): MDT is already in
    Solaris 9 Update 4 for Cassini, saving CPU cycles
    and improving efficiency (i.e. the Mbps/MHz ratio) by
    up to 60%. Data show a 5-10% performance gain on
    SPECweb99 and Netperf with an MDT-enabled Cassini
    card and driver.
  • JF + MDT may not be supported in 1.1.

6
Venus Crypto Engine
  • Venus has two Broadcom 5821 crypto chips. It is
    possible for Venus hardware to offload the
    following crypto operations:
    • Public-key ops: RSA (512-bit, 1024-bit, 2048-bit)
      and DSA
    • Bulk encryption ops: RC4, DES, 3DES
    • Hash ops: SHA1 and MD5
  • RC4 support is disabled in the Venus 1.0
    driver.
  • Software crypto is available for fail-over and
    small tasks.

7
Our Performance Data
  • We have conducted performance testing on the
    following platforms:
    • 2-way 900 MHz Sun Fire 280R
    • 8-way 900 MHz Sun Fire V880
    • 12-way 1200 MHz Sun Fire 6800
  • The per-CPU numbers presented are based on the
    900 MHz UltraSPARC III Cu processor.

8
The Venus Crypto Engine
9
Venus Crypto Hardware Performance Measurement
10
Venus Crypto Software Performance Measurement
11
The Venus SSL Performance
12
Venus SSL Support
(Tentative, check when spec. is final)
13
Venus SSL Performance
HW bulk encryption support is disabled by
default for S1WS for Venus 1.0.
14
Venus SSL Additional Performance Issues
  • Enabling HW bulk encryption support causes extra
    overhead for key management operations:
    • SSL handshake performance is reduced by 33%
      (BugID 4814633).
    • Short-term fix: disable HW bulk encryption by
      default and offer a mechanism for users to enable
      the support.
    • RFE 4753295: should find a way to reduce the key
      management overhead.
    • Update (2003-06-06): the gap has been shrunk to
      14% with the latest Venus 1.1 software.
  • Enabling HW bulk encryption support may limit the
    SSL throughput:
    • Mostly affects large systems.
    • Customers may choose to disable the support, or
      buy additional cards.

15
The Venus IPsec Performance
16
Solaris IPsec Performance Issues with 3DES
  • The stock Solaris 9 (Update 3) IPsec-3DES is slow
    and does not scale:
    • 3DES code is not optimized.
    • 3DES jobs are done synchronously.
    • Packets are processed sequentially.
    • 28 Mbps on a 2-way 900 MHz E280R; only one CPU is
      utilized.

17
Venus IPsec Design Considerations
  • Accelerates DES/3DES encryption/decryption via:
    • Asynchronous processing by the KCL2 job scheduler
    • Performance-optimized software crypto
    • Hardware offloading engine
  • Must process jobs at Ethernet packet size, 1460
    bytes, which is much smaller than the SSL chunk
    size.
    • A big constraint for hardware offloading, and a big
      issue for IPsec acceleration compared to SSL
      acceleration.
  • Impacted by hardware offloading overhead:
    • Packets < 512 bytes are not offloaded; the overhead
      is too costly.
    • Lightweight algorithms such as MD5 and SHA1 benefit
      less from hardware offloading.
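
The packet-size constraint above can be illustrated with a simple cost model. This is only a sketch: the fixed per-job offload overhead and the per-byte costs are hypothetical parameters, not measurements from these slides, and the function names are illustrative.

```python
def break_even_bytes(offload_overhead_us, sw_us_per_byte, hw_us_per_byte):
    """Job size (bytes) above which hardware offload beats software.

    Assumed model (not from the slides):
      software cost = sw_us_per_byte * n
      hardware cost = offload_overhead_us + hw_us_per_byte * n
    Returns None when hardware never wins, e.g. for a lightweight
    algorithm whose software per-byte cost is already very low.
    """
    if sw_us_per_byte <= hw_us_per_byte:
        return None
    return offload_overhead_us / (sw_us_per_byte - hw_us_per_byte)


def should_offload(n_bytes, offload_overhead_us, sw_us_per_byte, hw_us_per_byte):
    """True when the modeled hardware cost is lower for an n_bytes job."""
    threshold = break_even_bytes(offload_overhead_us, sw_us_per_byte, hw_us_per_byte)
    return threshold is not None and n_bytes > threshold
```

Under this model, a larger fixed overhead pushes the break-even size up, which is why small packets stay in software and why larger jobs (SSL chunks, jumbo frames) offload more profitably.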

18
Venus IPsec Implementations
  • Venus accelerates IPsec in one of the following
    two forms:
    • Out-of-band
      • Packets are sent to Venus crypto for encryption,
        and then sent to any NIC for transmission.
      • Packets are received from any NIC, and then sent
        to Venus crypto for decryption.
    • In-band (pending Solaris 9 Update 5)
      • Packets are sent to Venus crypto for encryption
        and transmitted via the Venus NIC in one trip.
      • Packets are received by the Venus NIC and decrypted
        by Venus crypto before entering the host.
  • The in-band implementation will really reflect
    the strength of Venus, but it requires
    significant changes to the network stack.

19
Venus Out-of-Band IPsec
  • Venus out-of-band IPsec requires minor changes to
    an existing system:
    • New modules replace the encrdes/encr3des modules.
    • For packets < 512 bytes, swcrypto handles 3DES.
    • For packets > 512 bytes, KCL handles 3DES.
    • KCL sends jobs to vca for hardware offload when a
      Venus card is available.
    • KCL sends jobs to its software crypto when
      hardware offloading is not available.

[Diagram: ipsecesp passes packets to the Venus encr3des module. Packets < 512 bytes go to swcrypto; packets > 512 bytes go to KCL. With no hardware, KCL uses software 3DES; when hardware is OK, KCL sends the job to vca for hardware 3DES.]
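
The routing described on this slide can be sketched as a small decision function. This is an illustration, not the actual module code: the function name and return labels are hypothetical, and the handling of a packet of exactly 512 bytes is an assumption, since the slide only specifies "< 512" and "> 512".

```python
SMALL_PACKET_CUTOFF = 512  # bytes, as stated on the slide


def route_3des_job(pkt_len, hardware_available, cutoff=SMALL_PACKET_CUTOFF):
    """Return which component would perform 3DES for one packet."""
    if pkt_len < cutoff:
        # Small packets: offload overhead is too costly.
        return "swcrypto"
    # Assumption: a packet of exactly `cutoff` bytes is treated like a
    # larger packet and handed to KCL.
    if hardware_available:
        return "vca"       # KCL -> vca -> hardware 3DES
    return "kcl_sw"        # KCL falls back to its software 3DES
```

For example, `route_3des_job(1460, True)` selects the hardware path, while the same packet with no Venus card present falls back to KCL software crypto.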
20
Venus IPsec Performance Benefits
  • Accelerates IPsec-3DES throughput:
    • To 105 Mbps on a 1-way 900 MHz E280R.
    • 375% speedup compared to stock S9u3 (3.75X the
      28 Mbps baseline).
  • Improves throughput scalability:
    • Asynchronous crypto processing scales throughput
      to 210 Mbps on an 8-way 900 MHz V880.
    • 750% speedup compared to stock S9u3.
  • Reduces IPsec latency:
    • Asynchronous crypto processing improves
      parallelism and hence reduces the latency in 3DES
      encryption/decryption.
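
The speedup figures follow directly from the throughput numbers quoted in this deck. A quick check, reading "speedup" as the ratio of Venus throughput to the stock S9u3 throughput of 28 Mbps (an interpretation of the slide, and the helper name is illustrative):

```python
STOCK_S9U3_MBPS = 28.0    # stock Solaris 9 Update 3 IPsec-3DES
VENUS_1WAY_MBPS = 105.0   # 1-way 900 MHz E280R
VENUS_8WAY_MBPS = 210.0   # 8-way 900 MHz V880


def percent_of_baseline(new_mbps, base_mbps):
    """Throughput expressed as a percentage of the baseline."""
    return 100.0 * new_mbps / base_mbps


print(percent_of_baseline(VENUS_1WAY_MBPS, STOCK_S9U3_MBPS))  # 375.0
print(percent_of_baseline(VENUS_8WAY_MBPS, STOCK_S9U3_MBPS))  # 750.0
```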

21
Venus IPsec TCP Unidirectional RX Throughput
  • Per-CPU numbers measured on a 900 MHz E280R.
  • Per-card number measured on a 12-way 1.2 GHz SF6800.
  • The TX or bi-directional throughput is similar to
    RX, but is 15-20% slower.
  • The ongoing FireEngine project may be able to
    address this issue by making IP MT-hot.

22
IPsec Latency
  • IPsec adds substantial latency, and thus mostly
    affects:
    • Applications that demand low network latency.
    • The transaction rate of single-threaded
      applications.
  • Venus reduces IPsec latency via fast and
    asynchronous crypto processing.
  • The graph shows latency reduction by Venus
    software and hardware.
  • Tuning can be applied through Encr3DesTuning and
    by unloading the vca module to minimize latency for
    specific apps.

Note: Encr3DesTuning is set to 256 in this set of
data. The default is 512.
23
Jumbo Frames and Venus IPsec Acceleration
  • Venus IPsec acceleration is sensitive to packet
    size:
    • Significant overhead for regular Ethernet packets
      (MTU = 1500).
    • Overhead is reduced for a bigger MTU (jumbo frames).
  • Performance data measured with a SysKonnect 9821
    Ethernet card and Venus out-of-band IPsec
    acceleration show a 3X performance improvement.
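
One way to see why packet size matters: at a fixed throughput, larger packets mean fewer crypto jobs per second, so the fixed per-job overhead is paid less often. The sketch below uses the 1460-byte job size from the design slide; the 8960-byte jumbo payload assumes a 9000-byte MTU minus 40 bytes of TCP/IP headers, which is not stated in the deck. ESP overhead is ignored, and the measured 3X gain is not something this model predicts.

```python
STANDARD_PAYLOAD = 1460  # bytes per job with regular Ethernet frames (from the slides)
JUMBO_PAYLOAD = 8960     # assumption: 9000-byte MTU minus 40 bytes of TCP/IP headers


def jobs_per_second(throughput_mbps, payload_bytes):
    """Crypto jobs per second needed to sustain a throughput (1 Mbps = 10^6 bits/s)."""
    return throughput_mbps * 1_000_000 / 8 / payload_bytes


standard = jobs_per_second(100, STANDARD_PAYLOAD)
jumbo = jobs_per_second(100, JUMBO_PAYLOAD)
print(f"{standard:.0f} jobs/s vs {jumbo:.0f} jobs/s at 100 Mbps")
```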

24
The Venus Performance under Mixed Workload
25
Venus Performance under Mixed Workload
  • Possible scenarios:
    • Mixed non-IPsec and IPsec traffic
    • Mixed non-IPsec and SSL traffic
  • Would NIC operations interfere with crypto
    operations?
    • Yes, because both the NIC and the crypto chips
      share one interrupt line.
    • The NIC can generate interrupts much more rapidly
      than the crypto chips typically do.
    • BugID 4799279

26
Venus Performance under Mixed Workload (cont.)
  • Crypto performance suffers when network traffic
    is high:
    • 30% to 90% 3DES performance degradation (hurts
      IPsec)
    • 50% to 80% RSA performance degradation (hurts
      SSL)
  • The ideal (long-term) fix would be to have separate
    interrupt lines for crypto and NIC.
  • A workaround is available:
    • Use rx-intr-pkts and rx-intr-time to limit the
      interrupt rate from the NIC.
    • However, it reduces NIC performance by up to 30%.
  • Still working on bug fixes in 1.1 (2003-06-09).
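
The effect of the workaround can be illustrated with a simplified interrupt-coalescing model. This is a sketch under assumptions: an interrupt fires once a packet-count limit is reached or a time limit has elapsed since the first pending packet. The actual semantics and units of rx-intr-pkts and rx-intr-time in the driver may differ, and the function name is hypothetical.

```python
def count_interrupts(arrival_times_us, intr_pkts, intr_time_us):
    """Count NIC interrupts for a packet arrival stream (times in microseconds)."""
    interrupts = 0
    pending = 0
    first_pending_at = None
    for t in arrival_times_us:
        # A batch whose timer expired before this arrival has already fired.
        if pending and t - first_pending_at >= intr_time_us:
            interrupts += 1
            pending = 0
        if pending == 0:
            first_pending_at = t
        pending += 1
        # The packet-count limit fires immediately.
        if pending >= intr_pkts:
            interrupts += 1
            pending = 0
    if pending:
        interrupts += 1  # the timer eventually flushes the last batch
    return interrupts


arrivals = list(range(0, 1000, 10))  # 100 packets, one every 10 us
print(count_interrupts(arrivals, 1, 1000))   # one interrupt per packet
print(count_interrupts(arrivals, 8, 1000))   # batches of up to 8 packets
```

Raising either limit lowers the interrupt rate on the shared line, which leaves more room for crypto interrupts but delays packet delivery, consistent with the NIC performance cost noted above.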

27
Summary
28
(No Transcript)
29
Extra Materials for Technical Discussions
30
IPsec TCP_RR Latency
31
Netperf TCP_RR Latency