Title: Understanding Venus Performance (Tentative Update 2003-11-04)
1 Understanding Venus Performance (Tentative Update 2003-11-04)
- Shih-Hao Hung
- Performance and Availability Eng.
- Sun Microsystems Inc.
- Sun Confidential/Proprietary: Internal Use Only
2 Overview
- Venus is a PCI card that provides the following functionalities for Solaris/SPARC platforms:
  - High-performance gigabit Ethernet interface
  - High-speed cryptographic engine
  - SSL acceleration
  - IPsec acceleration
- Our goals are to make sure that:
  - Software (apps, OS, and drivers) utilizes Venus as efficiently as possible.
  - Venus performs well under mixed workloads.
3 Venus Gigabit Ethernet
- Venus uses the Cassini chip, which is also used by other Sun Gigabit Ethernet cards such as BGCC, Kuheen, etc.
- One major difference between Venus and other Cassini-based cards: Venus can interrupt only ONE host processor, due to a limitation of the Intel bridge chip on the Venus card.
4 Venus Gigabit Ethernet TCP Performance Measurement
- The peak throughput of Venus is on par with that of the other Cassini-based cards (without MDT).
5 New Gigabit Ethernet Features Proposed for Venus 1.1
- Hardware Checksumming: should help reduce CPU consumption, but has a bug at this moment (2003-06-05).
- Jumbo Frames (JF): data show that jumbo frames improve IPsec acceleration by 3X with SysKonnect cards. Support for jumbo frames may be put in Venus 1.1.
- Multi-Data Transfer (MDT): MDT is already in Solaris 9 Update 4 for Cassini, saving CPU cycles and improving efficiency (i.e., the Mbps/MHz ratio) by up to 60%. Data show a 5-10% performance gain on SPECweb99 and Netperf with an MDT-enabled Cassini card and driver.
- JF and MDT may not be supported in 1.1.
6 Venus Crypto Engine
- Venus has two Broadcom 5821 crypto chips. Venus hardware can offload the following crypto operations:
  - Public-key ops: RSA (512-bit, 1024-bit, 2048-bit) and DSA
  - Bulk encryption ops: RC4, DES, 3DES
  - Hash ops: SHA1 and MD5
- RC4 support is disabled in the Venus 1.0 driver.
- Software crypto is available for fail-over and small tasks.
7 Our Performance Data
- We have conducted performance testing on the following platforms:
  - 2-way 900 MHz Sun Fire 280R
  - 8-way 900 MHz Sun Fire V880
  - 12-way 1200 MHz Sun Fire 6800
- The per-CPU numbers presented are based on the 900 MHz UltraSPARC III Cu processor.
8 The Venus Crypto Engine
9 Venus Crypto Hardware Performance Measurement
10 Venus Crypto Software Performance Measurement
11 The Venus SSL Performance
12 Venus SSL Support
(Tentative, check when spec. is final)
13 Venus SSL Performance
HW bulk encryption support is disabled by default for S1WS in Venus 1.0.
14 Venus SSL: Additional Performance Issues
- Enabling HW bulk encryption support causes extra overhead for key management operations:
  - SSL handshake performance is reduced by 33% (BugID 4814633).
  - Short-term fix: disable HW bulk encryption by default, and offer a mechanism for users to enable the support.
  - RFE 4753295: should find a way to reduce the key management overhead.
  - Update (2003-06-06): the gap has shrunk to 14% with the latest Venus 1.1 software.
- Enabling HW bulk encryption support may limit SSL throughput:
  - Affects mostly large systems.
  - Customers may choose to disable the support, or buy additional cards.
15 The Venus IPsec Performance
16 Solaris IPsec Performance Issues with 3DES
- The stock Solaris 9 (Update 3) IPsec-3DES is slow and does not scale:
  - 3DES code is not optimized.
  - 3DES jobs are done synchronously.
  - Packets are processed sequentially.
  - 28 Mbps on a 2-way 900 MHz E280R; only one CPU is utilized.
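Slide 16 traces the stock slowdown to synchronous jobs and sequential packet processing. As a rough illustration of the difference, the Python sketch below runs the same per-packet jobs synchronously and through a worker pool, the general pattern behind an asynchronous job scheduler such as the KCL2 scheduler used by Venus. Everything here is hypothetical: `crypto_job` is a stand-in, SHA-1 substitutes for 3DES only because the Python standard library has no DES, and the pool size is arbitrary. It shows structure (jobs are not serialized by the caller, output order is preserved), not Solaris behavior or real speedups.

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib


def crypto_job(packet: bytes) -> bytes:
    # Hypothetical stand-in for one 3DES encrypt/decrypt job.
    # SHA-1 is used only because the stdlib has no DES implementation.
    return hashlib.sha1(packet).digest()


def process_synchronously(packets):
    # Stock pattern described on slide 16: one job at a time, in order.
    return [crypto_job(p) for p in packets]


def process_asynchronously(packets, workers=4):
    # Asynchronous pattern: submit every job up front; jobs may finish
    # in any order, but results are collected in submission order so
    # the packet sequence is preserved.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(crypto_job, p) for p in packets]
        return [f.result() for f in futures]
```

Both functions return identical results for the same input (for example, sixteen 1460-byte packets); only the scheduling differs.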
17 Venus IPsec Design Considerations
- Accelerates DES/3DES encryption/decryption via:
  - Asynchronous processing by the KCL2 job scheduler
  - Performance-optimized software crypto
  - A hardware offloading engine
- Must process jobs at Ethernet packet size (1460 bytes), which is much smaller than the SSL chunk size:
  - This is a big constraint for hardware offloading, and a big issue for IPsec acceleration compared to SSL acceleration.
- Impacted by hardware offloading overhead:
  - Packets < 512 bytes are not offloaded; the overhead is too costly.
  - Lightweight hash algorithms such as MD5 and SHA1 benefit less from hardware offloading.
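The 512-byte cutoff and the MD5/SHA1 remark both follow from a simple cost model: offloading adds a fixed per-job overhead, so it only pays off once a job is large enough for the faster engine to recover that cost. The Python sketch below assumes software time `n / sw_rate` and hardware time `overhead_us + n / hw_rate`; the function names and all rates and overheads are hypothetical values chosen for illustration, not Venus measurements.

```python
def sw_time_us(n_bytes, sw_rate):
    # Software crypto: cost scales with size only (sw_rate in bytes/us).
    return n_bytes / sw_rate


def hw_time_us(n_bytes, hw_rate, overhead_us):
    # Hardware offload: fixed setup/transfer overhead plus size/rate.
    return overhead_us + n_bytes / hw_rate


def break_even_bytes(sw_rate, hw_rate, overhead_us):
    # Size at which both paths take equal time:
    # n / sw_rate == overhead_us + n / hw_rate
    if hw_rate <= sw_rate:
        return float("inf")  # hardware is never faster; never offload
    return overhead_us / (1.0 / sw_rate - 1.0 / hw_rate)


def should_offload(n_bytes, sw_rate, hw_rate, overhead_us):
    return hw_time_us(n_bytes, hw_rate, overhead_us) < sw_time_us(n_bytes, sw_rate)
```

With `sw_rate=10`, `hw_rate=100` (bytes/µs) and `overhead_us=45`, the break-even size is 500 bytes, so a 256-byte packet stays in software while a 1460-byte packet is offloaded. Raising `sw_rate` to 50, as a cheap hash might, pushes the break-even point to 4500 bytes, beyond a full Ethernet payload.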
18 Venus IPsec Implementations
- Venus accelerates IPsec in one of the following two forms:
  - Out-of-band:
    - Packets are sent to Venus crypto for encryption, and then sent to any NIC for transmission.
    - Packets are received from any NIC, and then sent to Venus crypto for decryption.
  - In-band (pending Solaris 9 Update 5):
    - Packets are sent to Venus crypto for encryption and transmitted via the Venus NIC in one trip.
    - Packets are received by the Venus NIC and decrypted by Venus crypto before entering the host.
- The in-band implementation would truly reflect the strength of Venus, but it requires significant changes to the network stack.
19 Venus Out-of-Band IPsec
- Venus out-of-band IPsec requires only minor changes to an existing system:
  - New modules replace the encrdes/encr3des modules.
  - For packets < 512 bytes, swcrypto handles 3DES.
  - For packets > 512 bytes, KCL handles 3DES.
  - KCL sends jobs to vca for hardware offload when a Venus card is available.
  - KCL sends jobs to its software crypto when hardware offloading is not available.
[Figure: ipsecesp -> Venus encr3des; pkt < 512 -> swcrypto -> Software 3DES; pkt > 512 -> KCL; hardware ok -> vca -> Hardware 3DES; no hardware -> Software 3DES]
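The dispatch rules on this slide can be restated as a small routing function. This is a hypothetical Python sketch, not the encr3des/KCL source: the `Path` names and `route_3des_job` are invented, the slide does not say which way a packet of exactly 512 bytes goes (the sketch sends it to KCL), and treating the threshold as the value controlled by Encr3DesTuning (default 512, per the note on slide 22) is our assumption.

```python
from enum import Enum


class Path(Enum):
    # Hypothetical labels for the three outcomes in the figure.
    SWCRYPTO = "swcrypto (software 3DES)"
    KCL_HARDWARE = "KCL -> vca (hardware 3DES)"
    KCL_SOFTWARE = "KCL software crypto (software 3DES)"


def route_3des_job(pkt_len, venus_available, threshold=512):
    # Small packets: offload overhead is too costly, use swcrypto.
    if pkt_len < threshold:
        return Path.SWCRYPTO
    # Larger packets go to KCL, which offloads to vca when a Venus
    # card is present and falls back to its own software crypto otherwise.
    if venus_available:
        return Path.KCL_HARDWARE
    return Path.KCL_SOFTWARE
```

For example, a 100-byte packet always takes the swcrypto path, while a 1460-byte packet goes to vca only if a Venus card is present. Lowering `threshold` (as in the slide 22 data, where Encr3DesTuning was 256) would route more mid-sized packets to KCL.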
20 Venus IPsec Performance Benefits
- Accelerates IPsec-3DES throughput:
  - To 105 Mbps on a 1-way 900 MHz E280R.
  - 375% speedup compared to stock S9u3.
- Improves throughput scalability:
  - Asynchronous crypto processing scales throughput to 210 Mbps on an 8-way 900 MHz V880.
  - 750% speedup compared to stock S9u3.
- Reduces IPsec latency:
  - Asynchronous crypto processing improves parallelism and hence reduces the latency of 3DES encryption/decryption.
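The speedup figures appear to be throughput ratios against the 28 Mbps stock S9u3 result from slide 16: 105 / 28 = 3.75 and 210 / 28 = 7.5, which format as 375% and 750%. The slides do not state the baseline explicitly, so this reading is an assumption; the Python below only shows the arithmetic, and `throughput_ratio` is a hypothetical helper.

```python
# Assumed baseline: stock S9u3 IPsec-3DES on a 2-way 900 MHz E280R (slide 16).
STOCK_S9U3_MBPS = 28.0


def throughput_ratio(accelerated_mbps, baseline_mbps=STOCK_S9U3_MBPS):
    # Ratio of accelerated throughput to the stock baseline.
    return accelerated_mbps / baseline_mbps


ratio_e280r = throughput_ratio(105.0)  # Venus on a 1-way 900 MHz E280R
ratio_v880 = throughput_ratio(210.0)   # Venus on an 8-way 900 MHz V880
print(format(ratio_e280r, ".0%"), format(ratio_v880, ".0%"))
```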
21 Venus IPsec TCP Unidirectional RX Throughput
- Per-CPU numbers measured on a 900 MHz E280R.
- Per-card number measured on a 12-way 1.2 GHz SF6800.
- TX or bi-directional throughput is similar to RX, but 15-20% slower.
  - The ongoing FireEngine project may be able to address this issue by making IP MT-hot.
22 IPsec Latency
- IPsec adds substantial latency, and thus mostly affects:
  - Applications that demand low network latency.
  - The transaction rate of single-threaded applications.
- Venus reduces IPsec latency via fast and asynchronous crypto processing.
  - The graph shows the latency reduction by Venus software and hardware.
- Tuning can be applied through Encr3DesTuning and by unloading the vca module to minimize latency for specific apps.
Note: Encr3DesTuning is set to 256 in this set of data. The default is 512.
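For a single-threaded request/response application with one transaction outstanding at a time (the pattern a netperf TCP_RR test exercises by default), the transaction rate is the reciprocal of the round-trip time, so any latency IPsec adds comes straight off the rate. The function below is a hypothetical Python illustration, and its inputs are made-up numbers rather than values from the graph.

```python
def transactions_per_second(base_rtt_us, ipsec_added_us=0.0):
    """Rate for one outstanding request/response at a time.

    base_rtt_us: round-trip time without IPsec, in microseconds (hypothetical).
    ipsec_added_us: extra per-transaction latency from encrypting and
        decrypting on both ends, in microseconds (hypothetical).
    """
    # Each transaction must complete before the next starts, so the
    # rate is simply one over the full round-trip time.
    return 1_000_000 / (base_rtt_us + ipsec_added_us)
```

Doubling the round trip from 100 µs to 200 µs halves the rate from 10,000 to 5,000 transactions/s, which is why reducing crypto latency matters most for these workloads.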
23 Jumbo Frames and Venus IPsec Acceleration
- Venus IPsec acceleration is sensitive to packet size:
  - Significant overhead for regular Ethernet packets (MTU 1500).
  - Overhead is reduced for bigger MTUs (jumbo frames).
- Performance data measured with a SysKonnect 9821 Ethernet card and Venus out-of-band IPsec acceleration show a 3X performance improvement.
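A simple fixed-overhead model shows why bigger frames help: each hardware job pays a constant setup cost, so a larger payload spreads that cost over more bytes. The Python sketch below uses hypothetical overhead and rate values, and assumes 8960-byte payloads for a 9000-byte MTU (40 bytes of IPv4/TCP headers). It illustrates the trend only and is not meant to reproduce the measured 3X.

```python
HW_RATE = 100.0     # hypothetical crypto engine rate, bytes/us
OVERHEAD_US = 45.0  # hypothetical fixed cost per offloaded job, us


def effective_rate(payload_bytes, hw_rate=HW_RATE, overhead_us=OVERHEAD_US):
    # Bytes per microsecond actually achieved when every job pays overhead_us.
    return payload_bytes / (overhead_us + payload_bytes / hw_rate)


def overhead_fraction(payload_bytes, hw_rate=HW_RATE, overhead_us=OVERHEAD_US):
    # Share of each job's time spent on the fixed offload overhead.
    return overhead_us / (overhead_us + payload_bytes / hw_rate)


for mtu, payload in [(1500, 1460), (9000, 8960)]:
    print(f"MTU {mtu}: {effective_rate(payload):.1f} bytes/us, "
          f"overhead {overhead_fraction(payload):.0%}")
```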
24 The Venus Performance under Mixed Workload
25 Venus Performance Under Mixed Workload
- Possible scenarios:
  - Mixed non-IPsec and IPsec traffic
  - Mixed non-IPsec and SSL traffic
- Would NIC operations interfere with crypto operations?
  - Yes, because the NIC and the crypto chips share one interrupt line.
  - The NIC can generate interrupts much more rapidly than the crypto chips typically do.
  - BugID 4799279
26 Venus Performance Under Mixed Workload (cont.)
- Crypto performance suffers when network traffic is high:
  - 30% to 90% 3DES performance degradation (hurts IPsec)
  - 50% to 80% RSA performance degradation (hurts SSL)
- The ideal (long-term) fix would be separate interrupt lines for crypto and NIC.
- A workaround is available:
  - Use rx-intr-pkts and rx-intr-time to limit the interrupt rate from the NIC.
  - However, it reduces NIC performance by up to 30%.
- Still working on bug fixes in 1.1 (2003-06-09).
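The rx-intr-pkts/rx-intr-time workaround amounts to interrupt coalescing, assuming the NIC holds off its receive interrupt until either a packet count or a time limit is reached, whichever comes first. The Python sketch below models that policy to show how it cuts the interrupt rate on the shared line. The function and its parameter names are hypothetical; it assumes evenly spaced arrivals, simplifies the timer by checking it only when the next packet arrives, and does not model the Cassini driver's actual units or semantics.

```python
def count_interrupts(arrival_times_us, pkts_threshold, time_threshold_us):
    """Count RX interrupts under a count-or-timeout coalescing policy."""
    interrupts = 0
    pending = 0      # packets received but not yet signalled
    oldest = None    # arrival time of the oldest pending packet
    for t in arrival_times_us:
        # Simplified timer: if it expired by the time this packet
        # arrives, flush the pending batch with one interrupt.
        if pending and t - oldest >= time_threshold_us:
            interrupts += 1
            pending = 0
        if pending == 0:
            oldest = t
        pending += 1
        # Packet-count limit reached: interrupt immediately.
        if pending >= pkts_threshold:
            interrupts += 1
            pending = 0
    if pending:
        interrupts += 1  # the timer eventually flushes the remainder
    return interrupts
```

With 1,000 packets arriving 1 µs apart, a packet threshold of 1 gives 1,000 interrupts, a threshold of 8 gives 125, and a 5 µs timer with a high packet threshold gives 200. The batching that reduces interrupts also delays delivery, which is consistent with the NIC performance cost noted on the slide.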
27 Summary
28 (No Transcript)
29 Extra Materials for Technical Discussions
30 IPsec TCP_RR Latency
31 Netperf TCP_RR Latency