Title: Implementing Convergent Networking: Partner Concepts
1. Implementing Convergent Networking: Partner Concepts
- Uri Elzur
- Broadcom Corporation
- Director, Advanced Technology
- Brian Hausauer
- NetEffect, Inc.
- Chief Architect
2. Convergence In The Data Center: Convergence Over IP
- Uri Elzur
- Broadcom Corporation
- Director, Advanced Technology
3. Agenda
- Application requirements in Data Center
- Data flows and Server architecture
- Convergence
- Demo
- Hardware and software challenges and advantages
- Summary
4. Enterprise Network Today: IT, Get Ready for Tomorrow's Data Center, Today
- Multiple networks drive Total Cost of Ownership (TCO) up
- Consolidation, convergence, and virtualization require Flexible I/O
- Higher speeds (2.5G, 10G) require more Efficient I/O
- Issue: best use of Memory and CPU resources
- Additional constraints: limited power, cooling, and smaller form factors
5. Convergence Over Ethernet
- Multiple networks and multiple stacks in the OS are used to provide these services
- Wire protocols, e.g., Internet Small Computer System Interface (iSCSI) and iWARP (Remote Direct Memory Access, RDMA), enable the use of Ethernet as the converged network
- Direct Attach storage migrates to Networked Storage
- Proprietary clustering can now use RDMA over Ethernet
- The OS supports one device servicing multiple stacks with the Virtual Bus Driver
- To accommodate these new traffic types, Ethernet's efficiency must be optimal
  - CPU utilization
  - Memory BW utilization
  - Latency
[Diagram: converged Windows stacks over one device. User mode: Sockets Applications over Windows Sockets and the Windows Sockets Switch, an RDMA Provider, and Storage Applications. Kernel mode: File System, Partition and Class Drivers, the iSCSI Port Driver (iscsiprt.sys), TCP/IP, NDIS with an NDIS IM Driver, and the iSCSI, RDMA, and NDIS Miniport drivers. Hardware: NIC, HBA, RNIC.]
6. Data Center Application Characteristics
7. The Server In The Data Center
[Diagram: data center tiers - Load Balancers, Web Servers, Application Servers, Database, and Cluster.]
- Server network requirements: Data, Storage, Clustering, and Management
- Acceleration required: Data (TCP), Storage (iSCSI), Clustering (RDMA)
- Application requirements: more transactions per server, higher rates, larger messages (e.g., e-mail)
- Connection mix: long-lived and short-lived connections
8. Traditional L2 NIC Rx Flow And Buffer Management
- Application pre-posts buffer
- Data arrives at the Network Interface Adapter (NIC)
- NIC transfers data by Direct Memory Access (DMA) to driver buffers (kernel)
- NIC notifies Driver after a frame is DMA'd (interrupt moderation per frame)
- Driver notifies Stack
- Stack fetches headers, processes TCP/IP, strips headers
- Stack copies data from driver to Application buffers (a minimal receive-side sketch in C follows the diagram below)
- Stack notifies Application
[Diagram: Rx flow steps 1-8 across the Application, TCP Stack, Driver, and L2 NIC, involving a minimum of one copy.]
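To make the copy step concrete, here is a minimal receive-side sketch in C using Winsock. The port number and buffer size are illustrative assumptions, not from the original slides. The recv() call is where the stack copies the payload from kernel driver buffers into the application buffer: the "minimum of one copy" noted above that offload architectures later remove.

/* Minimal Winsock receive sketch. Port 5001 and the 4 KB buffer are
 * illustrative. The application "pre-posts" its buffer by passing app_buf
 * to recv(); the TCP/IP stack later copies received payload from kernel
 * driver buffers into it, the minimum of one copy on the L2 NIC path. */
#include <winsock2.h>
#include <string.h>
#include <stdio.h>
#pragma comment(lib, "ws2_32.lib")

int main(void)
{
    WSADATA wsa;
    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0)
        return 1;

    SOCKET listener = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(5001);                      /* illustrative port */

    bind(listener, (struct sockaddr *)&addr, sizeof(addr));
    listen(listener, 1);

    SOCKET conn = accept(listener, NULL, NULL);       /* single connection */

    char app_buf[4096];                               /* application-owned buffer */
    int n = recv(conn, app_buf, (int)sizeof(app_buf), 0);  /* kernel-to-user copy happens here */
    if (n > 0)
        printf("received %d bytes after at least one copy\n", n);

    closesocket(conn);
    closesocket(listener);
    WSACleanup();
    return 0;
}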
9. iSCSI
- iSCSI provides a reliable, high-performance block storage service
- Microsoft Operating System support for iSCSI accelerates iSCSI's deployment
  - Microsoft iSCSI Software Initiator
  - iSCSI HBA
- An iSCSI HBA provides
  - Better performance
  - iSCSI Boot
  - iSER enablement
[Diagram: iSCSI HBA storage stack - Storage Applications, File System, Partition Manager, Class Driver, iSCSI Port Driver (iscsiprt.sys), iSCSI Miniport, HBA.]
10. The Value Of iSCSI Boot
- Storage consolidation lowers TCO
- Easier maintenance and replacement
  - No need to replace a server blade for a hard-disk failure
- No disk on the blade/motherboard: space and power savings
  - Smaller blades, higher density
  - Simpler board design, no need for HD-specific mechanical restraints
- Higher reliability
  - Hot replacement of disks if a disk fails
  - RAID protection over the boot disk
  - Re-assign the disk to another server in case of server failure
11. WSD And RDMA
- Kernel bypass is attractive for High Performance Computing (HPC), databases, and any sockets application
- The WSD model supports RNICs with RDMA over Ethernet (a.k.a. iWARP)
- As latency improvements are mainly due to kernel bypass, WSD is competitive with other RDMA-based technologies, e.g., InfiniBand (an unmodified sockets example follows the diagram below)
[Diagram: Traditional model vs. WSD model. Traditional: Socket App, WinSock, TCP/IP WinSock Provider, TCP/IP Transport Driver, NDIS, NDIS Miniport, NIC. WSD: Socket App, WinSock, WinSock Switch (WinSock SPI), TCP/IP WinSock Provider plus an RDMA Service Provider, Microsoft WSD Module, TCP/IP Transport Driver plus an RDMA Provider Driver (OEM WSD software), SAN NDIS Miniport with a private interface to the OEM SAN hardware (RNIC).]
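The practical appeal of this model is that existing sockets code needs no changes. The hypothetical Winsock client below is ordinary TCP sockets code; under WSD, the WinSock switch can route the connection through the RDMA service provider and RNIC for kernel bypass when the peer is reachable over the SAN, and otherwise fall back to the kernel TCP/IP path. The peer address and port are placeholders, not part of the original material.

/* Ordinary Winsock client: nothing here is WSD- or RDMA-specific.
 * The WinSock switch decides at connect time whether this connection
 * is carried by the RDMA service provider (kernel bypass) or by the
 * regular TCP/IP provider. Peer address and port are placeholders. */
#include <winsock2.h>
#include <string.h>
#include <stdio.h>
#pragma comment(lib, "ws2_32.lib")

int main(void)
{
    WSADATA wsa;
    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0)
        return 1;

    SOCKET s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(5001);                      /* placeholder port */
    peer.sin_addr.s_addr = inet_addr("192.168.0.10"); /* placeholder SAN peer */

    if (connect(s, (struct sockaddr *)&peer, sizeof(peer)) == 0) {
        const char msg[] = "same sockets code, possibly an RDMA-accelerated path";
        send(s, msg, (int)sizeof(msg) - 1, 0);
    }

    closesocket(s);
    WSACleanup();
    return 0;
}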
12. L2 Technology Can't Efficiently Handle iSCSI And RDMA
- iSCSI HBA implementation concerns
  - iSCSI Boot
  - Digest overhead: CRC-32C (a software sketch follows this list)
  - Copy overhead: zero copy requires iSCSI protocol processing
- RDMA RNIC implementation concerns
  - Throughput: high software overhead for RDMA processing
    - MPA CRC-32C, markers every 512B
    - DDP/RDMA protocol processing, zero copy, user-mode interaction, special queues
  - Minimal latency: software processing doesn't allow for kernel bypass
- Thus, for optimal performance, specific offload is required
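To give a feel for the digest cost, here is a minimal bitwise CRC-32C (Castagnoli) in C, the polynomial used by iSCSI digests and by the MPA framing CRC. This is a host-software sketch, not an offload implementation; adapters compute the digest in hardware precisely because touching every payload byte again on the host CPU adds measurable per-PDU overhead.

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Bitwise CRC-32C using the reflected Castagnoli polynomial 0x82F63B78.
 * The well-known check value for the ASCII string "123456789" is 0xE3069283. */
static uint32_t crc32c(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0x82F63B78u : (crc >> 1);
    }
    return crc ^ 0xFFFFFFFFu;
}

int main(void)
{
    const uint8_t pdu[] = "123456789";
    printf("CRC-32C = 0x%08X\n", (unsigned)crc32c(pdu, sizeof(pdu) - 1));  /* prints 0xE3069283 */
    return 0;
}

A real iSCSI data digest covers the whole data segment of every PDU, so at multi-gigabit rates even a table-driven software loop like this competes with the application for the same CPU cycles and memory bandwidth.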
13. Convergence Over Ethernet: TOE, iSCSI, RDMA, Management
- Converges functions
  - Multiple functions (SAN, LAN, IPC, Mgmt.) can be consolidated onto a single fabric type
  - Blade server storage connectivity (low cost)
- Consolidates ports
  - Leverages Ethernet pervasiveness, knowledge, cost leadership, and volume
  - Consolidates KVM over IP
  - Leverages existing standard Ethernet equipment
- Lower TCO: one technology for multiple purposes
14. C-NIC Demo
15. C-NIC Hardware Design Advantages/Challenges
- Performance: wire speed
  - Find the right split between hardware and firmware
  - Hardware for speed, e.g., connection look-up, frame validity, buffer selection, and offset computation
  - A hardware connection look-up is significantly more efficient than software; the IPv6 address length (128 bits) exacerbates this (see the look-up sketch after this list)
- Flexibility
  - Firmware provides flexibility, but may be slower than hardware
  - Specially optimized RISC CPU; it's not about MHz
  - Accommodates future protocol changes, e.g., TCP ECN
- Minimal latency
  - From wire to application buffer (or from application to wire for Tx)
  - Without involving the CPU
  - Flat ASIC architecture for minimal latency
- Scalability: 1G, 2.5G, 10G
- Zero Copy architecture: a match for server memory BW and latency; any L2 solution incurs an additional copy or a few copies
- Power goals: under 5W per 1G/2.5G, under 10W per 10G (the CPU consumes 90W)
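As a rough software counterpart to the hardware look-up described above, the hypothetical sketch below hashes an IPv6 TCP 4-tuple into a flat table. All names, the FNV-1a hash, and the table size are illustrative assumptions; the point is simply that a 36-byte key (two 128-bit addresses plus two ports) makes per-packet software look-ups expensive relative to a dedicated look-up engine in the NIC.

#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Hypothetical connection key: IPv6 addresses make it 36 bytes, which is
 * the "IPv6 exacerbates it" point above. */
struct conn_key {
    uint8_t  src_ip[16];
    uint8_t  dst_ip[16];
    uint16_t src_port;
    uint16_t dst_port;
};

#define TABLE_SIZE 4096

struct connection {
    struct conn_key key;
    int valid;                    /* per-connection offload state would live here */
};

static struct connection table[TABLE_SIZE];

/* Simple FNV-1a hash over the key bytes (illustrative only). */
static uint32_t conn_hash(const struct conn_key *k)
{
    const uint8_t *p = (const uint8_t *)k;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < sizeof(*k); i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h % TABLE_SIZE;
}

static struct connection *conn_lookup(const struct conn_key *k)
{
    struct connection *c = &table[conn_hash(k)];
    if (c->valid && memcmp(&c->key, k, sizeof(*k)) == 0)
        return c;
    return NULL;                  /* collision handling omitted in this sketch */
}

int main(void)
{
    struct conn_key k;
    memset(&k, 0, sizeof(k));
    k.src_ip[15] = 1;             /* ::1, illustrative */
    k.dst_ip[15] = 2;
    k.src_port = 49152;
    k.dst_port = 80;

    table[conn_hash(&k)].key = k;
    table[conn_hash(&k)].valid = 1;
    printf("lookup %s\n", conn_lookup(&k) ? "hit" : "miss");
    return 0;
}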
16. C-NIC Software Design Advantages/Challenges
- Virtual Bus Driver
  - Reconciles requests from all stacks
  - Plug and Play
  - Reset
  - Network control and speed
  - Power
- Support of multiple stacks
  - Resource allocation and management
  - Resource isolation
  - Run-time priorities
  - Interface separation
  - Interrupt moderation per stack
  - Statistics
17. Summary
- C-NIC Advantages
  - TCP Offload Engine in hardware for better application performance, lower CPU utilization, and improved latency
  - RDMA for memory BW and the lowest latency
  - iSCSI for networked storage and iSCSI Boot
- Flexible and Efficient I/O for the data center of today and tomorrow
18. WinHEC 2005
- Brian Hausauer
- Chief Architect
- NetEffect, Inc
- BrianH_at_NetEffect.com
19. Today's Data Center
20. Data Center Trends: Traffic Increasing 3x Annually
[Chart: typical per-server traffic of 5.2 to 10.2 Gb/s, 2.3 to 4.6 Gb/s, and 6.5 to 14.0 Gb/s across the three fabrics. Application requirements: pervasive standard and plug-n-play interop; concurrent access, high throughput, low overhead; fast access, low latency. Sources: 2006 IA Server I/O Analysis, Intel Corporation; Oracle.]
- Scaling a 3-fabric infrastructure is expensive and cumbersome
- Server density complicates connections to three fabrics
- A successful solution must meet differing application requirements
21. High Performance Computing: Clusters Dominate
- Clusters continue to grow in popularity and now dominate the Top 500 fastest computers (294 clusters in the Top 500)
- Ethernet is the interconnect for over 50% of the top clusters
- Ethernet continues to increase its share as the cluster interconnect of choice for the top clusters in the world
[Chart: clusters in the Top 500 systems, 1997-2004, split into Ethernet-based clusters and all other clusters. Source: www.top500.org]
22. Next-Generation Ethernet Can Be The Solution
- Why Ethernet?
  - Pervasive standard
  - Multi-vendor interoperability
  - Potential to reach high volumes and low cost
  - Powerful management tools/infrastructure
- Why not?
  - Ethernet does not meet the requirements for all fabrics
  - Ethernet overhead is the major obstacle
- The solution: iWARP extensions to Ethernet
  - Industry-driven standards to address Ethernet deficiencies
  - Render Ethernet suitable for all fabrics at multi-Gb speeds and beyond
  - Reduce cost, complexity, and TCO
23. Overhead and Latency in Networking
[Diagram: sources of overhead and latency - packet processing, intermediate buffer copies, and command/context switches - across the application (user), I/O library (kernel), and TCP/IP (server software) layers, with most CPU overhead in software TCP/IP; the solutions move TCP/IP into the I/O adapter hardware and provide user-level direct access / OS bypass.]
24. Introducing NetEffect's NE01 Ethernet Channel Adapter (ECA)
- A single chip supports
  - Transport (TCP) offload
  - RDMA/DDP
  - OS bypass / ULDA
- Meets requirements for
  - Clustering (HPC, DBC, ...)
  - Storage (file and block)
  - Networking
- Reduces overhead by up to 100%
- Strategic advantages
  - Patent-pending virtual pipeline and RDMA architecture
  - One die for all chips enables unique products for dual 10Gb / dual 1Gb
25. Future Server: Ethernet Channel Adapter (ECA) for a Converged Fabric
[Diagram: server software stack - clustering, block storage, and networking OS/driver software reached through existing interfaces and O/S acceleration interfaces (TCP Accelerator: WSD, DAPL, VI, MPI; iSER; iSCSI), all running over the NetEffect ECA onto the Ethernet fabric(s).]
- NetEffect ECA delivers optimized file and block
storage, networking, and clustering from a single
adapter
26. NetEffect ECA Architecture
[Diagram: NetEffect ECA architecture - a host interface connected through a crossbar to Accelerated Networking, Basic Networking, Block Storage, and Clustering engines, which feed multiple MACs.]
27. NetEffect ECA Architecture: Networking
- Related software standards
  - Sockets
  - Microsoft WinSock Direct (WSD)
  - Sockets Direct Protocol (SDP)
- TCP Accelerator interfaces
[Diagram: the sockets (SW stack) and TCP Accelerator (WSD, SDP) paths mapped onto the iWARP, TOE, and Basic Networking engines for basic and accelerated networking.]
28. NetEffect ECA Architecture: Storage
- Related software standards
  - File system: NFS, DAFS, R-NFS
  - Block mode: iSCSI, iSER
[Diagram: storage paths (iSCSI and NFS; iSER and R-NFS) run through connection management and the Block Storage engine onto iWARP and TOE, with Basic Networking used for setup/teardown and exceptions only.]
29. NetEffect ECA Architecture: Clustering
- Related software standards
  - MPI (see the MPI sketch below)
  - DAPL API
  - IT API
- RDMA Accelerator interfaces
[Diagram: the clustering path (MPI, DAPL) runs through connection management and the Clustering engine onto iWARP; TOE is not used, and Basic Networking handles setup/teardown and exceptions only.]
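For context on the clustering software standards above, here is a minimal MPI ping-pong in C between ranks 0 and 1. The code is fabric-agnostic: an MPI library layered over an RDMA-capable transport such as iWARP can move these messages with kernel bypass without any change to the application. This is a generic sketch, not NetEffect-specific code.

#include <mpi.h>
#include <stdio.h>
#include <string.h>

/* Two-rank ping-pong; run with e.g. "mpiexec -n 2 ./pingpong". */
int main(int argc, char **argv)
{
    int rank, size;
    char buf[64];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size >= 2) {
        if (rank == 0) {
            strcpy(buf, "ping");
            MPI_Send(buf, (int)strlen(buf) + 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, (int)sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 0 received: %s\n", buf);
        } else if (rank == 1) {
            MPI_Recv(buf, (int)sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            strcpy(buf, "pong");
            MPI_Send(buf, (int)strlen(buf) + 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}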
30. Tomorrow's Data Center: Separate Fabrics for Networking, Storage, and Clustering
[Diagram: each fabric replaced individually - LAN Ethernet to users becomes LAN iWARP Ethernet, Fibre Channel block storage to the SAN/NAS becomes storage iWARP Ethernet, and Myrinet/Quadrics/InfiniBand clustering becomes clustering iWARP Ethernet; each server still carries separate networking, storage, and clustering adapters and switches.]
31. Fat Pipe for Blades and Stacks: Converged Fabric for Networking, Storage, and Clustering
[Diagram: a single converged iWARP Ethernet fabric and switch carries LAN traffic to users, storage traffic to the NAS/SAN, and clustering traffic between servers.]
32. Take-Aways
- Multi-gigabit networking is required for each tier of the data center
- Supporting multiple incompatible network infrastructures is becoming increasingly difficult as budget, power, cooling, and space constraints tighten
- With the adoption of iWARP, Ethernet for the first time meets the requirements for all connectivity within the data center
- NetEffect is developing a high-performance iWARP Ethernet Channel Adapter that enables the convergence of clustering, storage, and networking
33. Call to Action
- Deploy iWARP products for convergence of networking, storage, and clustering
- Deploy 10 Gb Ethernet for fabric convergence
- Develop applications to RDMA-based APIs for maximum server performance
34. Resources
- NetEffect
- www.NetEffect.com
- iWARP Consortium
- www.iol.unh.edu/consortiums/iwarp/
- Open Group: authors of IT API, RNIC PI, and Sockets API Extensions
- www.opengroup.org/icsc/
- DAT Collaborative
- www.datcollaborative.org
- RDMA Consortium
- www.rdmaconsortium.org
- IETF RDDP WG
- www.ietf.org/html.charters/rddp-charter.html
35. Community Resources
- Windows Hardware and Driver Central (WHDC)
- www.microsoft.com/whdc/default.mspx
- Technical Communities
- www.microsoft.com/communities/products/default.mspx
- Non-Microsoft Community Sites
- www.microsoft.com/communities/related/default.mspx
- Microsoft Public Newsgroups
- www.microsoft.com/communities/newsgroups
- Technical Chats and Webcasts
- www.microsoft.com/communities/chats/default.mspx
- www.microsoft.com/webcasts
- Microsoft Blogs
- www.microsoft.com/communities/blogs