Title: IP Storage Tutorial
1 IP Storage Tutorial
- Presented 17 October 2001 by
- Marc Staimer, President, CDS Dragon Slayer Consulting
- Ahmad Zamer, Sr. Product Line Marketing, Intel
- John Hufferd, Sr. Technical Staff, IBM SSD
- Joe Gervais, Director Product Marketing, Alacritech
2 Tutorial Introduction
- Marc Staimer, CDS Dragon Slayer Consulting
- marcstaimer_at_earthlink.net
3 The Purpose of this Tutorial
- IP Storage as block vs. file storage
- NAS will be discussed peripherally
- To provide details about IP Storage
- To provide factual information
- To clarify issues
- To facilitate understanding
- Key point
- This will be pragmatic education, not cheerleading
4 IP Networked Storage
Ahmad Zamer, Ahmad.zamer_at_intel.com, October 2001
5 Overview
- Introduction
- Benefits of IP Storage
- IP Storage technologies
- iSCSI
- Conclusions
6 Introduction
- "Ethernet wins. Again. In time Ethernet will eventually triumph over all other storage networking technologies, including Fibre Channel." Source: Forrester Research, March 2001
- "If we were starting with a clean piece of paper we would probably use gigabit Ethernet and IP." Source: Bill Miller, CTO, StorageNetworks, Industry Standard
- "... 76% of senior IT executives believe IP will make it easier to implement large-scale storage networks." Source: Enterprise Storage Group, 9/11/2000
- 75% perceive iSCSI as the IP storage standard. Source: Marc Staimer, Dragon Slayer Consulting, May 2001
7 Network Storage Models
8 Moving from Dedicated to Networked Storage
9 Benefits of IP Storage
- Brings the SAN concept to Ethernet networks
- Lower total cost of ownership
- Creates a single integrated network
- Makes remote data replication possible
- Improves enterprise network management
- Provides a higher degree of interoperability
10 Advantages of IP Storage
- Storage access over distance
- Transparent to applications
- Leverages the benefits of IP
- IT skills
- Ethernet & SCSI infrastructure
- Network management
- R&D investment
- Universal access to storage
11 Key Business Trends Favor IP Storage
- Network Performance
- Overall System Cost
- Trained Staff Available
- Total Cost of Ownership
12 IP Storage Standards
- IETF IP Storage (IPS) Working Group
- iSCSI
- FCIP
- iFCP
- iSNS
- Storage Networking Industry Association (SNIA)
- SNIA IP Storage Forum
13 IP Storage Technologies
14 What are the technologies? (iSCSI, iFCP, FCIP)
- iSCSI
- iSCSI is a TCP/IP-based protocol for establishing and managing connections between IP-based storage devices, hosts and clients
- FCIP
- FCIP is a TCP/IP-based tunneling protocol for connecting geographically distributed Fibre Channel SANs transparently to both FC and IP
- iFCP
- iFCP is a TCP/IP-based protocol for interconnecting Fibre Channel storage devices or Fibre Channel SANs using an IP infrastructure in place of Fibre Channel switching and routing elements
15 IP Storage: iSCSI, FCIP, iFCP
Protocol | End Devices | Fabric Services
iSCSI | iSCSI/IP | Internet Protocol
FCIP | Fibre Channel | Fibre Channel
iFCP | Fibre Channel | Internet Protocol
Fabric services include routing, device discovery, management, authentication and inter-switch communication
16 iSCSI, iFCP and FCIP Protocol Stacks
Shared upper layers (top to bottom): Applications / Operating System / Standard SCSI Command Set
- iSCSI: New Serial SCSI / TCP / IP
- iFCP: FCP (FC-4) / TCP / IP
- FCIP: FCP (FC-4) / FC Lower Layers / TCP / IP
17 iFCP
18 iFCP
- iFCP is a gateway-to-gateway protocol for implementing a Fibre Channel fabric over a TCP/IP transport
- Traffic between Fibre Channel devices is routed and switched by the TCP/IP network
- The iFCP layer maps Fibre Channel frames to a predetermined TCP connection for transport
- FC messaging and routing services are terminated at the gateways, so the fabrics are not merged with one another
- Dynamically creates IP tunnels for FC frames
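The mapping of Fibre Channel traffic onto predetermined TCP connections can be pictured as a lookup table inside each gateway. This is a minimal sketch under stated assumptions: the class name `IfcpGateway`, keying one TCP connection on each (source N_Port ID, destination N_Port ID) pair, and the integer connection handles are all hypothetical illustrations, not the iFCP specification.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class IfcpGateway:
    """Hypothetical iFCP gateway: one TCP connection per FC device pair."""
    # (source N_Port ID, destination N_Port ID) -> TCP connection handle
    sessions: dict[tuple[int, int], int] = field(default_factory=dict)
    _next_conn: count = field(default_factory=lambda: count(1), repr=False)

    def connection_for(self, src_nport: int, dst_nport: int) -> int:
        """Return the predetermined connection for this pair, creating it on first use."""
        key = (src_nport, dst_nport)
        if key not in self.sessions:
            # First frame between these two devices: set up a new TCP connection
            self.sessions[key] = next(self._next_conn)
        return self.sessions[key]


gw = IfcpGateway()
conn_a = gw.connection_for(0x010200, 0x020300)  # new connection
conn_b = gw.connection_for(0x010200, 0x020300)  # same pair, same connection
conn_c = gw.connection_for(0x010300, 0x020300)  # different pair, new connection
```

Keying on the device pair means every frame between the same two devices reuses one connection, which is the per-device session behaviour the iFCP slides describe.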
19 iFCP Approach
iFCP provides F port to F port connectivity only
[Diagram: FC servers, FC tape libraries and FC JBODs attach to iFCP gateways, which connect across an IP network with iSNS servers; each device-to-device session runs gateway to gateway]
IP services at the individual device level: IETF standards for routing, naming, security, QoS, CoS and discovery (iSNS)
20 FCIP
21 FCIP
- FCIP encapsulates FC frames within TCP/IP, allowing islands of FC SANs to be interconnected over an IP-based network
- TCP/IP is used as the underlying transport to provide congestion control and in-order delivery of FC frames
- All classes of FC frames are treated the same as datagrams
- End-station addressing, address resolution, message routing, and other elements of the FC network architecture remain unchanged
- IP is introduced exclusively as a transport protocol for an inter-network bridging function
- IP is unaware of the Fibre Channel payload and the FC fabric is unaware of IP
Packet layout: Ethernet Header | IP | TCP | FCIP | FCP | SCSI Data | CRC (Checksum)
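The claim that IP never looks inside the Fibre Channel payload can be illustrated with a round trip: the tunnel endpoint wraps whatever FC frame it is given and the far endpoint hands back identical bytes. A minimal sketch, assuming a hypothetical 4-byte length prefix as a stand-in for the real FCIP encapsulation header; the function names are illustrative only.

```python
import struct


# Hypothetical stand-in for the FCIP encapsulation header: a 4-byte
# big-endian length. The real FCIP header carries more fields.
def fcip_encapsulate(fc_frame: bytes) -> bytes:
    """Wrap an opaque FC frame; the frame contents are never inspected."""
    return struct.pack(">I", len(fc_frame)) + fc_frame


def fcip_decapsulate(stream: bytes) -> tuple[bytes, bytes]:
    """Split one FC frame off the front of the tunnel's TCP byte stream.

    Returns (fc_frame, rest_of_stream); the frame comes back byte-for-byte.
    """
    (length,) = struct.unpack_from(">I", stream, 0)
    return stream[4:4 + length], stream[4 + length:]


# Any FC frame is carried the same way: as an opaque datagram.
frames = [b"FC frame A", b"FC frame B, a little longer"]
stream = b"".join(fcip_encapsulate(f) for f in frames)
first, rest = fcip_decapsulate(stream)
second, rest = fcip_decapsulate(rest)
```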
22 FCIP Approach: IP Tunneling
[Diagram: two Fibre Channel SANs, each with FC switches, FC servers, FC tape libraries and FC JBODs, joined by FCIP tunnels over an IP network in a single tunnel session]
IP services available at the aggregated FC SAN level
FCIP provides E port to E port connectivity
23 iSCSI
24 iSCSI
- iSCSI is a SCSI transport protocol for mapping block-oriented storage data over TCP/IP networks
- The iSCSI protocol enables universal access to storage devices and Storage Area Networks (SANs) over standard TCP/IP networks
25 iSCSI, iFCP, FCIP
26 iSCSI Cont.
- iSCSI (Internet SCSI) specifies a way to encapsulate SCSI commands in a TCP/IP network connection
IP Header: contains routing information so that the message can find its way through the network
TCP Header: provides information necessary to guarantee delivery
iSCSI Header: explains how to extract SCSI commands and data
SCSI commands and data
27 iSCSI Deployment
28 iSCSI Implementations
[Diagram: iSCSI client, iSCSI server, native iSCSI device, iSCSI gateway, FC switch and disk connected over an IP network]
29 Storage Consolidation
- Server and LAN bottlenecks
- Single points of failure
- Poor scalability (management overhead, resource inefficiencies)
- Tape drives -> tape library
- Departmental -> application-centric disk arrays
30 iSCSI Architecture
- Overview
- Architectural Model
- Features Beyond Parallel SCSI
- Issues Beyond Parallel SCSI
31 iSCSI - Layered Model
- Replaces shared bus with switched fabric
- Transparently encapsulates SCSI CDBs
- Unlimited target and initiator connectivity
32 iSCSI Sessions
[Diagram: an iSCSI session between the iSCSI initiator on an iSCSI host and iSCSI targets on an iSCSI device, carried over one or more TCP connections]
- Session between initiator and target
- One or more TCP connections per session
- Login phase begins each connection
- Deliver SCSI commands in order
- Recover from lost connections
33 iSCSI Encapsulation
[Diagram: end users, data servers, iSCSI target, SCSI target, Fibre Channel SAN, LUNs]
34 iSCSI Packet Order
[Diagram: data servers, iSCSI target, SCSI target, Fibre Channel SAN, LUNs]
35 iSCSI Packet
Packet layout: Ethernet Header | IP | TCP | iSCSI | SCSI Data | CRC (Checksum)
36 iSCSI Packet
Well-known ports: 21 FTP, 23 Telnet, 25 SMTP, 80 HTTP, 5003 iSCSI
TCP header: Source Port | Destination Port | Sequence Number | Acknowledgment Number | Offset | Reserved | Window | Checksum | Urgent Pointer | Options and Padding | Data Field
iSCSI header encapsulated in the TCP data field: Opcode | Opcode-Specific Fields | Length of Data (after 40-byte header) | LUN or Opcode-Specific Fields | Initiator Task Tag | Opcode-Specific Fields
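The TCP fields in this layout form the standard fixed 20-byte TCP header, so a receiver can pick out iSCSI traffic by port before it looks at the encapsulated iSCSI header. A minimal sketch, assuming the draft-era port 5003 shown on the slide (the port IANA later registered for iSCSI is 3260); `parse_tcp_header` and `is_iscsi` are illustrative names.

```python
import struct

ISCSI_PORT = 5003  # port used on this slide; IANA later registered 3260


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte TCP header (network byte order)."""
    (src_port, dst_port, seq, ack, offset_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": (offset_flags >> 12) * 4,  # data offset counts 32-bit words
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }


def is_iscsi(header: dict) -> bool:
    """True if either endpoint uses the iSCSI port."""
    return ISCSI_PORT in (header["src_port"], header["dst_port"])


# Hand-built header: port 40000 -> port 5003, 5-word header (no options)
raw = struct.pack("!HHIIHHHH", 40000, 5003, 1, 0, (5 << 12) | 0x18, 65535, 0, 0)
hdr = parse_tcp_header(raw)
```

The encapsulated iSCSI header then starts at byte offset `hdr["header_len"]` of the segment.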
37 iSCSI Commands
- SCSI Commands
- Command phase
- Optional data phase
- Response phase
- iSCSI Commands
- Bind the command phase with its associated data into an iSCSI Protocol Data Unit (PDU)
38 iSCSI Architecture: Features Beyond Parallel SCSI
- Sessions
- Comprise one or more TCP connections used for failover and/or link aggregation
- Device sharing
- Any host on the network can potentially use the same iSCSI device
- Device scalability
- Hosts can connect to an effectively limitless number of iSCSI devices
39 iSCSI Architecture: Issues Beyond Parallel SCSI
- Naming, addressing and discovery
- Security & data integrity
- Ordering and numbering
- Error handling/recovery
- Networking overhead
40 iSCSI Architecture Issues: Naming, Addressing & Discovery
- Parallel SCSI uses a simple naming, addressing and discovery (NAD) scheme
- Devices are discovered by polling the bus
- Devices are given a unique ID between 0 and 15
- iSCSI requires
- Internet addressing
- Location-independent naming
- Operation beyond firewalls
- Multiple addresses to one target
- Multiple targets behind one address
- 3rd-party commands
- Scalable discovery (poll the Internet??)
41 iSCSI Storage Device Discovery Process
- 1) Host driver requests available iSCSI targets from the SCSI router
- 2) SCSI router sends available iSCSI target names to the host
- 3) Host logs into the iSCSI targets that were received
- 4) SCSI router accepts the login and sends target identifiers (numbers) to the host
- 5) Host queries targets for device information
- 6) Targets respond with device information
- 7) Host creates a table of internal devices (/dev/)
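The seven discovery steps can be walked through with a toy model. Everything here is hypothetical: `ScsiRouter`, its methods, the use of list position as the numeric target identifier, and the `/dev/iscsiN` device names are illustrative, not a real driver interface.

```python
from dataclasses import dataclass


@dataclass
class ScsiRouter:
    """Toy SCSI router that fronts a set of iSCSI targets."""
    targets: dict[str, str]  # target name -> device information

    def list_targets(self) -> list[str]:
        # Steps 1-2: the host asks; the router returns target names
        return sorted(self.targets)

    def login(self, name: str) -> int:
        # Steps 3-4: the host logs in; the router answers with a number
        return self.list_targets().index(name)

    def query(self, target_id: int) -> str:
        # Steps 5-6: the host asks a target for its device information
        return self.targets[self.list_targets()[target_id]]


def discover(router: ScsiRouter) -> dict[str, str]:
    """Step 7: build the host's table of internal devices."""
    table = {}
    for name in router.list_targets():
        target_id = router.login(name)
        table[f"/dev/iscsi{target_id}"] = router.query(target_id)
    return table


router = ScsiRouter({"tgt.b": "tape", "tgt.a": "disk"})
devices = discover(router)
```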
42 iSCSI Sequence
[Diagram: the iSCSI driver on the initiator connects to the target over a single TCP session on TCP port 5003; the target device has already initialized onto the Fibre Channel]
43 iSCSI Architecture Issues: Security Levels
- 0: None (OK in controlled environments)
- 1: Initiator and target authentication
- Prevents unauthorized access
- 2: Digests for header and data integrity
- Protects against man-in-the-middle, insertion, modification and deletion
- 3: Encryption (IPsec)
- Protects against eavesdropping
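Security level 2 relies on digests computed over the header and data. The slide does not name an algorithm; the iSCSI standard that followed (RFC 3720) specified CRC32C for its header and data digests, so a bitwise CRC32C serves as an illustration here. A plain CRC detects accidental corruption; it is not keyed, so resisting deliberate tampering still needs something like IPsec. `digest_ok` is an illustrative name.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; fold in the polynomial when the low bit was set
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def digest_ok(payload: bytes, received_digest: int) -> bool:
    """Receiver side: recompute the digest and compare."""
    return crc32c(payload) == received_digest


header = b"example PDU header"
digest = crc32c(header)            # sender appends this value
tampered = b"exbmple PDU header"   # one byte changed in transit
```

`crc32c(b"123456789")` returns `0xE3069283`, the published check value for CRC-32C. A table-driven or hardware-assisted version would be used in practice; the bitwise loop just keeps the sketch short.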
44 iSCSI Architecture Issues: Ordering & Numbering
- Unlike parallel SCSI, iSCSI PDUs may
- Arrive out of order (by taking different routes)
- Not arrive at all
- iSCSI requires
- Command numbering
- Ordered delivery over multiple connections
- Status numbering
- Detection of failed connections
- Data sequencing
- Detection of missing data PDUs
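Command numbering is what lets a target restore order when commands from one session arrive over several TCP connections. A minimal sketch, assuming a single counter per session (the finished standard calls it CmdSN) and ignoring 32-bit wraparound; `CommandSequencer` is a hypothetical name.

```python
import heapq


class CommandSequencer:
    """Hypothetical per-session sequencer: releases commands in CmdSN order."""

    def __init__(self, first_cmdsn: int = 0) -> None:
        self.expected = first_cmdsn
        self.pending: list[tuple[int, str]] = []  # min-heap keyed on CmdSN

    def receive(self, cmdsn: int, command: str) -> list[str]:
        """Accept a command from any connection; return those now deliverable."""
        if cmdsn < self.expected or any(sn == cmdsn for sn, _ in self.pending):
            return []  # duplicate: already delivered or already queued
        heapq.heappush(self.pending, (cmdsn, command))
        ready = []
        while self.pending and self.pending[0][0] == self.expected:
            ready.append(heapq.heappop(self.pending)[1])
            self.expected += 1
        return ready


seq = CommandSequencer()
first = seq.receive(1, "WRITE")   # held back: CmdSN 0 has not arrived yet
second = seq.receive(0, "READ")   # releases CmdSN 0 and 1 together
```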
45 iSCSI Architecture Issues: Error Handling & Recovery
- Parallel SCSI errors incur costly recovery
- Aborted commands; target, bus and host resets
- OK, because bus errors are infrequent
- iSCSI errors will be more frequent
- Link failures
- TCP failures
- Bad middle box (firewall, router)
- Does the Internet have a reset option??
46 iSCSI Architecture Issues: Networking Overhead
- Software iSCSI can achieve near GbE wire speed, but at 100% CPU utilization
- Traditional TCP stacks are expensive
- Multiple memory copies
- Too many interrupts
- Checksum calculations
- We need TCP offload engines (TOE)
47 iSCSI - TCP Offload
- Each Ethernet frame requires additional CPU processing
- Headers must be stripped
- Packets ordered
- Data copied into memory buffers
- CRC checked
48 iSCSI Architecture: Issues: Networking
- TOE
- The challenge rests on the TOE vendor
- Interrupt the host on command boundaries
- Offer zero-copy from NIC to application
- Eliminate the TCP reassembly buffer
- Provides true zero-copy
- Requires RDMA or synchronization
- Proposed IETF solutions for framing
- WARP: an RDMA mechanism
- Markers: a synchronization mechanism
49 What's Next for iSCSI
- CRC
- SLP (Service Location Protocol)
- Authentication
- Encryption
50 Conclusions
51 Conclusions
- IP-based storage will proliferate
- Benefits are strong
- Significant players
- Clear need
- Standards will be established
- Work with industry leaders
52 Backup
53 iSNS
- iSNS (Internet Storage Name Service)
- Provides registration and discovery of SCSI devices and Fibre Channel-based devices
- In IP-based storage such as iSCSI, end devices register with iSNS
- In iFCP, Fibre Channel-based storage end devices are registered with iSNS by an iFCP gateway
54 iSNS Operation
[Diagram: iSNS server; FC network 1 and FC network 2 joined across an IP network by a local and a remote iFCP portal]
- Server_1: IP address 10.1.2.3, N_port ID 24
- Server_2: IP address 10.1.2.4, N_port ID 24
Problem: two identical N_port IDs
Solution: create a new ID (based on IP address + N_port ID)
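The fix on this slide, deriving a new identifier from the IP address plus the N_port ID, can be sketched by concatenating the two values. The 32-bit IPv4 address and 24-bit Fibre Channel N_Port ID widths are standard; the concatenation itself is a hypothetical encoding for illustration, not the scheme defined for iFCP or iSNS.

```python
import ipaddress


def qualified_id(portal_ip: str, nport_id: int) -> int:
    """Combine an IPv4 address (32 bits) with an N_Port ID (24 bits)."""
    if not 0 <= nport_id < (1 << 24):
        raise ValueError("N_Port IDs are 24-bit values")
    return (int(ipaddress.IPv4Address(portal_ip)) << 24) | nport_id


server_1 = qualified_id("10.1.2.3", 24)
server_2 = qualified_id("10.1.2.4", 24)
# Same N_Port ID, different IP address -> distinct identifiers
```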
55 Tracing an iSCSI Block I/O
Server: Database Application / Application -> file I/O requests -> Operating System (Database System, File System, Raw Partition Manager, Volume Manager) -> SCSI Device Driver -> iSCSI Device Driver Layer (encapsulation) -> TCP/IP stack -> Network Interface Card
iSCSI appliance: Network Interface Card -> TCP/IP stack -> iSCSI Device Driver Layer (de-encapsulation) -> SCSI Device Driver -> RAID Host Bus Adapter -> Storage I/O Bus -> iSCSI Appliance Storage
Across the TCP/IP network: device-specific requests carrying block I/O, data and storage location
56 Challenge 1 - TCP Overhead
Consider a SCSI WRITE command. How many times do you think the data is copied before it reaches the target HBA?
Initiator: Application -copy-> Buffer Cache -copy-> TCP/IP -DMA-> Ethernet (2 copies, 1 DMA)
Target: Ethernet -DMA-> Ring Buffer -copy-> TCP/IP -copy-> Bridge -DMA-> HBA (2 copies, 2 DMAs)
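The tally on this slide can be checked by listing each hop with its transfer type. The hop names are transcribed from the slide; the data structure is just an illustration. The `copy` hops are CPU memory-to-memory copies, as opposed to DMA transfers, and they are what a TCP offload engine with zero-copy aims to remove.

```python
# (hop, transfer type) for the WRITE path, transcribed from the slide
INITIATOR_PATH = [
    ("Application -> Buffer Cache", "copy"),
    ("Buffer Cache -> TCP/IP", "copy"),
    ("TCP/IP -> Ethernet", "DMA"),
]
TARGET_PATH = [
    ("Ethernet -> Ring Buffer", "DMA"),
    ("Ring Buffer -> TCP/IP", "copy"),
    ("TCP/IP -> Bridge", "copy"),
    ("Bridge -> HBA", "DMA"),
]


def tally(path: list[tuple[str, str]]) -> tuple[int, int]:
    """Return (CPU memory copies, DMA transfers) along a data path."""
    copies = sum(1 for _, kind in path if kind == "copy")
    dmas = sum(1 for _, kind in path if kind == "DMA")
    return copies, dmas


print(tally(INITIATOR_PATH))                # (2, 1)
print(tally(TARGET_PATH))                   # (2, 2)
print(tally(INITIATOR_PATH + TARGET_PATH))  # (4, 3)
```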
57 TCP Overhead (2)
- TCP processing
- Every TCP connection that is part of an iSCSI session carries potential processing overhead
- Connection setup / teardown
- TCP state machine
- Acknowledgment, timeout, retransmission
- Window management
- Congestion control
- TCP segmentation
- IP fragmentation
- Checksum calculations
- Partial or complete TCP offload mechanisms are assumed to be required to make iSCSI performance comparable to FC
58 Challenge 2: Framing
- Message boundaries (framing is a hardware issue)
- iSCSI messages have no alignment relationship with TCP segments
- TCP does not have a built-in mechanism for signaling message boundaries
- The IETF considered leveraging the urgent pointer for some time
- So how can an iSCSI adapter determine where a message begins and ends?
- By reading the length field in the iSCSI header
- This determines where in the byte stream the current message ends and the next begins
- The NIC must stay in sync with the beginning of the byte stream
- Works well in a perfect world (maybe a SAN or LAN?)
- In a MAN/WAN we have issues
- IP fragmentation leading to out-of-order packet delivery and/or packet loss
- Any middle box may fragment an IP packet, sending each fragment along a potentially different route
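Length-field framing can be sketched as a small reassembler that reads each header, finds the data length, and cuts the stream at the message boundary. A minimal sketch, assuming the 40-byte header quoted on the iSCSI packet slide and a hypothetical 4-byte big-endian length at offset 4 (the real iSCSI header layout differs); `PduFramer` and `make_pdu` are illustrative names. It only works because it sees every byte in order from the start of the stream, which is exactly the assumption a lost or reordered segment breaks.

```python
import struct

HEADER_LEN = 40     # header size quoted on the slides
LENGTH_OFFSET = 4   # hypothetical position of a 4-byte big-endian data length


def make_pdu(payload: bytes) -> bytes:
    """Build a message: zeroed header carrying the payload length, then payload."""
    header = bytearray(HEADER_LEN)
    struct.pack_into(">I", header, LENGTH_OFFSET, len(payload))
    return bytes(header) + payload


class PduFramer:
    """Cuts an in-order TCP byte stream into header-plus-data messages."""

    def __init__(self) -> None:
        self.buffer = bytearray()

    def feed(self, chunk: bytes) -> list[bytes]:
        """Append newly received bytes; return every complete message."""
        self.buffer += chunk
        messages = []
        while len(self.buffer) >= HEADER_LEN:
            (data_len,) = struct.unpack_from(">I", self.buffer, LENGTH_OFFSET)
            total = HEADER_LEN + data_len
            if len(self.buffer) < total:
                break  # rest of this message has not arrived yet
            messages.append(bytes(self.buffer[:total]))
            del self.buffer[:total]
        return messages


stream = make_pdu(b"abc") + make_pdu(b"")
framer = PduFramer()
part1 = framer.feed(stream[:30])   # not even a full header yet
part2 = framer.feed(stream[30:])   # both messages now complete
```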
59 Framing (2)
- Message boundaries, continued
- THE SCENARIO
- An iSCSI header is not received when expected because the TCP segment that it was part of was delivered out of order
- THE ISSUE
- The receiver does not know where to put the trailing data packets until the packet with the header arrives
- The different options?
- Drop all packets until the header arrives
- They will be retransmitted
- Buffer packets until the header arrives, then reassemble
- On a 1 Gbit WAN link, 16 MB of buffer memory is required per TCP connection
- On a 10 Gbit WAN link, 125 MB of buffer memory is required per TCP connection
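Both buffer figures are consistent with sizing the reassembly buffer to the bandwidth-delay product, that is, the data a sender can have in flight during one round trip. The slide does not state the round-trip time; the values below (128 ms and 100 ms) are assumptions back-calculated to reproduce its numbers, and the function name is illustrative.

```python
def reassembly_buffer_bytes(link_bits_per_s: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes (integer arithmetic, decimal MB)."""
    return link_bits_per_s * rtt_ms // (8 * 1000)


one_gig = reassembly_buffer_bytes(1_000_000_000, 128)   # 16,000,000 bytes = 16 MB
ten_gig = reassembly_buffer_bytes(10_000_000_000, 100)  # 125,000,000 bytes = 125 MB
```

At a fixed round-trip time the buffer grows linearly with link speed, which is why the problem gets worse at 10 Gbit.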
60 Framing (3)
- Message boundaries, continued
- THE BAD NEWS
- Dropping packets greatly impacts performance and significantly increases network congestion
- Local buffering is expensive and NIC logic is complex
61 Into SAN View
Storage Management Apps
Hosts
Infrastructure
Targets
62 SAN Components
- Server Platforms
- Fibre Channel Host Bus Adapters
- IP Storage NICs (SNICs)
- SAN Software
- Storage Platforms
- RAID subsystems
- JBOD
- Tape subsystems
- SAN Interconnect
- Fibre Channel hubs and switches
- IP Storage switches
- SAN-to-SCSI bridges
- MAN and WAN gateways
63 SAN, NAS, iSCSI Comparison
DAS: Computer System (Application / File System / Volume Manager / SCSI Device Driver / SCSI Bus Adapter) -> Block I/O over SCSI
64 (No Transcript)
65 Potential Outcomes and Success Probability
66 I/O Adapters & Data Movers
Intel and other vendors will have ONE Ethernet wire for ALL storage and LAN traffic
67 Storage Functions/Applications
- Current Functions/Applications
- Storage Consolidation
- Tape Backup
- Clustering
- Replication
- Disaster Recovery
- New Capabilities with IP Storage
- SAN Extension
- QoS
- Security
68 LAN-free Tape Backup
[Diagram: users and servers; a SAN switch connecting RAID, a tape subsystem and a SAN bridge]
- SAN Advantages for LAN-free Tape Backup
- Removes backup traffic from the LAN
- Tape becomes a shared SAN resource
- High-performance SAN infrastructure
- SCSI attached via SAN bridge
69 Remote Backup Application
- Allows customers to move archiving off-site for
higher disaster protection
70 Server Clustering
- SAN Advantages for server clustering
- Server access to common storage resources
- Data remains accessible if a single server fails
- Scalable to > 30 servers in a cluster
- Simplified storage resource management
71 SAN Extension: Replication over WAN
[Diagram: LAN and RAID storage at each site linked over an IP WAN]
- Unified management of data center and WAN storage routers
- Not vulnerable to disruption at a local SAN
- Leverages current infrastructure
- Expandable to iSCSI devices
Link types: IP WAN link (OC-3, T1, etc.); GE, 10GE (iSCSI, iFCP); Fibre Channel; SCSI
72 TCP/IP Layers
OSI Model | TCP/IP Layers | TCP/IP Protocols
7, 6, 5 | Process layer | FTP, Telnet, HTTP, SNMP, TFTP
4 | Host-to-host layer | TCP (connection-oriented), UDP (connectionless)
3 | Internet layer | IP
2, 1 | Network access layer | LAN/WAN: Ethernet, token ring, ATM, Frame Relay, FDDI