Title: Local
1 Local and Wide Area Networking
Johns Hopkins University Course 635.412.71
- Module 4: Storage Area Networks, Fibre Channel, High Performance Computing Interconnects
2 SANs, Fibre Channel, HPC: Introduction
- As computing and network technologies evolve, they are intersecting in new ways
  - High Performance Storage: Storage Area Networks (SANs)
  - High Performance Computing: Clusters
- Reasons for the rise of High Performance Storage SANs
  - Growth of storage needs
  - Growth of physical storage capacity
  - Need to solve issues of
    - Performance
    - Reliability
    - Cost efficiency
    - Management
    - Security
3 SANs, Fibre Channel, HPC: Terminology
- Common Storage Terminology
  - Channel
  - Interconnect
  - PCI, PCI-X, PCI-e (Peripheral Component Interconnect)
  - SCSI/iSCSI (Small Computer System Interface)
  - ATA/SATA
  - RAID (Redundant Array of Inexpensive Disks)
  - HBA (Host Bus Adapter)
  - DAS (Direct Attached Storage)
  - JBOD (Just a Bunch of Disks)
  - NAS (Network Attached Storage)
  - And, of course, SAN
4 SANs, Fibre Channel, HPC: So what does a SAN look like?
5 Low-Cost SAN (iSCSI): Introduction
- Introduction
  - iSCSI ("i" for internet) is a basic protocol that allows storage links to traverse TCP/IP-based networks
  - Standardized by the IETF in 2004 (RFC 3720)
  - Encapsulates block-level SCSI commands for transport across TCP/IP
  - An initiator uses iSCSI to access a remote target (typically a LUN)
  - Design goal: match the performance of existing SCSI transports
- Key Uses
  - Storage Virtualization (especially SMB)
  - Storage Consolidation
  - Disaster Recovery
6 Low-Cost SAN (iSCSI): Technology
- Technical Details (1)
  - Encapsulates block-level SCSI (CDB) commands for transport
  - Appears as application-level TCP/IP traffic
  - Uses TCP and (optionally) IPsec for transport
  - Multiple TCP connections can be used for an iSCSI session
  - CHAP can be used for initial authentication of a session
  - Follows the SCSI Remote Procedure Invocation model
    - SCSI commands are carried in iSCSI request PDUs
    - SCSI responses, data, and status reports are carried in iSCSI response PDUs
  - Uses DNS, SLP, and iSNS for resource location/discovery
    - SLP: Service Location Protocol (RFC 2608/4018)
    - iSNS: internet Storage Naming Service (RFC 4171)
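The CHAP step mentioned above can be sketched as follows. This is a minimal illustration of the RFC 1994 calculation (MD5 over the identifier octet, the shared secret, and the challenge), not an iSCSI login implementation; the function names and the secret are hypothetical.

```python
import hashlib
import hmac
import os


def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """Return the CHAP response: MD5(identifier octet || secret || challenge)."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


def verify(identifier: int, secret: bytes, challenge: bytes, response: bytes) -> bool:
    """Authenticator side: recompute the hash and compare in constant time."""
    expected = chap_response(identifier, secret, challenge)
    return hmac.compare_digest(expected, response)


# Authenticator issues a fresh random challenge for each login attempt.
identifier = 1
challenge = os.urandom(16)
secret = b"example-shared-secret"  # hypothetical secret known to both ends

# The peer answers with the hash; the secret itself never crosses the wire.
response = chap_response(identifier, secret, challenge)

print(verify(identifier, secret, challenge, response))    # True
print(verify(identifier, b"wrong", challenge, response))  # False
```

Because only the hash is sent, an eavesdropper on the TCP connection does not see the secret, which is why CHAP is often paired with IPsec only when confidentiality of the data itself is needed.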
7 Low-Cost SAN (iSCSI): Technology
- Technical Details (2)
  - Examples of iSCSI PDUs carrying SCSI payloads
    - SCSI Command/Response
    - SCSI Data-Out/Data-In
  - Examples of iSCSI PDUs with iSCSI-only payload
    - Login Request/Response
    - Logout Request/Response
    - SNACK Request (Retransmission)
  - Example iSCSI PDU -> Next Slide
8 Low-Cost SAN (iSCSI): Technology
- Example is a SCSI Data-In PDU (READ operation: data flows from the target in to the initiator)
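Every iSCSI PDU begins with a fixed 48-byte Basic Header Segment (BHS) as defined in RFC 3720. The sketch below packs and parses only the fields common to all PDU types and treats the opcode-specific bytes as opaque; `build_pdu` and `parse_pdu` are hypothetical helper names, and digests and additional header segments are not generated.

```python
import struct

BHS_FORMAT = ">BBHB3s8sI28s"   # 1+1+2+1+3+8+4+28 = 48 bytes
BHS_LEN = struct.calcsize(BHS_FORMAT)
OP_SCSI_DATA_IN = 0x25         # target -> initiator data (RFC 3720)
FLAG_FINAL = 0x80              # F bit in byte 1


def build_pdu(opcode, flags, lun, task_tag, opcode_specific=b"", data=b""):
    """Build a BHS plus data segment, padded to a 4-byte boundary."""
    if len(lun) != 8 or len(opcode_specific) > 28:
        raise ValueError("LUN must be 8 bytes; opcode-specific area is 28 bytes")
    bhs = struct.pack(
        BHS_FORMAT,
        opcode & 0x3F,                  # byte 0: opcode (I bit left clear)
        flags,                          # byte 1: F bit and opcode-specific flags
        0,                              # bytes 2-3: opcode-specific
        0,                              # byte 4: TotalAHSLength (no AHS here)
        len(data).to_bytes(3, "big"),   # bytes 5-7: DataSegmentLength
        lun,                            # bytes 8-15: LUN
        task_tag,                       # bytes 16-19: Initiator Task Tag
        opcode_specific.ljust(28, b"\0"),
    )
    padding = b"\0" * (-len(data) % 4)
    return bhs + data + padding


def parse_pdu(pdu):
    """Return (opcode, flags, task_tag, data) from a PDU built as above."""
    opcode, flags, _, ahs_words, dlen, _lun, tag, _ = struct.unpack(
        BHS_FORMAT, pdu[:BHS_LEN])
    start = BHS_LEN + ahs_words * 4     # AHS length is counted in 4-byte words
    length = int.from_bytes(dlen, "big")
    return opcode & 0x3F, flags, tag, pdu[start:start + length]


pdu = build_pdu(OP_SCSI_DATA_IN, FLAG_FINAL, bytes(8), 0x1234, data=b"abcde")
print(len(pdu))        # 48-byte header + 5 data bytes + 3 pad bytes
print(parse_pdu(pdu))
```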
9 Low-Cost SAN (iSCSI): Implementation
- Most iSCSI implementations use TCP/IP over Ethernet
  - Cost advantages
  - Familiarity and ease of troubleshooting
  - Storage infrastructure parallels the rest of the LAN infrastructure
- For most low/mid-range applications performance is not an issue (especially when Jumbo frames are used)
- iSCSI Gateways allow low-cost iSCSI-enabled servers to talk to high-end Fibre Channel-based storage resources
- Windows 2003, Linux 2.6, and VMware have iSCSI support
- Example iSCSI HBA/NIC (Qlogic 4052C)
  - 100/1000 Full Duplex Ethernet
  - 133-MHz PCI-X
  - Full TOE and iSCSI Offload
  - QoS and VLAN enabled
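One reason Jumbo frames help is reduced per-frame overhead. The rough calculation below is a sketch under stated assumptions: IPv4 and TCP headers without options (20 bytes each), standard Ethernet framing overhead, no VLAN tag, and iSCSI headers ignored.

```python
# Per-frame Ethernet overhead on the wire (bytes):
# preamble + SFD (8), MAC header (14), FCS (4), inter-frame gap (12).
ETH_OVERHEAD = 8 + 14 + 4 + 12
IP_TCP_HEADERS = 20 + 20  # assumed: no IP or TCP options


def payload_efficiency(mtu: int) -> float:
    """Fraction of wire time that carries TCP payload for full-size frames."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {payload_efficiency(mtu):.1%}")
# MTU 1500: 94.9%
# MTU 9000: 99.1%
```

The bandwidth gain is modest; in practice the larger benefit is that a 9000-byte MTU cuts the number of frames (and hence per-packet processing on the host) by roughly a factor of six for bulk transfers.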
10 Fibre Channel: Introduction
- Originally developed for mainframe and supercomputing environments to connect together high-speed clusters and storage
- Development began in 1988 under the auspices of the ANSI T11 committee (device level interfaces); it was approved as a standard in 1994
- Besides its use as a very high bandwidth I/O channel technology, there is interest in Fibre Channel as a LAN technology because of its high speed and unique combination of channel- and network-oriented properties
  - Data-type qualifiers for routing data into specific interface buffers
  - Link-level constructs designed to support individual I/O operations
  - Support for existing I/O interface specifications (SCSI, HIPPI, etc.)
  - Full multiplexing capabilities
  - Peer-to-peer connectivity between any two ports in a FC network
  - Ability to internetwork with other LAN, WAN, and I/O technologies
- Note: this reflects the book (ca. 2000) but does not appear to be the case
11 Fibre Channel: Introduction
- Comparison of Fibre Channel with Gigabit Ethernet and ATM (Table 9.1, with updates)
12 Fibre Channel: Architecture
- Designed to provide a common, efficient, high-speed transport to a wide variety of devices through a single port type
- Requirements outlined by the Fibre Channel Association
  - Full-duplex links over a fiber pair (one transmit/one receive)
  - Bi-directional performance up to 6.4-Gbps on a single link
  - Support over distances up to 10 kilometers
  - Small connectors for high density applications
  - High-capacity utilization with distance insensitivity
  - Greater connectivity than existing multi-drop channels
  - Broad availability at reasonable cost
  - Support for multiple cost/performance levels, from PCs to clusters
  - Ability to carry multiple protocols and command sets
- The best way to meet such demanding requirements was to develop a transport mechanism based on simple point-to-point links and a switching network
13 Fibre Channel: Terminology
- Fibre Channel, having a different heritage than other LAN/WAN technologies, has different terminology (Table 9.2)
  - Dedicated Connection: a circuit guaranteed and retained by the fabric for two specified N_Ports
  - Exchange: the basic mechanism that transfers information, consisting of one or more related non-concurrent sequences in one or both directions
  - Fabric: the entity that interconnects the various N_Ports attached to it and handles the routing of frames
  - Intermix: a mode of service that reserves the full FC capacity for a dedicated (Class 1) connection but allows the transport of additional connectionless data if space is available
  - Node: a collection of one or more N_Ports
14 Fibre Channel: Terminology (continued)
- Fibre Channel, having a different heritage than other LAN/WAN technologies, has different terminology (Table 9.2)
  - Operation: a set of one or more (possibly concurrent) exchanges associated with a logical construct above the FC-2 layer
  - Originator: the logical function associated with an N_Port that initiates an exchange
  - Port: the hardware entity within a node that performs data communications over a FC link
  - Responder: the logical function in an N_Port responsible for supporting an exchange initiated by an originator
  - Sequence: a set of one or more data frames with a common sequence ID, transmitted unidirectionally from one N_Port to another N_Port, with a corresponding response, if applicable, transmitted in response to each data frame
15 Fibre Channel: Terminology
- Fibre Channel Elements
  - The key elements of a FC network are the end devices, called nodes, and the collection of switching elements, called the fabric
  - Communication between FC-attached nodes consists of transmission of frames across point-to-point links or the fabric
  - Each node has one or more N_Ports for connection to the fabric
  - Nodes connect to F_Ports on the fabric via bi-directional point-to-point links
  - Fabrics can be a single switch or a general set of switching elements
  - Frames may be buffered within the fabric, making it possible for nodes to connect to the fabric at different data rates
  - The fabric is a switched architecture, not a shared access medium, so no MAC issues are encountered and no MAC sublayer is necessary
  - The FC network scales easily in terms of ports, data rate, and distance covered and, through its layered protocol architecture, interworks with existing LAN and I/O protocols
16 Fibre Channel: Terminology
- Basic Fibre Channel Architectural Diagram
17 Fibre Channel: Example Architecture
18 Fibre Channel: Protocol Specifications
- Fibre Channel Protocol Architecture
  - The Fibre Channel standard reference model is organized into five levels (Figure 9.3 and Table 9.3)
  - These are not levels in the strict sense of the OSI model but are instead functional groupings of services and/or definitions
  - The standard does not dictate actual implementations, relationships between the levels, or the specific interfaces between levels
  - Levels FC-0, FC-1, and FC-2 are defined together in a standard called the Fibre Channel Physical and Signaling Interface (FC-PH)
  - No final standard has been issued for FC-3
  - A number of standards have been developed at FC-4 specifying how Fibre Channel interfaces to existing LAN and I/O technologies
19 Fibre Channel: Protocol Specifications
- Fibre Channel Protocol Architecture (continued)
20 Fibre Channel: Protocol Specifications
- Fibre Channel Protocol Architecture (continued)
  - Details on the FC-0 level
    - A variety of physical media and data rates are allowed
    - Data rates: 100-Mbps to 3.2-Gbps
    - Media: fiber optic, coaxial cable, and STP
    - Distance: 50 m to 10 km, depending on data rate and media
  - The FC-1 level uses an 8B/10B encoding scheme in which 8 bits of data from the FC-2 level are encoded into a 10-bit symbol
    - Note: the difference between the raw and effective speeds quoted is due to the 8B/10B scheme (a 4-Gbps raw stream actually carries 3.2-Gbps of data)
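The raw-versus-effective relationship for 8B/10B can be checked with a short calculation. This is only an arithmetic sketch; the function names are illustrative and it accounts for line coding only, not framing overhead.

```python
def effective_rate(raw_gbps: float) -> float:
    """Data rate left after 8B/10B coding: 8 data bits per 10 line bits."""
    return raw_gbps * 8 / 10


def raw_rate(data_gbps: float) -> float:
    """Line rate needed to carry a given data rate with 8B/10B coding."""
    return data_gbps * 10 / 8


print(effective_rate(4.0))  # 3.2
print(raw_rate(3.2))        # 4.0
```

Put differently, the encoding adds 25% to the number of bits sent for a given amount of data, or equivalently consumes 20% of the line rate.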
21 Fibre Channel: Protocol Specifications
- Fibre Channel Protocol Architecture (continued)
  - The FC-2 level is responsible for the transmission of data between N_Ports, which requires the following
    - Addressing of N_Ports
    - Permissible topologies of the fabric
    - Classes of service
    - Segmentation and reassembly of frames as well as higher level grouping of frames (sequences and exchanges)
    - Sequencing, flow control, and error control
  - The FC-3 level provides a common set of services across multiple N_Ports
    - Striping: the process of using multiple ports to transmit a single data unit in parallel
    - Hunt groups: allow a connection to any available N_Port in the group
    - Multicast (and broadcast)
22 Fibre Channel: Protocol Specifications
- Fibre Channel Protocol Architecture (continued)
  - The FC-4 level defines how other protocols interoperate with Fibre Channel (specifically FC-PH)
    - SCSI: a common device interface standard for computer peripherals
    - HIPPI: a high speed I/O channel used in mainframe and supercomputing environments
    - IEEE 802: how IEEE 802 MAC frames map to Fibre Channel frames
    - ATM
    - IP: how to map packets into Fibre Channel frames (RFC 4338)
23 Fibre Channel: Physical Media and Topologies
- One FC strength is the range of allowed options for the physical medium, the data rate, and network topology
- Transmission Media
  - A special shorthand nomenclature has been developed for FC media; it basically consists of the following
    - Speed-Medium-Transmitter-Distance
    - FC-0 options are listed in Figure 9.4
  - Allowable Media Types
    - Fiber Optic: both SM and MM (both 50 µm and 62.5 µm)
    - Coaxial Cable: three 75 ohm cable types specified: a thick RG-6/U, a thinner RG-59/U, and a miniature coax cable 0.1 in diameter
    - Shielded Twisted Pair: two types of 150 ohm cables are specified for use over short distances at data rates up to 200-Mbps: EIA-568 Type 1 STP (two shielded twisted pair) or EIA-568 Type 2 STP (four pair STP)
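The Speed-Medium-Transmitter-Distance shorthand can be decoded mechanically. The sketch below is illustrative only: the code tables are assumptions based on the FC-PH nomenclature as commonly tabulated and should be checked against Figure 9.4, and `parse_fc0` is a hypothetical helper name.

```python
# Assumed code tables (verify against Figure 9.4 before relying on them).
MEDIA = {
    "SM": "single-mode fiber",
    "M5": "50 um multimode fiber",
    "M6": "62.5 um multimode fiber",
    "TV": "video coaxial cable",
    "MI": "miniature coaxial cable",
    "TP": "shielded twisted pair",
}
TRANSMITTER = {
    "LL": "long-wave laser",
    "SL": "short-wave laser",
    "LE": "long-wave LED",
    "EL": "electrical",
}
DISTANCE = {"L": "long", "I": "intermediate", "S": "short"}


def parse_fc0(designator: str) -> dict:
    """Split a designator such as '100-SM-LL-L' into its four parts."""
    parts = designator.split("-")
    if len(parts) != 4:
        raise ValueError(f"expected Speed-Medium-Transmitter-Distance: {designator!r}")
    speed, medium, transmitter, distance = parts
    try:
        return {
            "speed_mbytes_per_s": int(speed),  # nominal speed code
            "medium": MEDIA[medium],
            "transmitter": TRANSMITTER[transmitter],
            "distance": DISTANCE[distance],
        }
    except (KeyError, ValueError):
        raise ValueError(f"unrecognized code in {designator!r}") from None


print(parse_fc0("100-SM-LL-L"))
```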
24 Fibre Channel: Physical Media and Topologies
- Topologies
  - The most general FC topology is the (switched) fabric
  - Four basic topologies are available: point-to-point, fabric, arbitrated loop (no hub), and arbitrated loop with hub
  - Point-to-point connects two end nodes with no switches or routing
  - The fabric topology can contain an arbitrary number of switches, some connecting to nodes and others that just provide transport between other switches
  - The fabric topology allows for easy scalability
  - In the fabric topology the overhead on nodes is minimized: they are only responsible for managing the point-to-point link to their local switch
  - Each port requires a unique address to allow frames to be delivered to the proper destination
25 Fibre Channel: Physical Media and Topologies
- Topologies (continued)
  - The arbitrated loop topology allows up to 126 nodes to be connected in a simple, low-cost loop
  - The ports on the loop are a special kind called NL_Ports because they must perform loop management functions
  - Operation is roughly equivalent to other token ring protocols
  - A token acquisition protocol controlling loop access is required
  - The fabric and loop topologies can be connected as long as one node can act as both an arbitrated loop node and a fabric node that participates in routing decisions on the fabric
  - The topology of a given FC network is discovered automatically as part of network initialization
26 Fibre Channel: Physical Media and Topologies
- Fibre Channel Topologies (continued)
27 Fibre Channel: Framing and Classes of Service
- Framing Protocol
  - The FC-2 layer defines the rules for the transfer of frames between nodes, comparable to the OSI data link layer
  - FC-2 specifies frame types, procedures for frame exchange, frame formats, flow control, and classes of service
- FC-2 Classes of Service
  - Multiple classes of service are defined by the way communication is established between two ports and their flow/error control capabilities
  - Five classes of service are currently defined
    - Class 1: Acknowledged Connection-oriented service
    - Class 2: Acknowledged Connectionless service
    - Class 3: Unacknowledged Connectionless service
    - Class 4: Fractional Bandwidth Connection-oriented service
    - Class 6: Unidirectional Connection service
28 Fibre Channel: Framing and Classes of Service
- FC-2 Classes of Service
  - Class 1 Service
    - Provides a dedicated path through the fabric which behaves to the end nodes like a point-to-point link
    - Also provides a guaranteed data rate with sequenced delivery of frames
    - The end node requests the setup of a Class 1 service connection using a special start-of-frame delimiter (SOFc1)
    - Class 1 service is advantageous for long, constant-bandwidth transfers of data (e.g., streaming backups over a network)
29 Fibre Channel: Framing and Classes of Service
- FC-2 Classes of Service (continued)
  - Class 2 Service
    - Provides an acknowledged data transmission service without connection setup overhead
    - Acknowledgement frames are returned by the receiving port; if a delivery cannot be made due to congestion, a busy frame is returned
      - This is not the case with frames that cannot be delivered due to frame errors
    - Sequenced delivery is not guaranteed: frames can take different paths through the fabric if possible
    - Multiplexing of frames from different sources and/or destinations is allowed
    - Class 2 service is good for Storage Area Networks (SANs)
30 Fibre Channel: Framing and Classes of Service
- FC-2 Classes of Service (continued)
  - Class 3 Service
    - Provides a basic datagram service (no connection setup)
    - Delivery is neither guaranteed nor acknowledged
    - Good for short data bursts or multicast/broadcast data
  - Class 4 Service
    - Provides service similar to Class 1 but adds Quality of Service (QoS) guarantees and reservations
    - Allows the specification of guaranteed bandwidth and bounded latency
    - QoS parameters are established separately for each direction
    - Good for time-critical, real-time applications (e.g., VTC)
  - Class 6 Service
    - Provides the reliable unicast delivery found in Class 1 but also supports reliable multicast and preemption
    - Good for video streaming and broadcasting
31 Fibre Channel: Frame Types and Uses
- There are two general types of frames: data and control
- Three types of data frames are used to transfer higher level information between N_Ports
  - FC-4 Device Data: used to transfer higher-layer data units from protocols specified in FC-4 standards (IP, SCSI, etc.)
  - FC-4 Video Data: used to transmit streamed video between buffers without intermediate storage
  - Link Data: used to support higher level control information between N_Ports
- Three types of link control frames are currently defined
  - Link Continue: functions as an acknowledgement in Fibre Channel sliding-window based data transfer
  - Link Response: used as a negative acknowledgement in FC sliding-window based data transfer
  - Link Command: a reset command used to reinitialize the sliding-window based transfer mechanism
32 Fibre Channel: Frames, sequences, and exchanges
- There is much more to the FC-2 layer than frames and classes of service: it defines a set of functional building blocks for higher layer services
- Also defines a number of protocols used to implement services at a port
- Typical protocols are creating or terminating a connection, transferring data, etc.
- Protocols consist of an exchange of information between N_Ports, which in turn consists of sequences, and sequences are composed of a related set of frames
33 Fibre Channel: Frames, sequences, and exchanges (continued)
34 Fibre Channel: Frames, sequences, and exchanges (continued)
- Sequences
  - With Fibre Channel a maximum frame size is imposed at the FC-2 layer but is transparent to higher layers
  - Higher layers send down chunks of data to FC-2, which may need to break them up into a sequence of frames
  - The sequence of data frames needed to carry a single higher-layer chunk of data may also be accompanied by one or more link control frames for acknowledgement
  - FC-2 provides segmentation and reassembly that supports the transmission of sequences, as well as error control
  - An error in a frame that belongs to a sequence causes the retransmission of that whole sequence (and any others transmitted after it: go-back-N ARQ)
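The segmentation and reassembly described above can be sketched as follows. This is a simplified model, not the FC-2 procedure itself: it assumes the 2112-byte maximum data field quoted for the FC frame format, ignores padding to 4-byte multiples, and uses a hypothetical `Frame` record carrying only a sequence ID and sequence count.

```python
from dataclasses import dataclass

MAX_PAYLOAD = 2112  # maximum FC data field size in bytes


@dataclass
class Frame:
    seq_id: int     # same for every frame of one sequence
    seq_cnt: int    # position of the frame within the sequence
    payload: bytes


def segment(data: bytes, seq_id: int, max_payload: int = MAX_PAYLOAD) -> list:
    """Break one higher-layer data unit into a sequence of frames."""
    chunks = [data[i:i + max_payload] for i in range(0, len(data), max_payload)]
    return [Frame(seq_id, n, chunk) for n, chunk in enumerate(chunks or [b""])]


def reassemble(frames: list) -> bytes:
    """Rebuild the data unit; frames may arrive out of order."""
    ordered = sorted(frames, key=lambda f: f.seq_cnt)
    if [f.seq_cnt for f in ordered] != list(range(len(ordered))):
        # A gap means a frame was lost: the whole sequence must be resent.
        raise ValueError("missing frame; sequence must be retransmitted")
    return b"".join(f.payload for f in ordered)


frames = segment(bytes(5000), seq_id=7)
print([len(f.payload) for f in frames])            # [2112, 2112, 776]
print(reassemble(frames[::-1]) == bytes(5000))     # True
```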
35 Fibre Channel: Frames, sequences, and exchanges (continued)
- Exchanges
  - Exchanges are mechanisms for organizing multiple sequences into a higher-level construct to allow easier interfacing to applications
  - Examples of exchanges are SCSI disk operations like a read or write
  - Can involve either a unidirectional or bi-directional transfer of sequences
  - Within a given exchange, only a single sequence can be active (though sequences from different exchanges can be simultaneously active)
36 Fibre Channel: Frames, sequences, and exchanges (continued)
- Protocols
  - An exchange is tied to a protocol that provides a specific service for higher levels
  - Some common protocols that may be used by any higher application
    - Fabric Login: executed upon initialization of an N_Port; requires the exchange of the N_Port address, classes of service supported, and flow-control parameters
    - N_Port Login: the exchange of service parameters between a pair of N_Ports before data exchange (buffer space, service classes supported, etc.)
    - N_Port Logout: the termination of a connection between a pair of N_Ports
37 Fibre Channel: Framing and Classes of Service
- Flow Control
  - Fibre Channel provides a sophisticated set of flow control mechanisms at two levels: end-to-end and buffer-to-buffer
  - Key concept is credit: negotiated at login, it denotes the number of unacknowledged frames allowed at any time
- End-to-End Flow Control
  - Paces the flow of frames between N_Ports
  - Requires acknowledgements to operate, so end-to-end flow control can be used only with Class 1 and Class 2 services
  - Acknowledgement Types (Class 1 or Class 2 service)
    - ACK_1: ACKs one data frame; decrements the credit count by 1
    - ACK_N: ACKs N data frames; decrements the credit count by N
    - ACK_0: acknowledges a whole sequence, decrementing the credit count by the number of frames in the sequence
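The credit accounting for the three acknowledgement types can be modeled with a small counter. This is a sketch of the bookkeeping only, under the assumption that the sender's credit count tracks frames sent but not yet acknowledged and may not exceed the credit negotiated at login; the class and method names are hypothetical.

```python
class EndToEndCredit:
    """Sender-side credit counter for Class 1 / Class 2 traffic."""

    def __init__(self, credit: int):
        self.credit = credit   # negotiated at N_Port login
        self.count = 0         # frames sent and not yet acknowledged

    def can_send(self) -> bool:
        return self.count < self.credit

    def on_send(self) -> None:
        if not self.can_send():
            raise RuntimeError("credit exhausted; wait for an ACK")
        self.count += 1

    def _release(self, frames: int) -> None:
        if frames > self.count:
            raise ValueError("ACK covers more frames than are outstanding")
        self.count -= frames

    def on_ack_1(self) -> None:
        self._release(1)                    # one frame acknowledged

    def on_ack_n(self, n: int) -> None:
        self._release(n)                    # n frames acknowledged

    def on_ack_0(self, frames_in_sequence: int) -> None:
        self._release(frames_in_sequence)   # whole sequence acknowledged


tx = EndToEndCredit(credit=4)
for _ in range(4):
    tx.on_send()
print(tx.can_send())   # False: four frames outstanding
tx.on_ack_n(3)
print(tx.count)        # 1
```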
38 Fibre Channel: Flow Control (continued)
- End-to-End Flow Control (continued)
  - Acknowledgement types cannot be mixed: if ACK_1 is initially used for a Class 1 connection, then it must be used for the entire connection
  - Busy and Reject control frames are also used for flow control
    - The F_BSY frame indicates the fabric is busy and cannot deliver a frame
    - The P_BSY frame indicates the destination port is busy and cannot accept a frame; the sender will try a predefined number of times to retransmit the frame
    - With the Reject (F_RJT and P_RJT) frames, delivery of the data frame is being denied (for some reason other than congestion)
    - When a frame belonging to a sequence is rejected, the whole sequence must be retransmitted
39 Fibre Channel: Flow Control (continued)
- Buffer-to-buffer Flow Control
  - Operates across a pair of ports connected by a point-to-point link; assures that buffers are available at either end of the link
  - Applicable to all classes of service (including Class 3)
  - A single type of control signal, the R_RDY frame, is used for buffer-to-buffer flow control
  - As a data frame is transmitted across the link, the sender increments its credit count for the link
  - At the receiving port the data frame is buffered as received
  - Once the data frame is switched to another port's buffer on the switch, the receiving port sends back the R_RDY frame to the sending port
  - When the sending port receives the R_RDY frame it decrements the credit count, opening its window by a frame
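The R_RDY mechanism above can be simulated on a single link. The sketch below is a toy model under stated assumptions: one sender, one receiving port with a fixed number of buffers (`bb_credit`, a hypothetical parameter name), and an R_RDY that arrives instantly once a buffered frame is switched onward.

```python
from collections import deque


class Link:
    """One direction of a point-to-point link with buffer-to-buffer credit."""

    def __init__(self, bb_credit: int):
        self.bb_credit = bb_credit    # receive buffers advertised at login
        self.credit_count = 0         # sender's count of frames in flight
        self.rx_buffers = deque()     # frames held at the receiving port

    def send(self, frame) -> bool:
        """Transmit if a receive buffer is known to be free."""
        if self.credit_count >= self.bb_credit:
            return False              # window closed: wait for R_RDY
        self.credit_count += 1        # sender increments on transmit
        self.rx_buffers.append(frame)
        return True

    def forward_one(self):
        """Receiver switches a frame onward and returns R_RDY to the sender."""
        frame = self.rx_buffers.popleft()
        self.credit_count -= 1        # sender decrements on R_RDY
        return frame


link = Link(bb_credit=2)
print([link.send(f"frame{i}") for i in range(3)])  # [True, True, False]
link.forward_one()                                  # frees one buffer
print(link.send("frame2"))                          # True
```

Unlike end-to-end credit, this count is local to the link, which is why it works for Class 3 traffic that is never acknowledged by the destination.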
40 Fibre Channel: Framing and Classes of Service
- Frame Format (Figure 9.10)
  - The Fibre Channel frame contains five general fields
    - Start Delimiter
    - Frame Header
    - Data
    - Cyclic Redundancy Check (CRC)
    - End Delimiter
41 Fibre Channel: Framing and Classes of Service
- Frame Format - Start of Frame Delimiter
  - The Start of Frame (SOF) delimiter includes a four-byte set of non-data symbols denoting the start of a frame and allowing synchronization
  - The SOF delimiter comes in several varieties, each of which specifies the frame's type and class of service
  - Examples are SOF Class 1 connection (SOFc1), SOF normal (for data frames), and SOF fabric (for control frames in the fabric)
42 Fibre Channel: Framing and Classes of Service
- Frame Format - FC-2 Frame Header
  - Contains the control data required at this level; consists of
    - Routing control: contains two subfields, one for frame type (device data, link control, etc.) and one for the data type in the frame
    - Destination Identifier: destination N_Port or F_Port
      - FC uses two levels of addressing: a globally unique identifier (world wide port/node names) and a lower-level port identifier
      - The world wide port name is used by higher layers and for network management
      - The port identifier is the 3-byte address that is used for frame routing; it consists of three parts: domain, area, and port
      - The hierarchical addressing structure facilitates routing and management of the fabric
      - A mechanism for mapping between the two addresses is necessary
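The hierarchical 3-byte port identifier can be taken apart with simple bit operations. The sketch assumes one byte per part, with the domain in the most significant byte; the helper names are illustrative.

```python
def split_port_id(port_id: int) -> tuple:
    """Return (domain, area, port) from a 24-bit FC port identifier."""
    if not 0 <= port_id <= 0xFFFFFF:
        raise ValueError("port identifier must fit in 3 bytes")
    return (port_id >> 16) & 0xFF, (port_id >> 8) & 0xFF, port_id & 0xFF


def make_port_id(domain: int, area: int, port: int) -> int:
    """Combine the three one-byte parts into a 24-bit identifier."""
    for part in (domain, area, port):
        if not 0 <= part <= 0xFF:
            raise ValueError("each part must fit in one byte")
    return (domain << 16) | (area << 8) | port


print(split_port_id(0x010200))            # (1, 2, 0)
print(hex(make_port_id(0x01, 0x02, 0x00)))  # 0x10200
```

A switch can therefore route on the domain byte alone until the frame reaches the destination switch, which is what makes the hierarchy useful for fabric routing.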
43 Fibre Channel: Framing and Classes of Service
- Frame Format - FC-2 Frame Header (continued)
  - Contains the control data required at this level; consists of
    - Source Identifier: source N_Port or F_Port
    - Type: if the routing control field specifies an FC-4 frame, then this field specifies the payload protocol (SCSI, IP, etc.)
      - This field and the Routing control field allow the destination N_Port to deliver the data to the correct higher layer user
    - Frame control: control information relating to frame content
      - Is the frame a retransmission? Is the frame part of a sequence?
    - Sequence ID: unique identifier for a sequence, used for all frames belonging to it
    - Data Field control: specifies which, if any, of four optional headers are present
44 Fibre Channel: Framing and Classes of Service
- Frame Format - Frame Header (continued)
  - Contains the control data required at this level; consists of
    - Sequence count: a unique number assigned sequentially to each frame in a sequence (for flow control and proper reassembly of frames within a sequence)
    - Originator Exchange Identifier: a unique identifier assigned to the higher layer initiator of an exchange
    - Responder Exchange Identifier: a unique identifier assigned to the higher layer destination of an exchange
    - Parameter: used in different ways for link control and data frames
      - Link control frames carry information specific to the control function in this field
      - Data frames may carry an address meaningful to the upper layer protocol
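The header fields listed on the last three slides fit into a 24-byte structure. The sketch below packs them in the FC-PH word layout as I understand it (one-byte R_CTL with 3-byte D_ID, a reserved byte with 3-byte S_ID, and so on); treat the exact widths as an assumption to verify against Figure 9.10, and the field values in the example as purely illustrative.

```python
import struct
from dataclasses import dataclass

HEADER_FORMAT = ">IIIBBHHHI"  # six 32-bit words = 24 bytes


@dataclass
class FCHeader:
    r_ctl: int      # routing control (1 byte)
    d_id: int       # destination identifier (3 bytes)
    s_id: int       # source identifier (3 bytes)
    fc_type: int    # payload protocol type (1 byte)
    f_ctl: int      # frame control (3 bytes)
    seq_id: int     # sequence ID (1 byte)
    df_ctl: int     # data field control (1 byte)
    seq_cnt: int    # sequence count (2 bytes)
    ox_id: int      # originator exchange ID (2 bytes)
    rx_id: int      # responder exchange ID (2 bytes)
    parameter: int  # parameter (4 bytes)

    def pack(self) -> bytes:
        return struct.pack(
            HEADER_FORMAT,
            (self.r_ctl << 24) | (self.d_id & 0xFFFFFF),
            self.s_id & 0xFFFFFF,            # top byte left as zero
            (self.fc_type << 24) | (self.f_ctl & 0xFFFFFF),
            self.seq_id, self.df_ctl, self.seq_cnt,
            self.ox_id, self.rx_id, self.parameter,
        )

    @classmethod
    def unpack(cls, raw: bytes) -> "FCHeader":
        w0, w1, w2, seq_id, df_ctl, seq_cnt, ox_id, rx_id, param = struct.unpack(
            HEADER_FORMAT, raw)
        return cls(w0 >> 24, w0 & 0xFFFFFF, w1 & 0xFFFFFF, w2 >> 24,
                   w2 & 0xFFFFFF, seq_id, df_ctl, seq_cnt, ox_id, rx_id, param)


hdr = FCHeader(r_ctl=0x06, d_id=0x010200, s_id=0x010300, fc_type=0x08,
               f_ctl=0x290000, seq_id=1, df_ctl=0, seq_cnt=0,
               ox_id=0x1234, rx_id=0xFFFF, parameter=0)
raw = hdr.pack()
print(len(raw))                       # 24
print(FCHeader.unpack(raw) == hdr)    # True
```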
45 Fibre Channel: Framing and Classes of Service
- Frame Format - Data Field
  - Contains user data in multiples of four bytes, up to a maximum of 2112 bytes
  - Can also include one or more optional headers whose presence is denoted in the Data Field control field
    - Optional Expiration Security Header: can carry a frame expiration date plus other security data over and above the FC-PH standard
    - Optional Network Header: may be used by a bridge or gateway node interfacing to an external network to allow tunneling (includes 8-byte source and destination network addresses)
    - Optional Association Header: may help specify an upper layer process (or group of processes) associated with an exchange
    - Optional Device Header: if used, the format is specified by the upper layer protocol used with the frame
46 Fibre Channel: Framing and Classes of Service
- Frame Format - CRC and End Delimiter
  - CRC field: the error detection algorithm is the same 32-bit CRC used with FDDI and IEEE 802
  - End of Frame Delimiter
    - A four-byte field denoting the end of the frame
    - The EOF field may be modified by a switch in the fabric if it finds an error in the frame or some other condition that invalidates the frame
    - There are three different EOF delimiters for valid frames
      - EOFt denotes the end of a valid sequence
      - EOFdt is used with Class 1 service to indicate that the frame is the last frame on the logical connection (i.e., the connection is being terminated)
      - EOFn is used to denote successful transmission of frames not covered by the first two
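The CRC check can be illustrated with Python's `zlib.crc32`, which implements the IEEE 802.3 CRC-32 polynomial. This is a sketch of the error-detection idea only: the on-the-wire bit and byte ordering used by Fibre Channel is not modeled, and the helper names are hypothetical.

```python
import zlib


def add_crc(frame_body: bytes) -> bytes:
    """Append a 32-bit CRC computed over the header and data field."""
    return frame_body + zlib.crc32(frame_body).to_bytes(4, "big")


def check_crc(frame: bytes) -> bool:
    """Recompute the CRC at the receiver and compare with the trailer."""
    body, trailer = frame[:-4], frame[-4:]
    return zlib.crc32(body).to_bytes(4, "big") == trailer


frame = add_crc(b"header+payload")
print(check_crc(frame))                       # True

# Flip one bit in transit: the receiver's check fails.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
print(check_crc(corrupted))                   # False
```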
47 Fibre Channel: Examples of Equipment
- Fibre Channel Equipment Manufacturers
  - High-end (Director-Class) Switches
    - Brocade Silkworm 48000 (http://www.brocade.com/products-solutions/products/directors/product-details/48000-director/index.page)
    - Cisco MDS 9513 (http://www.cisco.com/en/US/products/ps6780/index.html)
  - Low-end (Edge) Switches
    - EMC DS-300B (http://www.emc.com/collateral/hardware/specification-sheet/h5528-connectrix-ds300b-ss.pdf)
    - Cisco MDS 9134 (http://www.cisco.com/en/US/products/ps8414/index.html)
  - Host-Bus Adapters (HBA)
    - HP Storageworks 8-Gbps PCIe HBA (http://h18006.www1.hp.com/products/storageworks/fc81q_pci/index.html)
    - Qlogic QLA2340 2-Gbps PCI-X (http://www.qlogic.com/Products/SAN_products_FCHBA_QLA2340.aspx)
48 High Performance Computing: Introduction
- Throughout the 1960s, 1970s, and 1980s, supercomputers defined the high end of computing performance
- Within the past decade the notion of gluing collections of lower powered computers together to harness their collective power has been a force in the HPC arena
- Come in two general forms
  - Clusters
  - Grids
- Clusters are usually homogeneous resources owned and maintained by a single organization
- Grids are usually heterogeneous and dynamic resources, distributed and many times shared among organizations
- The key in both situations is network interconnections!
49 High Performance Computing: Introduction to Infiniband
- Several high performance network technologies have been developed for HPC connectivity
- Infiniband
  - A promising technology in both HPC and SAN environments
  - Developed by the Infiniband Trade Association, a vendor consortium with 190 members (notably Dell, Sun, IBM, HP)
  - Goal is to provide an extensible, high-speed, low-latency interconnection platform that is cost effective in a number of scenarios
  - Version 1.0 ratified in Sept 2000, currently at version 1.2.1
  - Specification provides a comprehensive protocol architecture through the transport layer (including management)
50 High Performance Computing: Interconnects - Infiniband Protocol Architecture
51 High Performance Computing: Interconnects - Infiniband
- Physical Layer
  - Overall architecture designed on switch-based P2P links
  - Individual links based on a 4-wire 2.5-Gbps (1x) full-duplex connection
  - Higher speed interfaces use multiple 1x links (e.g., a 4x link has 16 wires and runs at 10-Gbps full-duplex)
  - Links use 8B/10B encoding: 1x throughput is 2-Gbps
  - Copper Links
    - PCB/bus links run at a maximum of 30 inches
    - TP copper links run up to 17 m; currently spec'd for 4x and 12x
    - Mechanical connectors and cables defined in specification
  - Fiber Links
    - SX (850 nm MM) and LX (1310 nm SM) versions of 1x, 4x, 12x
    - Use multiple lanes like copper, except for 4x-LX (10GBase-LR)
    - Variety of physical connectors specified (MPO, SC, etc.)
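The link widths above follow directly from the 1x lane figures. The sketch below tabulates wires, signaling rate, and data rate per direction for copper links, using only the numbers on this slide (4 wires and 2.5 Gbps per lane, 8B/10B coding); the function name is illustrative.

```python
WIRES_PER_LANE = 4          # one differential pair each way
LANE_SIGNAL_GBPS = 2.5      # 1x signaling rate per direction


def link_rates(width: int) -> dict:
    """Wire count and per-direction rates for a 1x, 4x, or 12x copper link."""
    signal = width * LANE_SIGNAL_GBPS
    return {
        "wires": width * WIRES_PER_LANE,
        "signal_gbps": signal,
        "data_gbps": signal * 8 / 10,   # after 8B/10B coding
    }


for width in (1, 4, 12):
    print(f"{width}x", link_rates(width))
# 1x {'wires': 4, 'signal_gbps': 2.5, 'data_gbps': 2.0}
# 4x {'wires': 16, 'signal_gbps': 10.0, 'data_gbps': 8.0}
# 12x {'wires': 48, 'signal_gbps': 30.0, 'data_gbps': 24.0}
```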
52 High Performance Computing: Interconnects - Infiniband
- Data Link Layer
  - Defines management and data packets (max. size of 4 kB)
  - Within a subnet, packets are forwarded using the LID assigned to the interface
  - Each link has up to 16 Virtual Lanes (VLs), numbered 0 to 15
    - Subnet Management packets have highest priority (VL-15)
    - Allows QoS schemes; only VL-0 and VL-15 are mandatory
    - Each data packet has a Service Level (SL) used to map it to a VL
  - Unicast and multicast support
  - Uses a credit-based flow control scheme
  - Two CRCs for comprehensive error detection
    - Link-based 16-bit CRC checked hop-to-hop
    - End-to-end invariant 32-bit CRC checks fields that do not change hop-by-hop
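The SL-to-VL mapping and the priority of VL-15 can be sketched as a set of per-lane queues. This is a deliberately simplified model: real ports arbitrate data VLs with configurable weighted tables, whereas the sketch assumes strict priority by VL number, and the `Port` class and its mapping table are hypothetical.

```python
from collections import deque

MGMT_VL = 15  # subnet management lane, always served first


class Port:
    def __init__(self, sl_to_vl: dict):
        self.sl_to_vl = sl_to_vl   # table configured by the subnet manager
        lanes = set(sl_to_vl.values()) | {MGMT_VL}
        self.queues = {vl: deque() for vl in lanes}

    def enqueue_data(self, service_level: int, packet) -> None:
        """Data packets are placed on the VL their Service Level maps to."""
        self.queues[self.sl_to_vl[service_level]].append(packet)

    def enqueue_mgmt(self, packet) -> None:
        """Subnet management packets always use VL-15."""
        self.queues[MGMT_VL].append(packet)

    def dequeue(self):
        """Assumed policy: serve the highest-numbered non-empty VL first."""
        for vl in sorted(self.queues, reverse=True):
            if self.queues[vl]:
                return vl, self.queues[vl].popleft()
        return None


port = Port(sl_to_vl={0: 0, 1: 0, 2: 1})   # three SLs share two data VLs
port.enqueue_data(0, "bulk")
port.enqueue_data(2, "latency-sensitive")
port.enqueue_mgmt("subnet-mgmt")
print([port.dequeue() for _ in range(3)])
# [(15, 'subnet-mgmt'), (1, 'latency-sensitive'), (0, 'bulk')]
```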
53 High Performance Computing: Interconnects - Infiniband
- Network Layer
  - Defines management and data packets (max. size of 4 kB)
  - Within a subnet, packets are forwarded using the LID assigned to the interface (max. 65k nodes per subnet)
  - Outside a subnet, packets are routed using GRH/GID
  - Routers vs. switches
- Transport Layer
  - Defines five different services and associated PDUs
    - Reliable Connection
    - Reliable Datagram
    - Unreliable Connection
    - Unreliable Datagram
    - Raw Datagram
54 High Performance Computing: Interconnects - Infiniband
- Application Layer
  - A number of service interfaces and verbs are defined
  - User interfaces follow the VIA (Virtual Interface Architecture) specification
- Management
  - Includes two general management packages/protocols
  - Subnet Manager (SM)
    - Configures the local subnet and provides essential services
    - LID assignment, SL-to-VL mapping, link failover, etc.
    - All devices must talk to the SM; can have standby SMs
    - SM traffic is highest priority (VL-15) with no flow control
  - General Services Interface (GSI)
    - Other operations like out-of-band mgmt., chassis mgmt.
    - Lower priority and subject to flow control
Infiniband
- Product Examples (Low-End) Cisco SFS-7000P
- Provides 24 fixed Infiniband 4x ports (10-Gbps)
in 1U enclosure - 480-Gbps switch fabric, non-blocking, 200ns
latency - Integrated Subnet Manager, ITBA v1.2 compliant
- Managed via Web, Java clients, or command line
- List price (2/2006) 14,495
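The 480-Gbps fabric figure is consistent with counting every port at full rate in both directions. The short check below assumes that is how the number was derived; the function name is illustrative.

```python
def nonblocking_capacity_gbps(ports: int, port_gbps: float) -> float:
    """Aggregate capacity with every port sending and receiving at line rate."""
    return ports * port_gbps * 2   # x2 for full duplex


print(nonblocking_capacity_gbps(24, 10))  # 480
```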
56 High Performance Computing: Interconnects - Infiniband
- Product Examples (High-End): Voltaire ISR 9288
  - Up to 288 Infiniband 4x ports in a 14U modular enclosure
  - Up to 11.5-Tbps bandwidth, non-blocking, with 420 ns latency
  - Integrated subnet manager, comprehensive mgmt. packages
  - Multiple redundant components and failover options
  - IBTA v1.1 compliant
  - GSA price: $43,000 for 24 ports
57 High Performance Computing: Interconnects - Infiniband
58 High Performance Computing: Other choices for interconnects
- Other Alternatives for Interconnect Technologies
  - Myrinet
    - ANSI standard protocol architecture with limited support beyond one company (Myricom)
    - Low-latency, full-duplex switch fabric; 2- and 10-Gbps links
    - Older copper (HSSDC, LAN, SAN) connections; fiber preferred
    - Comprises 2% of the current Top500 list (28% for Infiniband)
  - Q-net/Quadrics (1% of Top500)
    - Proprietary high-speed (900-MBps), low-latency interconnect
    - Copper/fiber connections up to 100 m
    - Fat-tree switch fabric up to 4096 nodes
  - Gigabit/10-Gigabit Ethernet (56% of Top500)
    - The interconnect of choice because of cost and convenience
    - Latency can be an issue; so can frame size
- Reading
- This modules material Stallings chapter 9
- Next module Last-Mile Technologies