Title: Effective Strategies for SAN Performance Monitoring
1. Effective Strategies for SAN Performance Monitoring with PerformanceVSN
NTSMF Users Group - CMG
- David Signori
- Product Marketing Manager, Software Solutions
- INRANGE Technologies Corporation
- 12/9/02
2. Current Challenges in Storage Networking Administration
- Planning network requirements for Business Continuance applications.
- Planning network requirements for the ever-increasing size and complexity of the storage environment.
- Lowering management cost while increasing storage networking performance.
- Implementing a Service Provider model consisting of chargeback, reporting, and service level agreements to end users.
- Eliminating finger pointing with Server, Network, and Database administration groups.
- Managing heterogeneous environments.
- Decreasing or eliminating downtime.
Ultimately, how do I increase and guarantee performance while lowering cost?
3. Storage Networking Performance Monitoring Solution
- Requirements
  - Session Layer Traffic Flow Monitoring
  - External to the Storage Networking Equipment
  - Standards-based Management, Collection, and Reporting Interfaces
  - Simple Plug-and-Play Configuration and Operation
  - Persistence: Permanent Records of Traffic Behavior
  - Flexible Reporting Capabilities
  - Policy Monitoring and Alerting
  - Enhance Storage Network Security
  - Scalable
A Comprehensive Storage Networking Performance Monitoring Solution will increase performance and lower cost.
4. What is PerformanceVSN? Product Overview
- Definition
  - INRANGE Storage Networking Performance Monitoring Solution for Capacity Planning and Service Level Management.
- Components
  - PerformanceVSN Server (Appliance)
  - PerformanceVSN Server Software
  - Optional PerformanceVSN Probe
- Base Functionality
  - PerformanceVSN Server and Server Software
  - Port-level statistics collection, both real-time and historical
  - Statistics gathered from INRANGE Directors and switches
- Advanced Functionality
  - PerformanceVSN Server, Server Software, and Probe(s)
  - Session-level statistics collection, both real-time and historical
  - Statistics gathered from INRANGE Directors, switches, and Probe(s)
[Figure: PerformanceVSN Server and PerformanceVSN Probe]
5. Performance Monitoring Requirements
Session Layer Traffic Flow Monitoring
[Figure: Port vs. Session Layer Statistics. Hosts Server_A through Server_G and storage devices RAID_A, RAID_B, and RAID_C (LUNs 1..n each), connected across an ISL]
Port statistics:
- ISL at 60% utilization
Session statistics:
- Total ISL utilization: 60%
  - Server_A to RAID_B: 35%
    - Server_A to RAID_B / LUN 3: 10%
    - Server_A to RAID_B / LUN 9: 15%
    - Server_A to RAID_B / LUN 5: 10%
  - Server_B to RAID_C: 25%
    - Server_B to RAID_C / LUN 2: 22%
    - Server_B to RAID_C / LUN 7: 3%
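The relationship between the two views can be sketched in a few lines of Python: the port-level figure is simply the sum of the session-level figures that cross the link. This is an illustrative sketch, not PerformanceVSN code; the tuple layout and the `rollup` function are assumptions, while the numbers are the ones shown on the slide.

```python
from collections import defaultdict

# Session-level figures from the slide:
# (host, storage device, LUN, share of ISL bandwidth in percent).
SESSIONS = [
    ("Server_A", "RAID_B", 3, 10),
    ("Server_A", "RAID_B", 9, 15),
    ("Server_A", "RAID_B", 5, 10),
    ("Server_B", "RAID_C", 2, 22),
    ("Server_B", "RAID_C", 7, 3),
]


def rollup(sessions):
    """Return (utilization per host/device pair, total port utilization)."""
    by_pair = defaultdict(int)
    for host, device, _lun, util in sessions:
        by_pair[(host, device)] += util
    return dict(by_pair), sum(by_pair.values())


pair_util, isl_util = rollup(SESSIONS)
print(pair_util)
print(isl_util)
```

A port counter alone yields only the final total; the per-host, per-device, and per-LUN breakdown is what session layer monitoring adds.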
6. Performance Monitoring Requirements
FICON Layer 2 Session Layer Traffic Flow Monitoring
[Figure: Server_A (Channel_A1, Channel_A2), Server_B (Channel_B1, Channel_B2), and Server_C (Channel_C1, Channel_C2) connected to FICON_Storage_A (CU_A1, CU_A2) and FICON_Storage_B (CU_B1, CU_B2)]
Port statistics:
- CU_B2 at 60% utilization
Session statistics:
- Total CU_B2 utilization: 60%
  - Channel_A1 to CU_B2: 35%
  - Channel_B2 to CU_B2: 20%
  - Channel_C1 to CU_B2: 5%
7. FICON Cascading High Integrity Fabric
[Figure: Server_A through Server_D (channels Channel_A1 through Channel_D2) and FICON_Storage_A, FICON_Storage_B, and FICON_Storage_C (control units CU_A1 through CU_C2), connected across an ISL]
Port statistics:
- ISL at 60% utilization
Session statistics:
- Total ISL utilization: 60%
  - Channel_D1 to CU_B2: 35%
  - Channel_A2 to CU_C1: 20%
  - Channel_C1 to CU_C2: 5%
8. Performance Monitoring
FICON ULP Session Layer Traffic Flow Monitoring
[Figure: Server_A, Server_B, and Server_C (channels Channel_A1 through Channel_C2); FICON_Storage_A (CU_A1, CU_A2) and FICON_Storage_B (CU_B1, CU_B2); devices Device_B2A1, Device_B2A2, Device_B2B1, Device_B2B2, and Device_B2B3 behind CU_B2]
Port statistics:
- CU_B2 at 60% utilization
Session statistics:
- Total CU_B2 utilization: 60%
  - Channel_A1 to CU_B2: 35%
    - Channel_A1 to CUADD_B2B: 20%
      - Channel_A1 to Device_B2B1: 15%
      - Channel_A1 to Device_B2B3: 5%
    - Channel_A1 to CUADD_B2A: 15%
      - Channel_A1 to Device_B2A1: 10%
      - Channel_A1 to Device_B2A2: 5%
  - Channel_B2 to CU_B2: 20%
  - Channel_C1 to CU_B2: 5%
9. Session Layer Reporting Examples
- Real-time summary, in SCSI Read Mbytes/Sec, of the selected LUNs currently being accessed by all hosts.
- Note that this is a system-wide report across all servers on the network.
10. Session Layer Reporting Examples
- Real-time summary of the top 5 LUNs, in Total Mbytes/Sec, currently being accessed by host Server_A.
- Note LUNs 9, 5, 7, 6, and 8 on storage device RAID_A.
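A Top 5 report of this kind amounts to ranking per-LUN samples by the chosen metric. The sketch below assumes a simple mapping of LUN number to Total Mbytes/Sec; the throughput values are hypothetical and were chosen only so that the ranking matches the LUN order named on the slide.

```python
import heapq


def top_n(samples, n=5):
    """Return the n (lun, mbytes_per_sec) pairs with the highest throughput."""
    return heapq.nlargest(n, samples.items(), key=lambda item: item[1])


# Hypothetical Total Mbytes/Sec per LUN on RAID_A as seen from Server_A.
lun_mbps = {1: 0.4, 5: 9.1, 6: 3.3, 7: 6.0, 8: 2.2, 9: 12.5, 10: 0.9}

top5 = top_n(lun_mbps)
for lun, mbps in top5:
    print(f"LUN {lun}: {mbps} Mbytes/Sec")
```

The same function serves the Read Duration report by passing latency samples instead of throughput samples.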
11. Session Layer Reporting Examples
- Real-time summary of the top 5 LUNs in Read Duration for host Server_A.
- Note that this is a measure of latency; the report shows the 5 LUNs with the highest latency on the network.
12. Session Layer Reporting Examples
- Trend of SCSI Exchanges/Sec between host
Server_A and storage device RAID_A for the
past 2 hours.
13. Performance Monitoring Requirements
External to Storage Networking Devices
- Resources in network devices should be dedicated to the distribution and handling of incoming and outgoing data streams.
- Many potential problems at the framing and upper layers are not reported by the network devices themselves.
- Although external, the probe should be non-intrusive.
[Figure: Performance Monitoring Probe. Labels: Servers, Storage, Remote Storage, Metro Disk Mirroring, WAN Disk Mirroring, WAN]
14. Performance Monitoring Requirements
Standards Based
[Figure: Performance Monitoring Platform and its interfaces]
- Management: SAN Management, Data Management, Virtualization, SRM, Enterprise Management (SNMP, CIM/XML)
- Reporting: Java GUI, Spreadsheets, SAS, Home grown (CSV, SQL, HTTP)
- Collection: 3rd Party Devices, Routers/Channel Extension, Switches/Directors, Probes (TCP/IP, SNMP Fibre Alliance MIB)
15. Performance Monitoring Requirements
Standards Based
- Should Support Heterogeneous Environments
  - Multi-Vendor Equipment
  - FICON, FCP, IP, and VI
  - Fibre Channel and WAN
- Should Support Standalone Deployment or as a Plug-In to Chosen SAN Management Application
  - Adds value to chosen storage management applications
- Should Function as a Plug-In to Chosen Enterprise Management System.
- Should Leverage Performance Monitoring Capabilities in Existing Equipment (Metrics and Access)
- Service Provider-Type Reporting
16. Performance Monitoring Requirements
Simple Plug-and-Play Configuration and Operation
- Should Support Topology Rollup and Automatic Discovery of ports, devices, and LUNs.
- Session and SCSI layer monitoring should be reported by human-readable logical port and device names.
- Permanent Statistics Logging should start automatically and have easily configurable sampling periods.
- Should Support a Dashboard for Quick Health Assessment.
- Should Support Open Systems Management for Remote and Desktop Access.
17. Performance Monitoring Requirements
Persistence: Permanent Records of Traffic Behavior
- Should support user-configurable historic sampling intervals.
- Should support user-configurable rollup periods and retention times for efficient database usage.
- Should support archival and export of database for long-term capacity planning.
- Persistent statistical storage enables capacity planning and troubleshooting of problems that occurred in the past.
- Should support historical trend reports for capacity planning and performance tuning.
- Should support historical summaries for Service Provider-Type Reporting.
- Should support bookmarks and pre-configured time durations for frequently viewed reports and Service Provider-Type Reporting.
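Rollup periods and retention times can be illustrated with a small sketch: fine-grained samples are averaged into coarser buckets, and samples older than the retention time are dropped. The function names, the `(timestamp, value)` sample layout, and the choice of averaging as the rollup function are assumptions made for illustration, not a description of any product's database.

```python
from collections import defaultdict
from statistics import mean


def roll_up(samples, period_s):
    """Average (timestamp_s, value) samples into buckets of period_s seconds."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % period_s].append(value)
    return sorted((start, mean(vals)) for start, vals in buckets.items())


def prune(samples, now_s, retention_s):
    """Keep only the samples that are no older than retention_s seconds."""
    return [(ts, v) for ts, v in samples if now_s - ts <= retention_s]


# Hypothetical one-minute utilization samples (seconds, percent).
raw = [(0, 10.0), (60, 20.0), (120, 30.0), (300, 40.0), (360, 60.0)]

five_minute = roll_up(raw, 300)        # coarse history kept for trending
recent = prune(raw, now_s=360, retention_s=120)  # fine-grained data kept briefly
print(five_minute)
print(recent)
```

Keeping fine-grained data for a short time and rolled-up data for a long time is one way to meet the "efficient database usage" requirement while still supporting long-term trend reports.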
18. Performance Monitoring Requirements
Persistence: Permanent Records of Traffic Behavior, Examples
- Trend of Total Mbytes/Sec In and Out for a selected port over the past 2 hours.
- Note that the report was requested at 1830 and displayed historical data. This is not a trace that began at 1630.
19. Performance Monitoring Requirements
Persistence: Permanent Records of Traffic Behavior, Examples
- Trend of Total Mbytes/Sec In and Out for a selected port over the past 8 hours.
- Note that in addition to customized time periods, pre-configured time periods like Today, Yesterday, Current Week, and Last Month should be possible.
20. Performance Monitoring Requirements
Persistence: Permanent Records of Traffic Behavior, Examples
- Trend of SCSI Exchanges/Sec between host
Server_A and storage device RAID_A for the
past 2 hours.
21. Performance Monitoring Requirements
Persistence: Permanent Records of Traffic Behavior, Examples
- Summary of the top 5 LUNs, in Total Mbytes/Sec, accessed by host Server_A for the month of May 2002.
- Note LUNs 9, 5, 7, 6, and 8 on storage device RAID_A.
22. Performance Monitoring Requirements
Flexible Reporting Capabilities
- Should Support Real-Time Monitoring
- Should Support Collection of Hundreds of Metrics including Diagnostics
- Should Include Value-Added Derived Reports like TopN, Rates, and Multiple Devices and Statistics in a Single Report
- Should Support Configurable Sampling Intervals
- Should Support Bookmarks to Easily Return to Frequently Viewed Reports.
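A derived "rate" report is typically computed from two readings of a cumulative counter. The sketch below assumes a 32-bit byte counter that may wrap once between samples and treats one Mbyte as 1,000,000 bytes; both are assumptions for illustration, as the slides do not specify counter widths or units.

```python
def rate_mbytes_per_sec(prev, curr, counter_bits=32):
    """Convert two (timestamp_s, byte_counter) readings into Mbytes/Sec.

    Tolerates a single wrap of the counter between the two readings.
    """
    (t0, c0), (t1, c1) = prev, curr
    if t1 <= t0:
        raise ValueError("readings must be in increasing time order")
    delta_bytes = (c1 - c0) % (1 << counter_bits)
    return delta_bytes / (t1 - t0) / 1_000_000


# Hypothetical readings taken 10 seconds apart.
steady = rate_mbytes_per_sec((0, 0), (10, 50_000_000))

# Hypothetical readings one second apart with the counter wrapping in between.
wrapped = rate_mbytes_per_sec((0, 2**32 - 1_000_000), (1, 1_000_000))
print(steady, wrapped)
```

The sampling interval is simply the time between the two readings, which is why a configurable interval changes the smoothing of the reported rate.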
23. Performance Monitoring Requirements
Flexible Reporting Capabilities: Hundreds of Metrics, Examples
- Utilization
- Frames (In/Out)
- FC-2 MB/Sec (In, Out)
- FC-4 MB/Sec (In, Out by ULP: SCSI, IP, VI, FICON, and others)
- Errors MB/Sec (In, Out)
- SCSI IO/Sec (Read, Write, Other)
- SCSI Read (avg, min, max, read percentage)
- SCSI Write (avg, min, max, write percentage)
- SCSI Other (other percentage)
- SCSI Read/Write Payload Size Ranges (percentage)
- Throughput Errors
- Busy Frames
- Rejected Frames
- Link Failures
- Aborts
- Primitive Seq Protocol Errors
- Invalid Tx Words
- Delimiter Errors
- Discarded Frames
- BSYs and RJTs (Port, Fabric)
- CRC Errors
- Availability
- Link Resets (In/Out)
- OLS (In/Out)
- LOGIs (Port, Fabric)
- Available
- Link Integrity
- Sync Loss
- Sig Loss
- Capacity
- capacity for all frames
- FC-4 capacity (SCSI,IP,VI,FICON, other)
- capacity link control
- capacity link services
- Latency
- SCSI Read/Write Duration (ms)
24. Performance Monitoring Requirements
Flexible Reporting Capabilities, Examples
- Real-time summary of Total Mbytes/Sec for 24 selected ports.
- Note that multiple ports across multiple switches can be added to a single report.
- Note that the report is accessed using a bookmark.
25. Performance Monitoring Requirements
Flexible Reporting Capabilities, Examples
- Real-time summary of percent read exchange size to storage device RAID_A from all hosts on the network.
- Real-time sampling interval can be modified.
- Report can be toggled to trend by simply selecting a tool bar button.
- Multiple metrics in a single report.
26. Performance Monitoring Requirements
Policy Monitoring and Alerting
- Should support proactive troubleshooting to eliminate or decrease downtime.
- Should support open real-time alerting (e.g., SNMP, email).
- Should support multiple levels of thresholds.
- Should support pre-defined threshold definitions for quick and easy configuration.
- Thresholds should be supported on all metrics collected (including errors, type of traffic, and size of traffic) and on all objects (including ports, devices, and logical units).
- Ideal for the Service Provider model, since the administrator knows about potential problems before the end user does.
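Multi-level thresholds can be sketched as an ordered list of severities that each sample is checked against. The severity names, the limits, and the sample layout below are hypothetical; a real deployment would forward the resulting alerts over SNMP or email rather than return them in a list.

```python
# Hypothetical threshold levels, ordered from most to least severe.
LEVELS = [("critical", 90.0), ("major", 75.0), ("warning", 60.0)]


def classify(value, levels=LEVELS):
    """Return the most severe level whose limit the value reaches, or None."""
    for name, limit in levels:
        if value >= limit:
            return name
    return None


def check(samples, levels=LEVELS):
    """Return (object, metric, value, severity) for every violating sample."""
    alerts = []
    for obj, metric, value in samples:
        severity = classify(value, levels)
        if severity is not None:
            alerts.append((obj, metric, value, severity))
    return alerts


# Hypothetical utilization samples for a link, a storage port, and a LUN.
samples = [
    ("ISL", "utilization_pct", 60.0),
    ("RAID_A port", "utilization_pct", 92.5),
    ("RAID_A / LUN 9", "utilization_pct", 12.0),
]
alerts = check(samples)
print(alerts)
```

Because the check takes any object and metric, the same policy mechanism applies to ports, devices, and logical units alike.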
27. Performance Monitoring Requirements
Enhanced Security Policies
- Role-Based Security
- Event Logging
- Security Policy Monitoring: Alerting on unauthorized Host-to-LUN access
28. Performance Monitoring Requirements
Scalability
- Should Support a Combination of Software and Hardware to Suit Your Needs.
- Should Support an Inexpensive Entry Point that is Easily Expandable as Your Network Grows.
- Should Support a Roadmap around Future Storage Networking Requirements (e.g., 10G, FC-IP, iSCSI, InfiniBand).
- Should be Data Center Ready (e.g., multiple interfaces in a single enclosure, rack-mountable).
29. Performance Monitoring Life-Cycle
Putting it all together
- Performance Profiling
  - Record and monitor current network performance levels.
- Performance Thresholding
  - Set thresholds based on profiles for real-time alerting to throughput and availability problems.
- Performance Tuning
  - Adjust traffic flows based on profiles for better network performance without spending for more resources.
- Capacity Planning
  - Know exactly when and how many more resources are needed without overspending.
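The capacity planning step can be sketched as a trend extrapolation over the recorded profile. The example below fits an ordinary least-squares line to historical utilization and estimates when a chosen ceiling will be reached; the linear-growth assumption, the function name, and the sample history are all illustrative assumptions rather than a method described in the slides.

```python
def days_until(threshold_pct, history):
    """Estimate days after the last sample until utilization reaches threshold_pct.

    history is a list of (day, utilization_pct) pairs. Returns None when the
    fitted trend is flat or declining, so no crossing can be projected.
    """
    n = len(history)
    mean_x = sum(day for day, _ in history) / n
    mean_y = sum(util for _, util in history) / n
    sxx = sum((day - mean_x) ** 2 for day, _ in history)
    sxy = sum((day - mean_x) * (util - mean_y) for day, util in history)
    if sxx == 0 or sxy <= 0:
        return None
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    last_day = history[-1][0]
    return max(0.0, (threshold_pct - intercept) / slope - last_day)


# Hypothetical monthly profile of ISL utilization (day, percent).
history = [(0, 40.0), (30, 45.0), (60, 50.0)]
remaining = days_until(80.0, history)
print(remaining)
```

A projection like this only has value if the history exists, which is why the life-cycle starts with profiling and persistent records.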
30. Case Study and ROI
Large Financial Brokerage: Metro Area Disk Mirroring
[Figure: Servers, Storage, and Remote Storage connected over DWDM, with FICON and FCP links]
31. Case Study and ROI
Performance Profiling
MAN extender usage across a selected week. Note spikes in traffic.
32. Case Study and ROI
Performance Profiling
Drilling into MAN extender usage for a specific day. Note the spike in traffic between noon and 1 PM.
33. Case Study and ROI
Performance Tuning
Drilling into storage port usage identifies the offending storage device.
34. Case Study and ROI
Large Financial Brokerage: Metro Area Disk Mirroring
- Given
  - DWDM channel costs 16k/month.
  - Customer was considering going to 4 channels per fabric but justified that, for the time being, 3 per fabric was adequate.
- Result
  - ROI was less than 2 months for this particular solution.
- Additional Benefits
  - Capacity Planning
    - Visibility into utilization trends determines exactly when additional channels will be needed.
  - Performance Tuning
    - Visibility into the offending storage device provides load-balancing feedback to re-map devices to less-utilized links, thus optimizing channels.
  - Standards-Based
    - Provides seamless visibility into the FICON portion of the fabric as well.
  - Real-Time Monitoring
    - Reports on errors for troubleshooting and diagnostics.
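The payback reasoning behind the ROI claim can be written out as a short calculation. The slide gives the channel cost (16k/month) and the decision to stay at 3 channels per fabric instead of 4; it does not give the number of fabrics or the cost of the monitoring solution, so the values used for those two inputs below are hypothetical placeholders, not figures from the case study.

```python
def payback_months(solution_cost, channel_cost_per_month,
                   channels_avoided_per_fabric, fabrics):
    """Months until avoided channel fees equal the cost of the solution."""
    monthly_savings = (channel_cost_per_month
                       * channels_avoided_per_fabric
                       * fabrics)
    if monthly_savings <= 0:
        raise ValueError("no recurring savings, so there is no payback period")
    return solution_cost / monthly_savings


# From the slide: 16k/month per channel, one channel avoided per fabric (4 -> 3).
# Hypothetical: two fabrics and a solution cost of 50,000.
months = payback_months(
    solution_cost=50_000,
    channel_cost_per_month=16_000,
    channels_avoided_per_fabric=1,
    fabrics=2,
)
print(months)
```

Under these placeholder inputs the payback falls below two months; with the real fabric count and solution price the same formula would reproduce the figure reported on the slide.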
35. Performance Monitoring Solutions to Current Challenges
- Planning network requirements for Business Continuance applications
  - Answers question of how many MAN extender links you need.
  - Answers question of how much WAN extender bandwidth you need.
  - Traces spikes in MAN/WAN extender link back to the device and volume that caused it.
  - Enables you to know when you will need more bandwidth.
  - Reports on latency.
- Planning network requirements for the ever-increasing size and complexity of the storage environment
  - Answers question of how many ISLs you need.
  - Answers question of what is the optimum server-to-storage ratio.
  - Enables you to know when you will need more ports.
  - Traces spikes in ISL and storage port back to the device and volume (LUNs) that caused it.
36. Performance Monitoring Solutions to Current Challenges
- Lowering management cost while increasing storage networking performance
  - Reports, both real-time and historical, are only a mouse click away. No need for tedious spreadsheet crunching.
  - Command line launch and open APIs for seamless integration with 3rd party storage management applications.
- Implementing a Service Provider model consisting of chargeback, reporting, and service level agreements to end users.
  - Since Session Layer Monitoring correlates usage and errors to the individual server, storage device, and volume (LUN), accountability can be maintained at the department level.
- Eliminating finger pointing with Server, Network, and Database administration groups.
  - Session layer response time metrics allow you to distinguish between network, server, and storage device latency.
37. Performance Monitoring Solutions to Current Challenges
- Managing heterogeneous environments.
  - Because the solution is external to networking devices and uses standard collection interfaces, it is independent of fabric vendor and ULP, and can extend to the WAN.
- Decreasing or eliminating downtime with proactive policy-based monitoring.
  - Real-time and SNMP alerts on user-defined thresholds. You profile the network and define behavior. The solution provides real-time notification of policy violations.
- Combines the best of both worlds
  - Level of visibility on par with expensive diagnostic tools
  - Ease of use and capacity planning of an Enterprise service level management application.
38. Advanced Performance Monitoring Solutions
- Capacity Planning/Modeling: Planning for network usage by resources not yet added. For example, a new department with 10 clients is added to access application X on Server A, which already has 100 clients. Throughput from Server A to which disks will increase by 10%? ROI Potential: If you under-use ISLs, you are over-spending.
- Service Duplication/Modeling: Planning for WAN usage by an application not yet added. For example, the WAN will support disk mirroring. How much bandwidth is needed to adequately support write I/O to particular disks or volumes? ROI Potential: If you under-use WAN links, you are over-spending.
- Performance Tuning: An application/server consolidation example. Applications needing access to much of the same data are candidates to run on the same server or in the same cluster. ROI Potential: If you under-use servers, you are over-spending.
39. Advanced Performance Monitoring Solutions
- Performance Tuning: Save cost by separating the types of transactions on the network. For instance, separating transaction (I/O) and data-intensive operations will allow more transactions and deeper data mining.
- Add value to storage management applications. Example: the performance monitoring application feeds the data backup/replication application so that the backup time period is automatically selected and optimized.
- Performance Management: Automate actions based on conditions detected. Example: a feedback loop to switching devices for intelligent routing decisions.
- Life-Cycle Data Monitoring: Based on the level of access over the network, determine the appropriate storage type for particular data or application. Provides feedback for HSM.
40. Questions
or
For a Copy of the Presentation
David.Signori_at_Inrange.com
703-442-3284