1
Long-Distance HP OpenVMS Clusters
  • Alex Goral, LightSand
  • Dennis Majikas, Digital Networks
  • Keith Parris, HP
  • Session 1530

2
Trends and Driving Forces
  • Business Continuity (BC), Disaster Recovery (DR)
    and Disaster Tolerance (DT) in a post-9/11 world
  • Recognition of greater risk to datacenters
  • Particularly in major metropolitan areas
  • Push toward greater distances between redundant
    datacenters
  • It is no longer inconceivable that, for example,
    terrorists might obtain a nuclear device and
    destroy the entire NYC metropolitan area

3
Trends and Driving Forces
  • "Draft Interagency White Paper on Sound Practices
    to Strengthen the Resilience of the U.S.
    Financial System"
  • http://www.sec.gov/news/studies/34-47638.htm
  • Agencies involved:
  • Federal Reserve System
  • Department of the Treasury
  • Securities and Exchange Commission (SEC)
  • Applies to:
  • Financial institutions critical to the US economy

4
Draft Interagency White Paper
  • Maintain sufficient geographically dispersed
    resources to meet recovery and resumption
    objectives.
  • Long-standing principles of business continuity
    planning suggest that back-up arrangements should
    be as far away from the primary site as necessary
    to avoid being subject to the same set of risks
    as the primary location.

5
Draft Interagency White Paper
  • Organizations should establish back-up
    facilities a significant distance away from their
    primary sites.
  • The agencies expect that, as technology and
    business processes continue to improve and
    become increasingly cost effective, firms will
    take advantage of these developments to increase
    the geographic diversification of their back-up
    sites.

6
Basic underlying challenges, and technologies to
address them
  • Data protection through data replication
  • Geographic separation for the sake of relative
    safety
  • Careful site selection
  • Application coordination
  • Long-distance multi-site clustering
  • Inter-site link technology choices
  • Inter-site link bandwidth
  • Inter-site latency due to the speed of light

7
Dennis Majikas
  • Site Selection
  • Inter-Site Links

8
Multi-Site Clusters
  • Consist of multiple sites in different locations,
    with one or more OpenVMS systems at each site
  • Systems at each site are all part of the same
    OpenVMS cluster, and share resources
  • Sites generally need to be connected by bridges
    (or bridge-routers); pure IP routers don't pass
    the SCS protocol used within OpenVMS Clusters
  • If only IP is available, an L2TP tunnel or
    LightSand boxes might be used

9
Inter-site Link Options
  • Sites linked by:
  • DS-3/T3 (E3 in Europe) or ATM circuits from a
    telecommunications vendor
  • Microwave link (DS-3/T3 or Ethernet)
  • Free-Space Optics link (short distance, low cost)
  • Dark fiber, where available: ATM over SONET, or
  • Ethernet over fiber (10 Mb, Fast, Gigabit)
  • FDDI
  • Fibre Channel
  • Fiber links between Memory Channel switches (up
    to 3 km)

10
Inter-site Link Options
  • Sites linked by:
  • Wave Division Multiplexing (WDM), in either
    Coarse (CWDM) or Dense (DWDM) Wave Division
    Multiplexing flavors
  • Can carry any of the types of traffic that can
    run over a single fiber
  • Individual WDM channel(s) from a vendor, rather
    than entire dark fibers

11
Bandwidth of Inter-Site Link(s)
12
Inter-Site Link Requirements
  • Inter-site SCS link minimum standards are in the
    OpenVMS Cluster Software SPD:
  • 10 megabits per second minimum data rate
  • Minimize packet latency
  • Low SCS packet retransmit rate
  • Less than 0.1% retransmitted, which implies:
  • Low packet-loss rate for bridges
  • Low bit-error rate for links

13
Important Inter-Site Link Decisions
  • Service-type choices:
  • Telco-provided circuit service, own link (e.g.
    microwave or FSO), or dark fiber?
  • Dedicated bandwidth, or shared pipe?
  • Single or multiple (redundant) links? If
    multiple links, then:
  • Multiple vendors?
  • Diverse paths?

14
Long-Distance Clusters
  • OpenVMS officially supports distances of up to
    500 miles (805 km) between nodes
  • Why the limit?
  • Inter-site latency

15
Long-distance Cluster Issues
  • Latency due to the speed of light becomes
    significant at longer distances. Rules of thumb
    (see the worked example below):
  • About 1 ms per 125 miles, one-way, or
  • About 1 ms per 62 miles, round-trip latency
  • Actual circuit path length can be longer than
    highway mileage between sites
  • Latency primarily affects the performance of:
  • Remote lock operations
  • Remote I/Os
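A minimal DCL sketch of these rules of thumb; the
600-mile distance below is an illustrative value,
not a figure from this presentation:

  $! Rough latency estimate from the rules of thumb above (DCL integer arithmetic)
  $ distance_miles = 600        ! illustrative distance only
  $ one_way_ms = distance_miles / 125
  $ round_trip_ms = distance_miles / 62
  $ WRITE SYS$OUTPUT "Approximate one-way latency:    ''one_way_ms' ms"
  $ WRITE SYS$OUTPUT "Approximate round-trip latency: ''round_trip_ms' ms"

For 600 miles this prints roughly 4 ms one-way and
9 ms round trip (integer division truncates the
fractions).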

16
Lock Request Latencies
17
Inter-site Latency: Actual Customer Measurements
18
Differentiate between latency and bandwidth
  • Can't get around the speed of light and its
    latency effects over long distances
  • A higher-bandwidth link doesn't mean lower
    latency

19
Latency of Inter-Site Link
  • Latency affects the performance of:
  • Lock operations that cross the inter-site link
  • Lock requests
  • Directory lookups, deadlock searches
  • Write I/Os to remote shadowset members, either:
  • Over the SCS link, through the OpenVMS MSCP
    server on a node at the opposite site, or
  • Direct via Fibre Channel (with an inter-site FC
    link)
  • Both MSCP and the SCSI-3 protocol used over FC
    take a minimum of two round trips for writes (see
    the worked example below)
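As an illustrative worked example using the earlier
rules of thumb (not a measured result from the
testing described later): sites 372 miles (600 km)
apart see roughly 372 / 62 ≈ 6 ms per round trip,
so a remote write that needs two round trips spends
at least about 12 ms in flight before any
controller service time is added.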

20
SAN Extension
  • Fibre Channel distance over fiber is limited to
    about 100 kilometers
  • Shortage of buffer-to-buffer credits adversely
    affects Fibre Channel performance above about 50
    kilometers
  • Various vendors provide SAN Extension boxes to
    connect Fibre Channel SANs over an inter-site
    link
  • See the SAN Design Reference Guide, Vol. 4: SAN
    extension and bridging
  • http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00310437/c00310437.pdf

21
Alex Goral
  • SAN Extension vs. Application Extension

22
Long-distance OpenVMS Cluster Testing within HP
  • Host-based Volume Shadowing over SAN Extension in
    Colorado Springs
  • Craig Showers' lab and Melanie Hubbard's test
    work at Nashua

23
Long-distance HBVS Testing
  • SAN Extension used to extend SAN using FCIP
  • No OpenVMS Cluster involved across the distance
    (no OpenVMS node at the remote end; just data
    vaulting to a distant disk controller)
  • Distance simulated via introduced packet latency

24
Long-distance HBVS Testing
25
Long-distance OpenVMS Cluster Testing within HP
  • Craig Showers' lab and Melanie Hubbard's test
    work at Nashua

26
Solutions Engineering OEM Lab Project: Oracle 9i
RAC DT/HA in a Distributed OpenVMS Environment,
Phase II: Shadow Sets, HBMM and Oracle RAC Proof
of Concept
  • Craig Showers, Carlton Davis
  • August 22, 2005

27
Background
  • OpenVMS Ambassadors pushed for a Proof Of Concept
    (POC) combining Oracle RAC with OpenVMS
    long-distance cluster capabilities
  • Phase I POC: LAN failover (failSAFE IP) with
    Oracle RAC over local and 100 km distances (2004)
  • Ambassadors and Bootcamp attendees provided Phase
    II requirements:
  • Separate VMS nodes, RAC instances, clients, and
    disks in a truly distributed 2-node cluster
  • DT environment includes on-disk copies; use
    Volume Shadowing
  • Consider cost issues for the networking
    infrastructure

28
Project Business Goals
  • Provide proof of concept around RAC over a
    stretched VMS cluster
  • Provide data on VMS and Oracle behavior in
    stretched configurations
  • Raise visibility of BCS platform HA and DT
    differentiators working in conjunction with
    Oracle
  • Provide a re-usable environment for customers or
    partners to test their own application and
    distance requirements
  • Model POC for other operating systems and
    database products
  • Prepare for data security requirements
  • Deliverable: technical sales collateral

29
Project Technical Objective
  • Observe and record the behavior of Oracle 9i RAC
    Server used with OpenVMS Shadow Sets on a 2-node
    clustered OpenVMS system with various
    combinations of FULL, COPY and MERGE Shadow Set
    status

30
Partner Involvement
  • LightSand
  • Engineer, Network Switch: Alex Goral
  • Digital Networks
  • Networking: Dennis Majikas

31
Configuration
  • 2-node GS1280 cluster, 8 CPUs, 8 GB memory
  • EVA3000 and EVA8000, RAID 1 volumes
  • OpenVMS 7.3-2, latest patches (especially those
    required for HBMM)
  • Oracle 9.2.0.5
  • Swingbench load generator
  • TCP/IP Services 5.4, ECO 5
  • LightSand switches allow SCS traffic over IP

32
(No Transcript)
33
Testing
  • Introduce delays in data transmission to emulate
    long distances between nodes: campus/metro
    (50-100 km), regional (500 km), and extreme
    (1000 km)
  • Load generation: 300 remote clients, 600K
    transactions of typical database functions
  • Observe behavior of RAC and Shadow Set operations
    in combinations of local and remotely served
    volumes
  • Data collection:
  • T4 (plus OTLT and VEVAMON collectors)
  • Disk-related DCL commands
  • Transaction output from load generation

34
Testing (cont.)
  • Test variations:
  • RAC
  • Active-Active
  • Active-Passive
  • Distances between nodes, incl. 0 (data center)
  • Shadow sets
  • Network transfer: compressed/non-compressed
  • Network bandwidth: OC-3, Gigabit, OC-12?

35
Results to Date
  • Datapoints have been collected for Active-Active
    RAC configurations run with the following delays:
  • 0 ms: local cluster
  • 1 ms: 200 km (124 mi)
  • 3 ms: 600 km (372 mi)
  • 5 ms: 1000 km (621 mi)
  • 10 ms: 2000 km (1242 mi)

36
Mitigating the Impact of Distance
  • Do local lock requests rather than remote
  • Avoid lock directory lookups between sites
  • Avoid SCS and MSCP credit waits
  • Avoid remote shadowset reads (/SITE and
    /READ_COST)
  • Minimize round trips between sites

37
Mitigating Impact of Inter-Site Latency
  • Locking
  • Try to avoid lock requests to master node at
    remote site
  • OpenVMS does move mastership of a resource tree
    to the node with the most activity
  • How applications are distributed across the
    cluster can affect local vs. remote locking
  • But this represents a trade-off among
    performance, availability, and resource
    utilization

38
Application Scheme 1: Hot Primary/Cold Standby
  • All applications normally run at the primary site
  • Second site is idle, except for volume shadowing,
    until primary site fails, then it takes over
    processing
  • Performance will be good (all-local locking)
  • Fail-over time will be poor, and risk high
    (standby systems not active and thus not being
    tested)
  • Wastes computing capacity at the remote site

39
Application Scheme 2: Hot/Hot but Alternate
Workloads
  • All applications normally run at one site or the
    other, but not both; data is shadowed between
    sites, and the opposite site takes over upon a
    failure
  • Performance will be good (all-local locking)
  • Fail-over time will be poor, and risk moderate
    (standby systems in use, but specific
    applications not active and thus not being tested
    from that site)
  • Second site's computing capacity is actively used

40
Application Scheme 3: Uniform Workload Across
Sites
  • All applications normally run at both sites
    simultaneously; the surviving site takes all the
    load upon a failure
  • Performance may be impacted (some remote locking)
    if the inter-site distance is large
  • Fail-over time will be excellent, and risk low
    (standby systems are already in use running the
    same applications, thus constantly being tested)
  • Both sites' computing capacity is actively used

41
Mitigating Impact of Inter-Site Latency
  • Lock directory lookups:
  • Lock directory lookups with the directory node at
    the remote site can only be avoided by setting
    LOCKDIRWT to zero on all nodes at the remote site
    (a SYSGEN sketch follows this list)
  • This is typically only satisfactory for
    Primary/Backup or remote-shadowing-only clusters
  • For cases where applications create new locks and
    free them instead of converting to/from Null
    mode:
  • Create a program to take out a Null lock on the
    root resources and simply hold those locks
    forever
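A minimal sketch of the LOCKDIRWT change described
above, run on each node at the remote site; the
parameter is not dynamic, so a reboot is typically
needed, and the value should also be recorded in
MODPARAMS.DAT so AUTOGEN preserves it:

  $ RUN SYS$SYSTEM:SYSGEN
  SYSGEN> USE CURRENT
  SYSGEN> SET LOCKDIRWT 0
  SYSGEN> WRITE CURRENT
  SYSGEN> EXIT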

42
Mitigating Impact of Inter-Site Latency
  • SCS credit waits:
  • Use SHOW CLUSTER/CONTINUOUS with ADD
    CONNECTIONS, ADD REM_PROC_NAME and ADD CR_WAITS
    to check for SCS credit waits (see the sketch
    below). If counts are present and increasing over
    time, increase the SCS credits at the remote end
    as follows
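A minimal sketch of that check; the ADD fields are
the ones named above, and the exact display depends
on the cluster configuration:

  $ SHOW CLUSTER/CONTINUOUS
      (then, at the SHOW CLUSTER command prompt, add the relevant fields)
  ADD CONNECTIONS
  ADD REM_PROC_NAME
  ADD CR_WAITS

Watch the CR_WAITS column over time; counts that
keep increasing on a connection indicate credit
waits on that connection.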

43
SCS Flow Control
  • How to alleviate SCS credit waits (a SYSGEN
    sketch follows this list):
  • For waits on the VMS$VAXcluster SYSAP on another
    OpenVMS node:
  • Raise the SYSGEN parameter CLUSTER_CREDITS
  • Default is 10; maximum is 128
  • For waits on the VMS$DISK_CL_DRVR SYSAP to the
    MSCP$DISK SYSAP on another OpenVMS node:
  • Raise the SYSGEN parameter MSCP_CREDITS on the
    serving node
  • Default is 8; maximum is 128
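A minimal SYSGEN sketch of the changes above; the
value 32 is only an illustrative choice, and
because these parameters are not dynamic a reboot
is normally required for new values to take effect
(add them to MODPARAMS.DAT as well so AUTOGEN
preserves them):

  $ RUN SYS$SYSTEM:SYSGEN
  SYSGEN> USE CURRENT
  SYSGEN> SET CLUSTER_CREDITS 32
  SYSGEN> SET MSCP_CREDITS 32
  SYSGEN> WRITE CURRENT
  SYSGEN> EXIT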

44
Minimizing Round Trips Between Sites
  • MSCP-served reads take one round trip; writes
    take two
  • Fibre Channel SCSI-3 protocol tricks can do
    writes in one round trip
  • e.g. Cisco's Write Acceleration

45
Lab Cluster Configuration
46
[Diagram: lab cluster configuration — GS1280 and
XP1000 nodes and LAN switches at each site; Cisco
routers and Shunra delay boxes carrying IP between
the sites; LightSand gateways and FC switches on
the Fibre Channel side; MSA1000 storage at one site
and HSG80 storage at the other]
47
[Diagram: the same lab configuration, highlighting
one SCS path between the sites]
48
[Diagram: the same lab configuration, highlighting
the second SCS path between the sites]
49
[Diagram: the same lab configuration, highlighting
the Fibre Channel path between the sites]
50
Interactive Demo: Long-Distance Cluster
Considerations
  • Demonstration of how to measure lock-request
    latency
  • LOCKTIME.COM tool
  • Demonstration of measuring local and remote disk
    I/O latency
  • DISKBLOCK tool

51
Data Replication
  • Providing and maintaining redundant copies of
    data is obviously extremely important in a
    disaster-tolerant cluster
  • Options for data replication between sites:
  • Host-Based Volume Shadowing software
  • Continuous Access
  • Database replication
  • Middleware (e.g. Reliable Transaction Router
    software)

52
Data Replication
  • Synchronizing data can consume significant
    inter-site bandwidth and time

53
Host-Based Volume Shadowing
  • Host software keeps multiple disks identical
    (see the mount example after this list)
  • All writes go to all shadowset members
  • Reads can be directed to any one member
  • Different read operations can go to different
    members at once, helping throughput
  • Synchronization (or Re-synchronization after a
    failure) is done with a Copy operation
  • Re-synchronization after a node failure is done
    with a Merge operation
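A minimal sketch of creating a two-member
host-based shadowset; the device names and the
POCDATA label are assumed for illustration and are
not taken from the lab configuration:

  $! Mount the virtual unit DSA1: with one member at each site
  $ MOUNT/SYSTEM DSA1: /SHADOW=($1$DGA101:,$1$DGA201:) POCDATA

Once mounted, writes go to both members, and reads
can be satisfied by either member, which is where
the SITE and READ_COST settings discussed later
come into play.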

54
Fibre Channel and SCSI in Clusters
  • Fibre Channel and SCSI are Storage-Only
    Interconnects
  • Provide access to storage devices and controllers
  • Cannot carry SCS protocol (e.g. Connection
    Manager and Lock Manager traffic)
  • Need SCS-capable Cluster Interconnect also
  • Memory Channel, Computer Interconnect (CI), DSSI,
    FDDI, Ethernet, or Galaxy Shared Memory Cluster
    Interconnect (SMCI)
  • Fail-over between a direct path and an
    MSCP-served path is first supported in OpenVMS
    version 7.3-1

55
Host-Based Volume Shadowing and StorageWorks
Continuous Access
  • Fibre Channel introduces new capabilities into
    OpenVMS disaster-tolerant clusters

56
Continuous Access
[Diagram: a node at each site connected by an
inter-site SCS link; FC switches connected by an
inter-site FC link; an EVA at each site forming a
controller-based mirrorset]
57
Continuous Access
[Diagram: a write from a node is sent to the
controller in charge of the mirrorset, which also
writes the data to the EVA at the other site]
58
Continuous Access
[Diagram: I/O from the nodes at both sites is
directed through the single controller in charge
of the mirrorset]
59
Continuous Access
Nodes must now switch to access data through this
controller
[Diagram: nodes at both sites, FC switches, and
EVAs; all access to the mirrorset now goes through
one controller]
60
Host-Based Volume Shadowing
[Diagram: nodes at the two sites connected by an
SCS-capable interconnect; an FC switch and EVA at
each site; the two EVAs form a host-based
shadowset]
61
Host-Based Volume Shadowing: Reads from Local
Member
[Diagram: each node reads from the shadowset
member at its own site]
62
Interactive Demo: Long-Distance Cluster
Considerations
  • Demonstration of the effect of the Volume
    Shadowing SITE and READ_COST settings on member
    selection for read operations (a command sketch
    follows)
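A minimal sketch of the settings involved in the
demo; the device names and site numbers are assumed
for illustration:

  $! Tell OpenVMS which site each device is at, so reads prefer the local member
  $ SET DEVICE DSA1: /SITE=1          ! virtual unit, as set on a site-1 node
  $ SET DEVICE $1$DGA101: /SITE=1     ! shadowset member located at site 1
  $ SET DEVICE $1$DGA201: /SITE=2     ! shadowset member located at site 2
  $! Alternatively, bias read selection away from the remote member
  $ SET DEVICE $1$DGA201: /READ_COST=1000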

63
Host-Based Volume Shadowing: Writes to All Members
[Diagram: a write from a node goes to the
shadowset members at both sites]
64
Host-Based Volume Shadowing with Inter-Site Fibre
Channel Link
[Diagram: nodes connected by an SCS-capable
interconnect; FC switches at the two sites
connected by an inter-site FC link; EVAs at both
sites form a host-based shadowset]
65
Direct vs. MSCP-Served Paths
[Diagram: the same shadowset configuration with an
inter-site FC link, used to contrast the direct
and MSCP-served paths to the remote member]
66
Direct vs. MSCP-Served Paths
[Diagram: the same configuration, continuing the
direct vs. MSCP-served path comparison]
67
Cross-site Shadowed System Disk
  • With only an SCS link between sites, it was
    impractical to have a shadowed system disk and
    boot nodes from it at multiple sites
  • With a Fibre Channel inter-site link, it becomes
    possible to do this
  • but it is probably still not a good idea (single
    point of failure for the cluster)

68
Host-Based Volume Shadowing with Inter-Site Fibre
Channel Link
[Diagram: nodes connected by an SCS-capable
interconnect; FC switches connected by an
inter-site FC link; EVAs at both sites form a
host-based shadowset]
69
New Failure Scenarios: SCS link OK but FC link
broken
(Direct-to-MSCP-served path failover provides
protection)
[Diagram: the shadowset configuration with the
inter-site FC link broken while the SCS-capable
interconnect remains up]
70
New Failure Scenarios: SCS link broken but FC link
OK
(Quorum scheme provides protection)
[Diagram: the shadowset configuration with the
SCS-capable interconnect broken while the
inter-site FC link remains up]
71
Interactive Demo: MSCP-Served vs. Direct Fibre
Channel
  • Introduction of the inter-site SAN link is
    demonstrated; disk access fails over from the
    MSCP-served path over the LAN link to the direct
    Fibre Channel path (over LightSand boxes)

72
Dennis Majikas
  • Case studies

73
Keith Parris
  • Case studies
  • Manhattan Municipal Credit Union
  • Another Credit Union
  • New York Clearing Houses

74
Questions?
75
Speaker Contact Info
  • Keith Parris
  • E-mail: Keith.Parris@hp.com or
    keithparris@yahoo.com
  • Web: http://www2.openvms.org/kparris/

76
(No Transcript)
77
get connected
People. Training. Technology.
78
(No Transcript)
79
(No Transcript)