UCSD/UCOP Disaster Recovery Project

9/5/09. 1. UCSD/UCOP Disaster Recovery Project ... UCOP unix/linux using SSH RSYNCH server to server daily ... Establish documentation policy (.i.e. format, ... – PowerPoint PPT presentation

Slides: 14
Provided by: PWe37

1
UCSD/UCOP Disaster Recovery Project
  • Review of objectives, rationale and
    accomplishments
  • UCOP/UCSD team

2
Situation as of 2Q2006
  • UCSD had virtually no DR plan in place
  • UCOP used IBM contract in Colorado
  • Cost: 200k/yr; 600k/month if ever used
  • Had insufficient gear and network reserved;
    cautiously estimate > 50% more cost if
    updated appropriately
  • 40 hrs of testing / year limit, difficult to
    schedule
  • RPO (Recovery Point Objective) < 7 days
  • RTO (Recovery Time Objective) < 3 days
  • Required UCOP personnel to activate and operate
  • Past testing indicated decent mainframe recovery
    plan in place, limited distributed system
    capability

3
DR Concept
  • A window of opportunity was seized to implement
    real-time DR capability between UCSD and UCOP,
    due to:
  • Having a trusted partner and a high-capacity WAN
  • UCOP required much shorter RPO and RTO and must
    expand scope of our DR capability (AYSO, ERS,
    EIAS, etc)
  • Funding availability: the opportunity!
  • UCOP had budget by redirecting DR contract and
    leveraging storage purchase
  • UCSD leveraged storage purchase and server
    replacement timing
  • UCOP had experience from Colorado contract
  • Keys to approach
  • Buy enough storage in order to synchronize data
    in real or near real time between sites and avoid
    any need to load data during an event
  • Leverage CBU mainframe option and add mainframe
    memory.
  • For all other servers buy sufficient gear to
    have capacity available to run mission critical
    services at either location without having to
    repurpose servers during an event

4
Advantages of this DR approach
  • Costs for UCOP are comparable to old DR plan
  • Capability is dramatically improved
  • RTO and RPO < 1 day (actually far less)
  • Can test as often as needed (and we need it!)
  • Equipment is there and operational
  • No incremental cost during first 90 days of a
    disaster
  • More services can be easily added after the
    initial investment (labor and infrastructure) and
    easy to optimize over time
  • UC personnel on the other side will assist in case
    of disaster; long-term goal is to recover without
    any personnel from the down location immediately
    available

5
The UCOP DR Portfolio
  • Examined past DR portfolio
  • IRC inventoried and classified existing
    applications
  • Developed phased implementation plan
  • Current DR Apps
  • All Mainframe services (including 9 PPS instances
    and UCRS)
  • AYSO and all Benefits services
  • Endowment and Investment Accounting System
  • Infrastructure including TSM, Active Directory,
    VPN
  • Email
  • UCOP Web Servers
  • Banking/Treasury Systems
  • Loan Programs
  • Risk Services
  • Irvine Secondary DNS and Web Server
  • File Sharing

6
Anticipated Additions to UCOP DR Portfolio
  • New DR projects - committed
  • UC Effort Reporting System (3Q2008)
  • UCOP Office of Technology Transfer Informix DB
  • SD Coastal Data Information Program
  • UCSB - UCOP PPRC
  • UCOP iDP Shibboleth Server
  • UCOP TSM Server
  • UC Pathways (3Q2009)
  • New DR projects under consideration
  • UCSD Med PPRC
  • UCSB Distributed DNS Server
  • UCOP California Institute for Energy and
    Environment
  • UCLA Med PPRC

7
Apps and Operational Testing Status
  • Completed
  • Application Accessibility
  • Data Validation
  • Check printing at UCSB
  • Moved Check Stock
  • Operation Procedure
  • Email
  • File Sharing
  • Risk Services Apps
  • Outstanding
  • Tape Drive for remote archiving
  • LPR / VPS remote printing
  • SD Enterprise Extender
  • Secure remote printing
  • Firewall addressing
  • Mainframe outgoing mail via SMTP
  • SSL Cert
  • UCSF FTP

8
Process, Procedure and Documentation
  • Weekly Con Call w/ SD
  • Discuss problems and changes
  • Discuss upcoming technology changes
  • Coordinate scheduled outages and testing
  • Shared Folder
  • IPL / Shutdown Procedures
  • Remote Hands Tasks and Authorization List
  • DR Declaration Procedure
  • Bank Transmission Procedure
  • Website: http://www.ucop.edu/sysdev1/dr/drhome.html
  • Online access (network, server, and cabinet
    diagrams, etc.)

9
Technical Decisions
  • Storage
  • Purchased SAN (2107) from IBM; majority of
    solution uses global mirroring
  • UCOP mainframe to UCSD in real time
  • All UCSD to UCOP in real time
  • UCOP Unix/Linux using rsync over SSH, server to
    server, daily
  • UCOP Windows using server-to-server synch in
    real time
  • All data encrypted except Windows at this time.
    Windows will be encrypted soon
  • Bandwidth Considerations
  • During initial synch, 100 GB/hr (approx.
    300 Mb/sec)
  • During normal ongoing synch, anywhere from
    0-100 Mb/sec
  • During a test, sites will get out of synch; after
    the test is complete and during catch-up, about
    300 Mb/sec (this can be refined over time)
  • Simplicity where possible to speed deployment
  • Decided that a clustered production environment
    fails over only to a single-server DR environment:
    sufficient capacity to deliver full service,
    just no redundancy
  • Will not initially run production from both
    locations; DR site is just for failover
  • Where possible, have duplicate equipment to
    avoid finger pointing and need to worry about
    incompatibility
  • Used our current technical staff, no consultants
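
The bandwidth figures above can be sanity-checked with a short calculation. The sketch below is illustrative only: it assumes decimal units (1 GB = 10^9 bytes, 1 Mb = 10^6 bits) and ignores protocol overhead, and the function names and the 500 GB backlog are hypothetical, not taken from the slides. Under those assumptions 100 GB/hr is roughly 222 Mb/sec of payload, the same order as the slide's approximate 300 Mb/sec; the slides do not say what accounts for the difference.

```python
def gb_per_hour_to_mbps(gb_per_hour: float) -> float:
    """Convert a transfer rate in GB/hour to megabits/second (decimal units)."""
    bits_per_hour = gb_per_hour * 1e9 * 8
    return bits_per_hour / 3600 / 1e6


def catch_up_hours(backlog_gb: float, link_mbps: float) -> float:
    """Hours needed to drain a replication backlog at a sustained link rate."""
    bytes_per_hour = link_mbps * 1e6 / 8 * 3600
    return backlog_gb * 1e9 / bytes_per_hour


if __name__ == "__main__":
    # Payload rate implied by the initial-synch figure of 100 GB/hr.
    print(f"{gb_per_hour_to_mbps(100):.0f} Mb/sec")
    # Hypothetical example: a 500 GB backlog after a test, drained at 300 Mb/sec.
    print(f"{catch_up_hours(500, 300):.1f} hours")
```

A calculation like this helps when sizing the WAN for catch-up after a test, since the backlog grows with test duration while the link rate is fixed.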

10
(No Transcript)
11
Lessons Learned
  • Infrastructure / Operational
  • Physical security access personnel list
  • Coordination of scheduled PPRC impacting changes
  • Consistent method of accessing supported systems
    (e.g. MVS consoles)
  • Floor space availability for growth
  • Establish and coordinate HW/SW upgrades/purchases
    to alleviate compatibility issues and promote
    operational simplicity
  • Address current projects issues before adding new
    services to avoid delays in completing existing
    projects
  • Staffing to support additional services
  • Establish documentation policy (i.e. format,
    depository and update cycles)
  • Network
  • Coordinate SAN zoning and VSAN numbering for SAN
    switches (allows shared management).
  • Coordinate IP addressing (not really a problem
    within UC campuses, but allows shared
    management).
  • Phased implementation has different needs at
    different phases (e.g. all-or-nothing failover
    needs operations support at remote end for DNS,
    etc.)
  • Document the failover method early and get
    everyone's buy-in (assuming that IP addresses can
    easily be moved between sites may cause trouble
    later in the project).

12
Lessons Learned (cont)
  • Network (continued)
  • Global Copy can adversely affect seemingly
    unrelated applications.
  • Establish process and procedure for synchronizing
    firewall address space with the DR site
  • Understand network realities:
  • Latency vs bandwidth
  • Ongoing load vs initial synch
  • Systems
  • Z9 CBU activation will "Perform a Model
    Conversion" at the host site, which will require
    the host site to obtain temporary licenses for
    certain software products. In addition, the host
    site will need to monitor their CPU to make sure
    it doesn't go over their original MSU to avoid
    additional charges.
  • Establish a process for exporting SSL certificates
  • Sybase database loading was delayed while the
    server used its onboard (internal) disk. We're in
    the process of moving its disk to our SAN disk
    environment to address the issue
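
The "latency vs bandwidth" point in the Network items above can be illustrated with the standard bandwidth-delay relationship: a single stream can move at most one window of data per round trip, regardless of link capacity. The sketch below is a generic illustration, not a description of the UCSD/UCOP link; the 20 ms round-trip time and 65,535-byte window are hypothetical values chosen only for the example.

```python
def max_stream_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on single-stream throughput: one window per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6


def window_needed_bytes(link_mbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return link_mbps * 1e6 / 8 * (rtt_ms / 1000)


if __name__ == "__main__":
    # Hypothetical inputs: 65,535-byte window, 20 ms round-trip time.
    print(f"{max_stream_throughput_mbps(65535, 20):.1f} Mb/sec per stream")
    # Window needed to sustain 300 Mb/sec over the same hypothetical 20 ms path.
    print(f"{window_needed_bytes(300, 20):.0f} bytes in flight")
```

With these assumed values a single unscaled stream tops out well below 300 Mb/sec, which is why latency has to be considered separately from raw bandwidth when planning synchronization.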

13
Critical Success Factors
  • UCOP assigned dedicated staff to drive effort
  • One team: UCOP and UCSD
  • Fight scope creep and go for simplicity
  • Clear mandate, objectives, and timeline
  • Communicate, communicate, communicate
  • Test, Test, Test
  • Engage procurement personnel