Title: State Data Center Disaster Recovery Overview
1. State Data Center Disaster Recovery Overview
Presented to Disaster Recovery Coordinators
Meeting Date: 3/19/2009
2. SDC Service Continuity View
[Diagram: outage impact scale]
- Impact to service delivery (outage impact), in
increasing order: Normal Operations, Severity 4,
Severity 3, Severity 2, Severity 1, Disaster
- SDC status across the scale: SDC can be used,
SDC partially usable, SDC not usable
- Operational Recovery: restoration of normal
service is part of standard rates
- Disaster Recovery: additional costs apply
- Grey area: the expected time to return equipment
to normal service will determine whether DR is
invoked
- Minor end of the scale: bug or minor issue where
the application is still functioning
- Major end of the scale: major issue with high
impact, equipment not usable
3. Successful DR Requires Cooperation
- Participating Agency
- Plans DR based on business needs and priorities
- Acquires DR services for out-of-scope IT
- Funds DR planning, backup recovery
- Prioritizes recovery sequence within agency
- Tests agency DR plans
- Determines scope and declares disaster for
out-of-scope IT
- Arranges for backups of data and applications
- Keeps vendor informed of changes
- SDC
- Plans internal continuity of service delivery if
infrastructure must relocate
- Contracts with DR vendor to provide
infrastructure environment in case of SDC
disaster
- Determines scope and declares SDC disaster to DR
vendor
- Coordinates cross-agency priority sequencing
- Tests SDC DR plans
- Coordinates movement of people, backup
resources, and communications connectivity
- Keeps vendor informed of changes
- Disaster Recovery Vendor
- Provides environment for disaster recovery and
testing (hot sites and portable sites)
- Hosts DR tests
- May provide consulting and operations support as
contracted
4. What BCP Coordinators need to know
- Who is your DR coordinator?
- Has your agency done DR planning?
- What applications are needed to support critical
business functions?
- Where are those applications hosted?
- What are the disaster recovery time objective
(RTO) and recovery point objective (RPO) for each
of those applications?
- Are the applications and their data backed up
frequently enough to meet RPO?
- Are the recovery option and grouping of backups
for each application reasonable for the RTO?
- Will the agency's budget planning support the
cost associated with meeting the desired RTO and
RPO level?
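The backup-frequency question can be checked mechanically once RPOs are known. The sketch below is only illustrative: it assumes the worst-case data loss equals one full backup interval, and the application names, intervals, and RPO values are hypothetical, not SDC figures.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Return True if the backup interval is short enough for the RPO.

    Assumption: in the worst case a disaster strikes just before the
    next backup, so up to one full backup interval of data is lost.
    """
    return backup_interval <= rpo


# Hypothetical applications: name -> (backup interval, RPO)
apps = {
    "payroll": (timedelta(hours=24), timedelta(hours=4)),
    "licensing": (timedelta(hours=1), timedelta(hours=4)),
}

# Applications whose backups are too infrequent for their RPO
gaps = [
    name
    for name, (interval, rpo) in apps.items()
    if not meets_rpo(interval, rpo)
]
print(gaps)
```

Any application that appears in `gaps` needs either more frequent backups or replication, or a relaxed RPO agreed with the business.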
5. What DR Coordinators need to know
- Answers to all the questions in "What BCP
Coordinators need to know", plus:
- Who is your BCP coordinator?
- What agency infrastructure will need to be
recovered before recovered applications and data
will be accessible to users? (e.g., DNS, LDAP,
Active Directory, networks)
- What communications vehicles are expected to be
available during a disaster? (e.g., email,
BlackBerry, IM)
- What are the recovery procedures for agency
infrastructure, communications, applications, and
data?
- What are the DR testing plans?
- What are the procedures for keeping all of this
up to date?
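The question about which infrastructure must be recovered before applications become accessible is a dependency-ordering problem. As a rough sketch, a topological sort yields one valid recovery sequence. The dependency map below is hypothetical (including the `case_management_app` name) and would need to be replaced with an agency's actual inventory.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependencies: each item maps to the set of items
# that must be recovered before it.
depends_on = {
    "network": set(),
    "dns": {"network"},
    "active_directory": {"network", "dns"},
    "ldap": {"network", "dns"},
    "case_management_app": {"active_directory", "ldap"},
}

# static_order() yields every item after all of its prerequisites.
recovery_order = list(TopologicalSorter(depends_on).static_order())
print(recovery_order)
```

`TopologicalSorter` raises `graphlib.CycleError` if the map contains a circular dependency, which is itself a useful finding during DR planning.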
6. SDC DR Project Actions
- Develop DR planning framework and templates with
SunGard
- Identify, scope and develop backup and recovery
for SDC core infrastructure and infrastructure
needed to support agency recovery requirements
- Assist agencies with identifying and scoping DR
requirements for their infrastructure,
applications and data
- Develop and implement tiered DR strategies
- Develop DR test plans and execute initial tests
- Develop and implement DR maintenance process
7. Working with the SDC on DR Planning
- Submit request for DR planning and preparation
through normal agency procedures
- Provide initial information on DR requirements
- Once potential solutions are scoped and priced,
get agency approval to proceed
- Provide detailed planning information
- Plan agency testing
8. Key data for DR Planning
- Needed for getting to more detail
- Needed for planning the best recovery strategy
- Needed for planning the best backup strategy
- Needed for estimating cost and aggregating need
9. Recovery Options
10. Recovery Timeline
[Figure: recovery timeline]
- Events, in order: last backup or data
replication; disaster event; systems recovered;
business process meeting SLAs; business continuity
protection restored
- Intervals: RPO, RTO, Work Recovery, Restoration
Time, MAD
- Annotations: lost transactions; work backlog,
workaround procedures; recover lost transactions,
accomplish backlogged work; rebuild business
continuity systems
Source: "Building a Business Impact Analysis: The
Keystone to Effective Business Continuity
Planning" by Richard Jones, v1, 7/30/2008, Burton
Group
11. Definitions for Recovery Timeline
- MAD (Maximum Allowable Downtime): the maximum
amount of time the business can suffer an
inoperable business process before significant
negative consequences are felt. Also called
Maximum Acceptable Outage (MAO), Maximum
Allowable Outage (MAO), Maximum Acceptable
Downtime (MAD), Maximum Tolerable Downtime (MTD),
Maximum Tolerable Outage (MTO), and Maximum
Tolerable Period of Disruption (MTPD).
- RPO (Recovery Point Objective): the amount of IT
systems data or transaction loss that can be
tolerated by the business process.
- RTO (Recovery Time Objective): the time IT
organizations have to recover their systems to an
agreed upon operational state so that workers may
then recover the lost time of the outage to bring
the business process back to acceptable service
levels.
- Work Recovery: the work time required to recover
the lost transactions of the RPO time plus the
backlog of work created during the system outage.
Lost transactions must be recovered manually and
procedures should be in place to accomplish this
work.
- Restoration Time: the time to bring the business
process back to a state of full business
continuity protection. Basically this is backing
up the recovered system and restoring redundancy
capabilities.
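One way to read the definitions above is that system recovery (RTO) followed by Work Recovery must finish before the MAD expires. The sketch below checks that arithmetic. The relationship is an interpretation of the definitions rather than a formula stated in the source, and the hour values are hypothetical.

```python
from datetime import timedelta


def mad_slack(
    rto: timedelta, work_recovery: timedelta, mad: timedelta
) -> timedelta:
    """Return the time left over before MAD is reached.

    Assumption: the business process is back to acceptable service
    levels once RTO and Work Recovery have both elapsed, in sequence.
    A negative result means the plan exceeds the MAD.
    """
    return mad - (rto + work_recovery)


# Hypothetical targets for a single business process
slack = mad_slack(
    rto=timedelta(hours=24),
    work_recovery=timedelta(hours=16),
    mad=timedelta(hours=48),
)
print(slack)                    # time to spare before MAD
print(slack >= timedelta(0))    # True if the plan fits within MAD
```

If the slack is negative, either the RTO must be shortened (usually at higher cost) or the work recovery procedures must be made faster.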