Setting the Standard for DR - PowerPoint PPT Presentation

About This Presentation

Title:

Setting the Standard for DR

Slides: 47

Provided by: Unis152

Transcript and Presenter's Notes

1
Setting the Standard for DR

John Pollard, 23 March 2006

PAS 77 Guide to IT Service Continuity Management
2
PAS 56 Guide to Business Continuity Management
Business Continuity Management
RISK MANAGEMENT
IT DISASTER RECOVERY
FACILITIES MANAGEMENT
SUPPLY CHAIN MANAGEMENT
QUALITY MANAGEMENT
HEALTH & SAFETY
KNOWLEDGE MANAGEMENT
EMERGENCY MANAGEMENT
SECURITY
CRISIS COMMUNICATIONS & PR
Source: PAS 56:2003 Guide to Business Continuity Management
3
IT Service Continuity Management
Managing an organisation's ability to continue to provide a pre-determined and agreed level of IT Services to support the minimum business requirements
Source: ITIL Best Practice for Service Delivery
4
Threats

Loss, damage or denial of access to key infrastructure services
Failure or non-performance of third parties
Loss or corruption of key information
Sabotage, extortion or industrial espionage
Infiltration or attack on critical information systems

5
Scope

Generic framework and guidelines for a continuity programme, including:
Management structure & responsibilities
How to conduct business criticality & risk assessments
How to define and create an IT Service Continuity plan
How to rehearse an IT Service Continuity plan
Solution architectures and design considerations

6
What is a PAS?
Source: BSI
7
Status
[Timeline chart by quarter, Q4 2004 to Q4 2006. Milestones shown: Contracts / Structure / Content, Group formed, First draft, Edit, Revise, External review, Expected release]
8
Contributors
9
ITSC Strategy

Define direction and high-level methods to meet IT service level objectives
Agreed at Board level
Needs to consider 4 stages of major incident
Initial response
Service recovery
Service delivery (following incident)
Normal service resumption
Enable rehearsal of major incident

10
ITSC Strategy & Plan
Business Strategy
Business Criticality
Threat Analysis
IT Service Continuity Strategy
IT Architecture
IT Service Continuity Plan
Rehearsals
Costs
Processes
11
Maintaining an ITSC Strategy
Monitor
12
Management Structure
Crisis Management Team (CMT)
Business Continuity Management Team (BCMT)
Incident Management Team (IMT)
13
Business Criticality & Risk Assessments

Identify business units & processes
Categorise criticality of processes
Identify IT services supporting the business processes
Categorise criticality of IT services
Review
By location
By business unit

14
Business Criticality Categories

Critical
Vital to day-to-day operation
Mandatory
Vital to meet statutory requirements
Strategic
Important for implementation of long-term strategy
Tactical
Important for short/medium term objectives
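
The four categories above could be modelled as an ordered type, so that an IT service inherits the most urgent category of any business process it supports. This is only a sketch: the numeric ranking, the inheritance rule, and the service names are assumptions made for illustration, not something the slides specify.

```python
from enum import IntEnum


class Criticality(IntEnum):
    """Business criticality categories named on the slide.

    The numeric ranking (lower = more urgent) is an assumption of this
    sketch; the slide only lists the categories.
    """
    CRITICAL = 1   # vital to day-to-day operation
    MANDATORY = 2  # vital to meet statutory requirements
    STRATEGIC = 3  # important for implementation of long-term strategy
    TACTICAL = 4   # important for short/medium term objectives


def service_criticality(process_categories):
    """Return the most urgent category among the processes a service supports."""
    return min(process_categories)


# Hypothetical IT services and the categories of the processes they support.
services = {
    "data-warehouse": [Criticality.STRATEGIC, Criticality.TACTICAL],
    "payments-db": [Criticality.CRITICAL, Criticality.TACTICAL],
    "regulatory-reporting": [Criticality.MANDATORY],
}

# Order services so the most urgent ones are planned for first.
ranked = sorted(services, key=lambda name: service_criticality(services[name]))
for name in ranked:
    print(name, service_criticality(services[name]).name)
```

Reviewing the same mapping by location or by business unit would only need an extra grouping key on each service.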

15
Risk Assessment Process
Learn Lessons
16
ITSC Plan

Part of wider BCM Plan
Model plan should include
Initial response
Incident assessment
Roles & responsibilities
Procedures
Rehearsing the plan
Maintaining the plan

17
Recovery Objectives

Recovery Point Objective (RPO)
The point in time to which work is restored, e.g. start of day
Recovery Time Objective (RTO)
The time required to recover service
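
As an illustration of the two objectives, the sketch below checks a single incident against an RPO and an RTO. The function names, timestamps, and objective values are hypothetical; a real plan would take them from the agreed service levels.

```python
from datetime import datetime, timedelta


def meets_rpo(last_recovery_point, incident_time, rpo):
    """True if the work lost since the last recovery point is within the RPO."""
    return incident_time - last_recovery_point <= rpo


def meets_rto(incident_time, service_restored_time, rto):
    """True if the service was recovered within the RTO."""
    return service_restored_time - incident_time <= rto


# Hypothetical incident: data can be restored to the start of the day.
start_of_day = datetime(2006, 3, 23, 8, 0)
incident = datetime(2006, 3, 23, 15, 30)
restored = datetime(2006, 3, 23, 19, 30)

print("RPO of 24 h met:", meets_rpo(start_of_day, incident, timedelta(hours=24)))
print("RPO of 4 h met:", meets_rpo(start_of_day, incident, timedelta(hours=4)))
print("RTO of 4 h met:", meets_rto(incident, restored, timedelta(hours=4)))
```

A start-of-day recovery point loses up to a full working day, so it only satisfies a generous RPO; tighter objectives call for the replication options covered later in the deck.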

18
Balancing Cost & Recovery Objectives
19
IT Architecture Resilience Considerations

Location & distance between sites
Number of sites
Staff access & proximity
Remote access
Dark site vs. manned site
Staff skill levels
Telecoms connectivity and redundant routing
Automation required
Telephony and email
3rd party / external links

20
High Level Process Flow
21
Task Summary Sheet
22
Rehearsal

A body to control & coordinate
Objectives & success criteria
Rehearsal plan & scripts
Staff briefing
Logs and critique forms
Observers
Post-rehearsal review

23
Areas to Rehearse

Callout
Walk through reviews
Walk through exercises
Component rehearsals
Integration rehearsals
Relocation rehearsals
Failover rehearsals
Major incident simulations

24
Architectures
25
Site Models

Active / Contingency
Cold site
Active / Active
Service runs from both sites
Active / Alternate
Service can run from either site
Active / Backup
Warm standby site
Multi-site and other hybrids

26
Data Resilience
Tape/backup
Database
Application
Host
Storage Array
SAN
27
Replication Modes

Synchronous
Increased write latency
Typically OK for OLTP
May impact batch processing
Requires greater inter-site bandwidth than other options
Snapshot
Point in time copy
Only valid on completion of transfer
Minimal/no performance impact
Near real-time
Frequent snapshots
Minimal performance impact
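
Because a snapshot is only valid once its transfer completes, the copy at the recovery site can be older than the snapshot interval alone suggests. The sketch below works through that arithmetic, assuming each transfer finishes before the next snapshot is taken; the function names and the example schedules are hypothetical.

```python
def worst_case_data_loss_min(snapshot_interval_min, transfer_min):
    """Age in minutes of the newest usable copy at the worst moment.

    Just before a transfer completes, the recovery site still holds the
    previous snapshot, which is one interval plus one transfer time old.
    Assumes each transfer finishes before the next snapshot is taken.
    """
    if transfer_min > snapshot_interval_min:
        raise ValueError("transfer must finish before the next snapshot")
    return snapshot_interval_min + transfer_min


def within_rpo(snapshot_interval_min, transfer_min, rpo_min):
    """True if the worst-case data loss stays within the RPO."""
    return worst_case_data_loss_min(snapshot_interval_min, transfer_min) <= rpo_min


# Hypothetical schedules (interval, transfer time) checked against a 60-minute RPO.
for interval, transfer in [(240, 30), (45, 10), (15, 5)]:
    loss = worst_case_data_loss_min(interval, transfer)
    print(f"every {interval} min, {transfer} min transfer: "
          f"up to {loss} min lost, within RPO: {within_rpo(interval, transfer, 60)}")
```

Shortening the interval is what moves a snapshot scheme towards the near real-time mode, at the cost of more frequent transfers.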

28
A Holistic Approach
Service Continuity is much more than technology
29
john.pollard_at_unisys.com
30
Part II - Workshop
Defining the Standard for DR

John Pollard, Unisys

PAS 77 Guide to IT Service Continuity Management
31
Typical Challenges

Tape recovery slow
Manual build is complex
Complex inter-operation between systems
Difficult to define critical and non-critical
Management of failover site
Keeping sites in step
Windows Servers

32
Synchronous Write Latency
[Diagram: Server, Storage Array (Write 1: 0.5 mSec), Communication link (Transfer time), Storage Array (Write 2: 0.5 mSec)]
Latency = 2 × Write Time + Transfer Time
For 200 kilometres using Fibre Channel:
Latency = 2 × 0.5 + 4.0 = 5.0 mSec
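
The slide's arithmetic (two 0.5 mSec writes plus 4.0 mSec of transfer time, 5.0 mSec in total at 200 kilometres) can be reproduced for other distances. The slide does not say how the 4.0 mSec was derived; this sketch assumes roughly 5 microseconds per kilometre of fibre and two round trips per write, which happens to give the same figure. Both values are assumptions and should be replaced with measured ones.

```python
def sync_write_latency_ms(distance_km, write_time_ms=0.5,
                          fibre_delay_us_per_km=5.0, round_trips=2):
    """Latency = 2 x write time + transfer time, as on the slide.

    fibre_delay_us_per_km and round_trips are assumptions of this sketch,
    chosen because they reproduce the slide's 4.0 mSec at 200 km.
    """
    one_way_ms = distance_km * fibre_delay_us_per_km / 1000.0
    transfer_ms = 2 * round_trips * one_way_ms
    return 2 * write_time_ms + transfer_ms


# Compare a few hypothetical inter-site distances.
for km in (10, 50, 200):
    print(km, "km ->", sync_write_latency_ms(km), "mSec")
```

Under these assumptions the transfer time grows linearly with distance while the two array writes stay fixed, which is why site separation dominates synchronous write latency at longer ranges.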
33
Site Synchronisation

Major challenge
Cultural change is needed
Critical to successful operation
DR systems
Build at recovery time
Slow / complex recovery
Maintain ready to use
How to validate changes
Live run
System dependent

34
Windows Servers

Build DR servers at recovery time
Lengthy recovery process
Prone to errors
Complex; requires higher skill level
Maintain DR servers ready to use
HW does not have to be identical
Complex SW change and configuration management
How to validate releases
Boot servers from storage array
Requires matching HW
SW only installed once
Simplifies SW change and configuration management
Simplifies failover process / improves recovery

35
Windows Boot from SAN
[Diagram. Production Site: Live Server, Test Server, Storage Array (Live OS, Live Data, Test OS, Test Data). DR Site: DR Server, Storage Array (OS, Data).]
36
Virtualisation

Reduced investment
Fewer servers dedicated for resilience
Expand/replace if long term outage
Flexibility
Allocate/use servers as required
Potentially reduced capacity
Depending on system and scale of incident
Configuration may not have been proved

37
Service Management

Identify Affected Areas
Service Desk
Incident Management
Problem Management
Configuration Management
Change Management
Release Management
Testing

38
Operational Assessment

Understand people and process
Gap analysis

39
Delivery Approach
Discover
Business Objectives
Current Issues or Problems
Existing/Target Infrastructure
Success Criteria
Vision
Existing Systems, Applications & Services

Model
Physical As-Is Model
Logical As-Is Model
Data profiling
Security assessment

Design
To-Be Logical Model
To-Be Physical Model
Project plan
Resource schedule
Develop business case

Implement
Implement target environment
Migrate and consolidate applications
Application and middleware integration
Define and implement test strategy

Manage
Operational assessment & gap analysis
Implement operational management processes

40
Workshop

Determine high-level requirements
Determine Business Drivers
Determine Success Criteria
Overview of systems and applications
Identify team members, sponsors, etc.
Agree timelines

41
Discovery

Audit and map
Hardware
Software
Services

42
Analysis

Data
Applications
Services
Group Systems

43
Design

Systems architecture
Operational assessment
Test environment
Project plan and resource schedule
Training requirements

44
Transition to Future State
Operational Management
Optimised Architecture
Service Continuity
Application Selection and Development Standards
Data Centre Transformation
Network Design
Storage Design
Training Requirements
Systems Design
Systems Management
Migration Plan
Test Environment and Strategy
45
Implementation