Setting the Standard for DR

1
Setting the Standard for DR
  • John Pollard, 23 March 2006

PAS 77 Guide to IT Service Continuity Management
2
PAS 56 Guide to Business Continuity Management
Business Continuity Management
RISK MANAGEMENT
IT DISASTER RECOVERY
FACILITIES MANAGEMENT
SUPPLY CHAIN MANAGEMENT
QUALITY MANAGEMENT
HEALTH AND SAFETY
KNOWLEDGE MANAGEMENT
EMERGENCY MANAGEMENT
SECURITY
CRISIS COMMUNICATIONS AND PR
Source: PAS 56:2003 Guide to Business Continuity Management
3
IT Service Continuity Management
Managing an organisation's ability to continue
to provide a pre-determined and agreed level of
IT Services to support the minimum business
requirements
Source: ITIL Best Practice for Service Delivery
4
Threats
  • Loss, damage or denial of access to key
    infrastructure services
  • Failure or non-performance of third parties
  • Loss or corruption of key information
  • Sabotage, extortion or industrial espionage
  • Infiltration or attack on critical information
    systems

5
Scope
  • Generic framework and guidelines for a continuity
    programme, including
  • Management structure and responsibilities
  • How to conduct business criticality and risk
    assessments
  • How to define and create an IT Service Continuity
    plan
  • How to rehearse an IT Service Continuity plan
  • Solution architectures and design considerations

6
What is a PAS?
Source: BSI
7
Status
[Timeline chart by quarter, 2004 to 2006. Milestones shown: Group formed, Contracts / Structure / Content, First draft, Edit, External review, Revise, Expected release]
8
Contributors
9
ITSC Strategy
  • Define direction and high-level methods to meet
    IT service level objectives
  • Agreed at Board level
  • Needs to consider 4 stages of major incident
    • Initial response
    • Service recovery
    • Service delivery (following incident)
    • Normal service resumption
  • Enable rehearsal of major incident
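
The four stages listed above can be modelled as an ordered sequence, which makes it easy to check that a rehearsal script touches each one. This is only a minimal sketch: the names `IncidentStage`, `next_stage` and `rehearsal_walkthrough` are hypothetical, and the strictly linear progression is an assumption rather than something the slide states.

```python
from enum import Enum
from typing import List, Optional


class IncidentStage(Enum):
    """The four stages of a major incident, in the order the slide lists them."""
    INITIAL_RESPONSE = 1
    SERVICE_RECOVERY = 2
    SERVICE_DELIVERY = 3  # service delivery following the incident
    NORMAL_SERVICE_RESUMPTION = 4


def next_stage(stage: IncidentStage) -> Optional[IncidentStage]:
    """Return the following stage, or None once normal service has resumed.

    ASSUMPTION: stages advance strictly in order; the slide does not say so.
    """
    stages = list(IncidentStage)
    position = stages.index(stage)
    return stages[position + 1] if position + 1 < len(stages) else None


def rehearsal_walkthrough() -> List[str]:
    """List every stage a full rehearsal would need to exercise, in order."""
    path = []
    stage: Optional[IncidentStage] = IncidentStage.INITIAL_RESPONSE
    while stage is not None:
        path.append(stage.name)
        stage = next_stage(stage)
    return path
```

A rehearsal plan could compare the stages it covers against `rehearsal_walkthrough()` to spot any stage that is never exercised.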

10
ITSC Strategy and Plan
Business Strategy
Business Criticality
Threat Analysis
IT Service Continuity Strategy
IT Architecture
IT Service Continuity Plan
Rehearsals
Costs
Processes
11
Maintaining an ITSC Strategy
Monitor
12
Management Structure
Crisis Management Team (CMT)
Business Continuity Management Team (BCMT)
Incident Management Team (IMT)
13
Business Criticality and Risk Assessments
  • Identify business units and processes
  • Categorise criticality of processes
  • Identify IT services supporting the business
    processes
  • Categorise criticality of IT services
  • Review
    • By location
    • By business unit

14
Business Criticality Categories
  • Critical: vital to day-to-day operation
  • Mandatory: vital to meet statutory requirements
  • Strategic: important for implementation of long-term
    strategy
  • Tactical: important for short/medium term objectives
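
One way to carry these categories from business processes through to the IT services that support them is sketched below. The process and service names are invented for illustration, and two rules are assumptions rather than anything the slides state: that the categories rank in the order listed, and that a service inherits the highest criticality of any process it supports.

```python
from enum import IntEnum


class Criticality(IntEnum):
    """Categories from the slide. ASSUMPTION: listed order is the ranking."""
    CRITICAL = 1   # vital to day-to-day operation
    MANDATORY = 2  # vital to meet statutory requirements
    STRATEGIC = 3  # important for long-term strategy
    TACTICAL = 4   # important for short/medium term objectives


# Hypothetical business processes and their assessed criticality.
PROCESS_CRITICALITY = {
    "order processing": Criticality.CRITICAL,
    "regulatory reporting": Criticality.MANDATORY,
    "market analysis": Criticality.STRATEGIC,
}

# Hypothetical mapping of IT services to the processes they support.
SERVICE_SUPPORTS = {
    "erp": ["order processing", "regulatory reporting"],
    "data warehouse": ["regulatory reporting", "market analysis"],
    "bi dashboard": ["market analysis"],
}


def service_criticality(service: str) -> Criticality:
    """A service takes the most severe category among its processes."""
    return min(PROCESS_CRITICALITY[p] for p in SERVICE_SUPPORTS[service])
```

With this data, `service_criticality("erp")` evaluates to `Criticality.CRITICAL`, because the service supports at least one critical process.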

15
Risk Assessment Process
Learn Lessons
16
ITSC Plan
  • Part of wider BCM Plan
  • Model plan should include
  • Initial response
  • Incident assessment
  • Roles and responsibilities
  • Procedures
  • Rehearsing the plan
  • Maintaining the plan

17
Recovery Objectives
  • Recovery Point Objective (RPO): the point in time to which
    work is restored, e.g. start of day
  • Recovery Time Objective (RTO): the time required to recover
    service
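
The two objectives can be checked against a planned recovery with simple date arithmetic. The sketch below treats the RPO as a maximum tolerated age of the restored data, which is one common way to express it and an assumption here. All function names, timestamps and step durations are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def data_loss_window(last_good_copy: datetime, incident: datetime) -> timedelta:
    """Work done after the last good copy is lost when that copy is restored."""
    return incident - last_good_copy


def meets_rpo(last_good_copy: datetime, incident: datetime, rpo: timedelta) -> bool:
    """True if the data loss window fits within the recovery point objective."""
    return data_loss_window(last_good_copy, incident) <= rpo


def meets_rto(step_durations: List[timedelta], rto: timedelta) -> bool:
    """True if the recovery steps, run one after another, fit within the RTO."""
    return sum(step_durations, timedelta()) <= rto


# Hypothetical incident: restore to a start-of-day copy taken at 08:00.
incident = datetime(2006, 3, 23, 14, 30)
start_of_day = datetime(2006, 3, 23, 8, 0)
loss = data_loss_window(start_of_day, incident)

# Hypothetical recovery steps: rebuild, restore data, verify.
recovery_steps = [timedelta(hours=1), timedelta(hours=2), timedelta(minutes=30)]
rpo_ok = meets_rpo(start_of_day, incident, rpo=timedelta(hours=24))
rto_ok = meets_rto(recovery_steps, rto=timedelta(hours=4))
```

Here six and a half hours of work would be lost, and the three steps total three and a half hours, so both illustrative objectives are met.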

18
Balancing Cost and Recovery Objectives
19
IT Architecture Resilience Considerations
  • Location and distance between sites
  • Number of sites
  • Staff access and proximity
  • Remote access
  • Dark site vs. manned site
  • Staff skill levels
  • Telecoms connectivity and redundant routing
  • Automation required
  • Telephony and email
  • 3rd party / external links

20
High Level Process Flow
21
Task Summary Sheet
22
Rehearsal
  • A body to control and coordinate
  • Objectives and success criteria
  • Rehearsal plan and scripts
  • Staff briefing
  • Logs and critique forms
  • Observers
  • Post-rehearsal review

23
Areas to Rehearse
  • Callout
  • Walk through reviews
  • Walk through exercises
  • Component rehearsals
  • Integration rehearsals
  • Relocation rehearsals
  • Failover rehearsals
  • Major incident simulations

24
Architectures
25
Site Models
  • Active / Contingency: cold site
  • Active / Active: service runs from both sites
  • Active / Alternate: service can run from either site
  • Active / Backup: warm standby site
  • Multi-site and other hybrids

26
Data Resilience
Tape/backup
Database
Application
Host
Storage Array
SAN
27
Replication Modes
  • Synchronous
    • Increased write latency
    • Typically OK for OLTP
    • May impact batch processing
    • Requires greater inter-site bandwidth than other
      options
  • Snapshot
    • Point in time copy
    • Only valid on completion of transfer
    • Minimal/no performance impact
  • Near real-time
    • Frequent snapshots
    • Minimal performance impact
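
Because a snapshot is only valid once its transfer completes, the worst-case age of the data at the remote site is roughly the snapshot interval plus the transfer time. The sketch below works that through for the modes above. The intervals and transfer times are made-up figures, and the function names are hypothetical.

```python
from datetime import timedelta


def snapshot_worst_case_loss(interval: timedelta, transfer_time: timedelta) -> timedelta:
    """Worst-case age of the newest valid remote copy.

    Just before a new snapshot finishes transferring, the newest usable
    copy is the previous one, taken a full interval earlier.
    """
    return interval + transfer_time


def synchronous_worst_case_loss() -> timedelta:
    """Synchronous writes are acknowledged only once both sites hold them,
    so no acknowledged write is lost."""
    return timedelta(0)


# Hypothetical schedules for comparison.
nightly = snapshot_worst_case_loss(timedelta(hours=24), timedelta(hours=2))
near_real_time = snapshot_worst_case_loss(timedelta(minutes=5), timedelta(minutes=1))
synchronous = synchronous_worst_case_loss()
```

Under these assumed figures the exposure ranges from 26 hours for a nightly snapshot, through 6 minutes for near real-time, to zero for synchronous replication. That is the trade-off against the write latency and bandwidth costs listed above.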

28
A Holistic Approach
Service Continuity is much more than technology
29
john.pollard_at_unisys.com
30
Part II - Workshop
Defining the Standard for DR
  • John Pollard, Unisys

PAS 77 Guide to IT Service Continuity Management
31
Typical Challenges
  • Tape recovery slow
  • Manual build is complex
  • Complex inter-operation between systems
  • Difficult to define critical and non-critical
  • Management of failover site
  • Keeping sites in step
  • Windows Servers

32
Synchronous Write Latency
[Diagram: Server, two Storage Arrays joined by a communication link, with the transfer time marked on the link]
Write 1: 0.5 mSec
Write 2: 0.5 mSec
Latency = 2 × Write Time + Transfer Time
For 200 kilometres using Fibre Channel: Latency = 2 × 0.5 + 4.0 = 5.0 mSec
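
The latency figure on this slide can be reproduced with a few lines of arithmetic. The first function is the slide's formula as given. The second is only one plausible reading of where the 4.0 mSec transfer time comes from: it assumes about 5 microseconds per kilometre in fibre and two round trips per write. Neither figure appears in the slide.

```python
def synchronous_write_latency_ms(write_time_ms: float, transfer_time_ms: float) -> float:
    """Slide formula: latency = 2 x write time + transfer time."""
    return 2 * write_time_ms + transfer_time_ms


def estimated_transfer_time_ms(
    distance_km: float,
    us_per_km: float = 5.0,  # ASSUMPTION: propagation delay in fibre
    round_trips: int = 2,    # ASSUMPTION: two round trips per remote write
) -> float:
    """Estimate link transfer time from distance alone (illustrative only)."""
    one_way_us = distance_km * us_per_km
    return one_way_us * 2 * round_trips / 1000.0


# Figures from the slide: two 0.5 mSec writes and a 4.0 mSec transfer.
slide_latency_ms = synchronous_write_latency_ms(write_time_ms=0.5, transfer_time_ms=4.0)

# Under the stated assumptions, 200 km gives the same 4.0 mSec transfer time.
derived_transfer_ms = estimated_transfer_time_ms(distance_km=200)
```

Since the transfer term grows linearly with distance, doubling the separation to 400 km would, under the same assumptions, raise the total from 5.0 to 9.0 mSec per write.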
33
Site Synchronisation
  • Major challenge
  • Cultural change is needed
  • Critical to successful operation
  • DR systems
  • Build at recovery time
  • Slow / complex recovery
  • Maintain ready to use
  • How to validate changes
  • Live run
  • System dependent

34
Windows Servers
  • Build DR servers at recovery time
    • Lengthy recovery process
    • Prone to errors
    • Complex; requires higher skill level
  • Maintain DR servers ready to use
    • HW does not have to be identical
    • Complex SW change and configuration management
    • How to validate releases
  • Boot servers from storage array
    • Requires matching HW
    • SW only installed once
    • Simplifies SW change and configuration management
    • Simplifies failover process / improves recovery

35
Windows Boot from SAN
[Diagram. Production Site: Live Server, Test Server, Storage Array (Live OS, Live Data, Test OS, Test Data). DR Site: DR Server, Storage Array (OS, Data).]
36
Virtualisation
  • Reduced investment
    • Fewer servers dedicated for resilience
    • Expand/replace if long term outage
  • Flexibility
    • Allocate/use servers as required
  • Potentially reduced capacity
    • Depending on system and scale of incident
  • Configuration may not have been proved

37
Service Management
  • Identify Affected Areas
  • Service Desk
  • Incident Management
  • Problem Management
  • Configuration Management
  • Change Management
  • Release Management
  • Testing

38
Operational Assessment
  • Understand people and process
  • Gap analysis

39
Delivery Approach
Discover
Model
Design
Implement
Manage
  • Business Objectives
  • Current Issues or Problems
  • Existing/Target Infrastructure
  • Success Criteria
  • Vision
  • Existing Systems, Applications and Services
  • Physical As-Is Model
  • Logical As-Is Model
  • Data profiling
  • Security assessment
  • To-Be Logical Model
  • To-Be Physical Model
  • Project plan
  • Resource schedule
  • Develop business case
  • Implement target environment
  • Migrate and consolidate applications
  • Application and middleware integration
  • Define and implement test strategy
  • Operational assessment and gap analysis
  • Implement operational management processes

40
Workshop
  • Determine high-level requirements
  • Determine Business Drivers
  • Determine Success Criteria
  • Overview systems and applications
  • Identify team members, sponsors, etc.
  • Agree timelines

41
Discovery
  • Audit and map
  • Hardware
  • Software
  • Services

42
Analysis
  • Data
  • Applications
  • Services
  • Group Systems

43
Design
  • Systems architecture
  • Operational assessment
  • Test environment
  • Project plan and resource schedule
  • Training requirements

44
Transition to Future State
Operational Management
Optimised Architecture
Service Continuity
Application Selection and Development Standards
Data Centre Transformation
Network Design
Storage Design
Training Requirements
Systems Design
Systems Management
Migration Plan
Test Environment and Strategy
45
Implementation
  • Methodology
  • Call on best practice
  • Operational management
  • Cultural change
  • Keep people informed

46
john.pollard_at_unisys.com