Title: Chapter 9 Business Continuity Planning and Disaster Recovery
1Chapter 9 Business Continuity Planning and
Disaster Recovery
2BCP and DR (770)
- An organization is dependent on resources,
personnel and tasks performed on a daily basis to
be healthy and profitable. Loss or disruption of
these resources can be detrimental, causing great
damage or even complete destruction of the
business.
- A business MUST have a plan to deal with
unforeseen events.
3BCP and DR (770)
- Business Continuity Planning is a broad approach
to ensure that a business can function in the
event of disruption of normal data processing
operations.
- Disaster Recovery Planning is a subset of BCP.
The goal of a DRP is to minimize the effects of a
disaster and take necessary steps to ensure that
the resources, personnel and business processes
are able to resume operation in a timely manner.
4Terms for This Chapter
- Business Continuity Plan: a document describing
how an organization responds to an event to
ensure critical business functions continue
without unacceptable delay or change.
- Business Continuity Planning: planning to help
organizations identify the impacts of potential
data processing and operation disruptions and
data loss, and formulate recovery plans to ensure
the availability of data processing and
operational resources.
- (more)
5Terms
- Business Impact Analysis: the process of analyzing
all business functions within the organization to
determine the impact of a data processing outage.
- Business Resumption Planning: BRP develops
procedures to initiate the recovery of business
operations immediately following an outage or
disaster.
- (more)
6Terms (pg 665 ISC book)
- Contingency Plan: a document providing the
procedures for recovering a major application or
information system network in the event of an
outage or disaster.
- Continuity of Operations Plan: a document
describing the procedures and capabilities to
sustain an organization's essential strategic
functions at an alternate site for up to 30 days.
- (more)
7Terms
- Crisis Communications Plan: a document that
outlines the procedures for disseminating status
reports to personnel and the public in the event
of an outage or disaster.
- Critical System: the hardware and software
necessary to ensure the viability of a business
unit or organization during an interruption in
normal data processing support.
- (more)
8Terms
- Critical Business Functions: the business
functions and processes that MUST be restored
immediately to ensure the organization's assets
are protected, goals are met and the
organization is in compliance with any
regulations and legal responsibilities.
- (more)
9Terms
- Cyber Incident Response Plan: strategies to
detect, respond to and limit the consequences of
cyber incidents.
- Disaster Recovery Plan: a plan that provides
detailed procedures to facilitate recovery of
capabilities at an alternate site.
- Disaster Recovery Planning: the process to
develop and maintain a Disaster Recovery Plan
- (more)
10Objectives of the BCP (771)
- The objectives of BCP are the following
- Provide an immediate response to emergency
situations
- Protect lives and ensure safety
- Reduce business impact
- Resume critical business functions
- Reduce confusion during a crisis
- Ensure survivability of the business
- Get up and running ASAP after a disaster
11Business Continuity Planning
12BCP Overview (771)
- The goal of a BCP is ultimately to help a company
resume its business functions as soon as
possible after a damaging event. If you think
about it, a BCP is really part of the larger
security program. As such, a BCP should be part
of the security policy.
13Steps in BCP (overview) (772)
- ISC states 5 Phases in BCP. We will outline them
now, and detail them later.
- 1. Project Initialization: establish a project
team and obtain management support
- 2. Conduct BIA: identify time-critical business
processes and determine maximum outages
- 3. Identify Preventative controls
- 4. Recovery Strategy: identify and select the
appropriate recovery alternatives to meet the
recovery time requirements.
- (more)
14Creating the BCP (overview) (772)
- 5. Develop the contingency plan: document the
results of the BIA findings and recovery
strategies in a written plan
- 6. Testing, Awareness, and Training: establish the
processes for testing the recovery strategies,
maintaining the BCP, and ensuring that those
involved are aware and trained in the recovery
strategies.
- 7. Maintenance: maintain the plan
15BCP Phase 1 (776)
- Project Management and Initialization
- In this step
- We must solidify management's support, because
without management support, NOTHING will be
successful.
- Develop a Continuity Planning Policy Statement
that lays out the scope of the BCP project, roles
and members, and goals.
- (more)
16BCP Phase 1 (776)
- We then must identify a Business Continuity
Coordinator (the BCP team leader)
- Establish a BCP team
- What types of people/roles should be on the team?
Can anyone think of certain positions that should
make up the team? (pg 776)
- Which people will be chosen for the team?
- (more)
17BCP Phase 2 (BIA) (778)
- Phase 2 of the BCP steps is to conduct a Business
Impact Analysis. In short, this step outlines
what processes and resources the company depends
on, how important each process is and how long
the business can do without each resource. The
formal steps are covered next.
18Phase 2 BIA (overview) (778)
- 1. Select individuals to interview to determine
what processes we have to protect
- 2. Create data gathering techniques to gather data
about these processes
- 3. Identify the company's critical business
functions/processes
- 4. Identify the resources these processes depend on
- (more)
19Phase 2 BIA (overview) (778)
- 5. Calculate how long these functions can survive
without these resources
- 6. Identify vulnerabilities and threats to these
processes
- 7. Calculate the risk for each business process
- 8. Document findings and report them to management
20BCP Phase 2 Step 1 (779) Select Interviewees
- In this step the BCP committee needs to identify
the types of people that will be part of the BIA
data gathering sessions.
- These people should represent the different
departments that make up the business.
- After determining the general roles, we need to
find the actual employees that fill these roles,
so we can interview them.
21BCP Phase 2 Step 2 Determine Information
Gathering Techniques
- In this step the BCP team must create data
gathering techniques to use when interviewing and
gathering other information to support the BCP
objectives (surveys, questionnaires etc.).
22BCP Phase 2 Step 3 Identify Critical Business
Functions
- Based on the information gathered by the
interviews and the data gathering techniques, we
need to now identify which business processes and
functions are critical for the successful
operation of the business.
23BCP Phase 2 Step 4 Analyze information
- Once we know what the important processes are, we
need to determine which resources these processes
depend upon. These resources can be all kinds of
things such as servers, data, people, buildings
etc.! (not just IT related things)
- Determine cost, whether qualitative or
quantitative
24BCP Phase 2 Step 5 Determine MTD and
prioritization (781)
- Now we need to prioritize and calculate the
maximum time we can survive without the business
processes identified in Step 3. This maximum time
is called the Maximum Tolerable Downtime (MTD).
- Keep in mind when prioritizing things, we have to
use quantitative and qualitative analysis to
determine just what is critical. For example, loss
of some process might not cause immediate
financial loss, but could damage reputation or
competitive advantage, and that damage could be
devastating.
- (more)
25BCP Phase 2 Step 5 (782)
- Here are some common MTD classifications that you
should memorize
- Critical: 1 to 4 hours
- Urgent: 24 hours
- Important: 72 hours
- Normal: 7 days
- Nonessential: 30 days
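The classifications above can be sketched as a small lookup. This is only an illustration: treating each value as an inclusive upper bound is an assumption, and the business functions and their MTD values are made up.

```python
from datetime import timedelta

# Upper bounds taken from the slide's MTD classifications.
# Treating each value as an inclusive upper limit is an assumption.
MTD_CLASSES = [
    ("Critical", timedelta(hours=4)),
    ("Urgent", timedelta(hours=24)),
    ("Important", timedelta(hours=72)),
    ("Normal", timedelta(days=7)),
    ("Nonessential", timedelta(days=30)),
]


def classify_mtd(mtd: timedelta) -> str:
    """Return the first class whose upper bound covers the given MTD."""
    for name, limit in MTD_CLASSES:
        if mtd <= limit:
            return name
    return "Nonessential"  # anything that can wait longer than 30 days


# Hypothetical business functions with made-up MTD values.
functions = {
    "order processing": timedelta(hours=2),
    "payroll": timedelta(days=3),
    "archive reporting": timedelta(days=20),
}
priorities = {name: classify_mtd(mtd) for name, mtd in functions.items()}
```

Sorting the functions by their MTD gives a first-cut recovery priority list, which the qualitative analysis can then adjust.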
26BCP Phase 2 Step 6 - Threats
- Now we need to identify vulnerabilities and
threats to these processes and the resources that
are required for them. (Remember Risk
Management/Risk Analysis?)
- On the next slide we will examine some example
threats.
27BCP Phase 2 Step 6
- Some examples are
- Equipment malfunction
- Hacking
- Failure in utilities (power, WAN connections)
- Critical personnel becoming unavailable
- Vendors going out of business
- Data Corruption
- Physical Damage (hurricane, earthquake)
28BCP Phase 2 Step 7
- Determine the probability/risk for each business
function.
29BCP Phase 2 Step 8
- Once we have done this research, we must document
and provide our findings to management. Note that
at this point we really have not started creating
a Business Continuity Plan yet. We've just done
the research. Once management reviews the findings
and gives the OK to proceed, we will actually
develop the plan.
30BCP Phase 3 Identify Preventative Controls (786)
- Pretty straightforward, though a lot of work. Now
that we know what we need to protect and the
threats involved, look at ways to PREVENT these
problems from occurring, so we never have to
worry about dealing with them. This is really
just doing a Risk Analysis and determining
cost-effective countermeasures.
31BCP Phase 4 Recovery Strategies (788)
- OK, now we are at the stage where we are actually
developing a PLAN for business continuity.
Everything before was just initial research and
getting management to give us the OK to develop a
plan.
- (more)
32BCP Phase 4 Recovery Strategies (787)
- A more technical and tangible stage. The idea
is to figure out what the company ACTUALLY needs
to do to be able to recover the necessary
business processes in the event of a catastrophe.
- Determine the most cost-effective recovery
mechanisms
- Formally define the activities and actions that
will be implemented and carried out in response
to a disaster.
- These strategies will be based on the 5 main
business considerations listed on the next slide
33Phase 4 Recovery Strategies (787)
- 5 categories
- Business Process Recovery
- Facility Recovery
- Supply and Technology Recovery
- User Environment Recovery
- Data Recovery
- We will go into more detail on each of these
categories coming up.
34Business Process Recovery (788)
- A Business Process is a set of interrelated steps
linked through specific activities to accomplish a
specific task. For these processes the team must
know the components of the process including
- Required roles
- Required resources
- Input and output mechanisms
- Workflow steps
- Required time for completion
- How this process interacts with other processes
35Facility Recovery (788)
- Facility Recovery is concerned with the ability
to move processing operations to an alternate
facility in case of the failure of the main
facility. We can have multiple methods to deal
with this including
- Subscription services with service bureaus
- Reciprocal Agreements
- Redundant Sites
- Let's look into each of these more
36Facility Recovery (791) Subscription services
- A subscription service is a contract with a 3rd
party to provide access to a facility. There is
generally a monthly fee to retain the right to
use the facility, along with a large activation
fee and an hourly fee when actually using the
facility. This is obviously a short term only
solution. There are 3 types of subscription
services, which we will talk about more in the
next slides
- Hot Site
- Warm Site
- Cold Site
37Hot Site (790)
- Hot Site: a facility that is fully configured
and ready to operate in a few hours. The only
resources missing from a hot site are the actual
data and the actual employees.
- Hardware and software MUST be fully compatible
or it's pointless
- Very expensive
- Vendor may not have customer specific or
proprietary hardware/software
- Can allow for annual testing
- Ready within hours
38Warm Site (790)
- A facility that is usually partially configured
with some computing equipment, but not the actual
core hardware, i.e. a hot site without the
expensive equipment.
- Generally can be up in an acceptable time period.
- May be better for customers with specific
hardware/software needs; the customer will bring
computing hardware with them.
- Most widely used model
- Pros
- Cheaper than a hot site
- Available for a longer timeframe due to reduced
costs
- Good if you have your own custom
hardware/software
- Cons
- Takes longer to prepare
- Actual yearly testing not generally possible
39Cold Site (790)
- Supplies the basic environment (AC, electrical,
plumbing etc.), but NO actual computing equipment.
Can take a while to activate.
- Pros
- Cheaper
- Available for a longer timeframe due to reduced
costs
- Good if you have your own custom
hardware/software
- Cons
- May take weeks to get activated and ready
- Cannot do yearly tests
40Reciprocal Agreement (793)
- RA, also called Mutual Aid, is when two companies
agree to help each other out in the case of an
emergency. Ultimately this is not really
practical for most businesses.
- Can you tell me what the pros and cons of
this are? Can you tell me why this is not really
practical?
41Redundant Sites (794)
- Pretty much these are HOT sites that are OWNED
by the company (rather than a service bureau). They
may also have live or slightly delayed data
backups and some staff.
- Pros
- Best solution if fast turnaround time and the
ability to recover all processing aspects are
required
- Cons
- VERY EXPENSIVE (duplicate costs except for
personnel)
42Multiple Processing Centers (794)
- Another approach is, rather than having only one
center that facilitates a certain business
function, to split the work among multiple active
centers such that there is no single point of
failure.
- Solid approach
- Good scalability for normal business growth
- Just make sure that the other centers have more
resources than they individually need in case
they need to take on more work, due to the
failure of another center.
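The headroom requirement in the last bullet can be checked with a few lines of code. This is a rough sketch only: the center names, the single "capacity" and "load" numbers per center, and the assumption that a failed center's load can be spread freely across the survivors are all simplifications made for illustration.

```python
def survives_single_failure(centers: dict) -> bool:
    """Check that any one center can fail without overloading the rest.

    centers maps a center name to (capacity, normal_load) in the same
    arbitrary unit of work. Assumes load can be redistributed freely.
    """
    for failed, (_, failed_load) in centers.items():
        spare = sum(
            capacity - load
            for name, (capacity, load) in centers.items()
            if name != failed
        )
        if spare < failed_load:
            return False  # the survivors cannot absorb this center's work
    return True


# Hypothetical figures: three centers running at 60% of capacity.
balanced = {"east": (100, 60), "west": (100, 60), "south": (100, 60)}
# Two centers running near capacity leave no room for each other's work.
stretched = {"east": (100, 90), "west": (100, 90)}

balanced_ok = survives_single_failure(balanced)
stretched_ok = survives_single_failure(stretched)
```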
43Supply and Technology Recovery (795)
- OK, so we have plans to recover our facilities and
our main processing requirements. But what about
the lower level of things?
- Hardware Backups
- Software Backups
- Documentation
- Human Resources
- These need to be taken into consideration too.
We will briefly talk about these in the next few
slides
44Hardware backups (796)
- OK, so we have a space to process, but unless we
have a hot site or redundant site, if our
building is destroyed where do we get the
servers from? What about the desktops that our
staff need? Do we have vendors to provide
these? How long will it take to get new equipment
from them? What happens if we have legacy
equipment? What do we do?
- We need to take all of these questions into
consideration when planning.
45Software Backups (797)
- Like the hardware backups, but specifically about
software. How do we get copies of the software?
How do we roll out installs? What about
licensing?
- What about custom software that we had created
that we cannot just go out and buy at the store?
- Software escrow: what is this? Anyone?
46Documentation (798)
- OK, so we have the equipment and software. How do
we get it all rolled out and configured such that
it is the same as it was at the company?
- Incorrect configurations COULD cause compromises
in integrity or confidentiality! (how?)
- Do we even know how our old network was
configured? Can we reproduce it?
- An important concept for BCP that should be in
company policy is that all documentation should
be kept up to date and properly protected
47Human Resources (799)
- What happens if our backup facility is 250 miles
away? How do we get people there?
- What happens if the disaster was a natural
catastrophe and some important employees are
injured or worse? What do we do now?
- Executive Succession Planning: what is this?
48End User Environment (800)
- How do we notify the users about a disaster and
the change of operating procedure?
- Once there, we need to have people on the ground
directing issues pertaining to employees. These
people should be easily identified.
- We also need to be concerned with how to manage
other tasks that we might not have the resources
to do in the traditional manner (for example,
automated data processing or normal
communication methods). How do we handle that? The
BCP team needs to consider these types of issues.
49Data Backups (801)
- How do we ensure we have data to load back into
our new offsite systems? Data changes constantly.
We need a solution that makes sense and is cost
effective (this will vary business to business).
- We will talk about traditional backup types as
well as electronic vaulting on the next few
slides.
50Traditional Backups (802)
- Traditional backups have some method of backing
up files to a removable medium. The first thing
to understand about backups is the archive bit.
Every time a file is altered the archive bit is
set to notify the system that the file may need to
be backed up. Now let's talk about the 3 backup
types
- Full
- Differential
- Incremental
51Full Backup (802)
- Simply put,
- Back up every file on the system!
- Then clear the archive bit of each file
- This must be done with some degree of regularity,
depending on the business needs.
- Pros
- Everything gets backed up
- If you do a full backup every day, you can
restore with only 1 restore operation
- Cons
- Takes a long time, can be expensive to complete
in a timely manner
52Differential (802)
- Back up any file that has changed since the last
full backup. Steps are
- Find any file where the archive bit is set
- Back up the file
- DO NOT clear the archive bit
- This allows you to quickly restore data in the
event of a disaster in 2 operations. Simply
- Restore the last full backup
- Restore the last differential backup
- (more)
53Differential Pros/cons (802)
- Pros
- Faster than a full backup
- Can do a full restore with 2 operations: restore
the last full backup, restore the last
differential backup
- Cons
- Does not have all data on any tape, you still
need a full backup to do a complete restore
54Incremental (802)
- The idea is to back up any file that has changed
since the last full backup OR the last
incremental backup, whichever is more recent.
Steps are
- Find any file with the archive bit set
- Back up that file
- Clear the archive bit
- (more)
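To see how the archive bit drives the three backup types, here is a minimal simulation. It is a sketch only: the file names are made up, and a plain dictionary stands in for the file system's archive bits.

```python
def run_backup(files, mode):
    """Return the names backed up; update archive bits as the mode requires.

    files maps a file name to its archive bit (True means the file
    changed since the bit was last cleared).
    """
    if mode not in ("full", "differential", "incremental"):
        raise ValueError(f"unknown backup mode: {mode}")
    if mode == "full":
        selected = sorted(files)  # every file, regardless of the bit
    else:
        selected = sorted(name for name, bit in files.items() if bit)
    if mode != "differential":
        for name in selected:
            files[name] = False  # full and incremental clear the bit
    return selected


def new_volume():
    # Hypothetical volume: three files, all changed and never backed up.
    return {"a.txt": True, "b.txt": True, "c.txt": True}


# Differential schedule: each night holds everything since the full backup.
vol = new_volume()
run_backup(vol, "full")
vol["a.txt"] = True                             # Monday: a.txt edited
diff_monday = run_backup(vol, "differential")   # ["a.txt"]
vol["b.txt"] = True                             # Tuesday: b.txt edited
diff_tuesday = run_backup(vol, "differential")  # ["a.txt", "b.txt"]

# Incremental schedule: each night holds only that day's changes.
vol = new_volume()
run_backup(vol, "full")
vol["a.txt"] = True
incr_monday = run_backup(vol, "incremental")    # ["a.txt"]
vol["b.txt"] = True
incr_tuesday = run_backup(vol, "incremental")   # ["b.txt"]
```

Notice that Tuesday's differential repeats Monday's file because the bit was never cleared, while Tuesday's incremental does not.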
55Incremental Pros/Cons (802)
- Pros
- Fast to back up nightly
- Cons
- To restore requires many operations: restore the
last full backup, then restore every incremental
backup done since the last full backup (restores
are slow).
- If you lose any of the tapes (full or
incremental) you cannot truly restore all data.
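The difference in restore effort can be sketched as a function that works out which tapes are needed. This is an illustration under stated assumptions: the backup history is a made-up list of (label, kind) pairs in chronological order, and the sketch assumes only one backup type is used between full backups.

```python
def restore_chain(history):
    """Return the backup labels to restore, in order, for a full recovery.

    history is a chronological list of (label, kind) pairs, where kind is
    "full", "differential" or "incremental".
    """
    full_positions = [i for i, (_, kind) in enumerate(history) if kind == "full"]
    if not full_positions:
        raise ValueError("no full backup available; a complete restore is impossible")
    start = full_positions[-1]
    later = history[start + 1:]
    kinds = {kind for _, kind in later}
    if len(kinds) > 1:
        raise ValueError("this sketch assumes one backup type between full backups")
    chain = [history[start][0]]
    if kinds == {"differential"}:
        chain.append(later[-1][0])  # only the most recent differential
    else:
        chain.extend(label for label, _ in later)  # every incremental, in order
    return chain


# Hypothetical weekly schedules.
diff_week = [("Sun", "full"), ("Mon", "differential"),
             ("Tue", "differential"), ("Wed", "differential")]
incr_week = [("Sun", "full"), ("Mon", "incremental"),
             ("Tue", "incremental"), ("Wed", "incremental")]

diff_restore = restore_chain(diff_week)  # 2 operations
incr_restore = restore_chain(incr_week)  # 4 operations
```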
56Which backup is right for you
- It depends on your needs.
- Personally I believe in the following strategy
- If you can do a full backup every night, do so
- If you cannot, then move to differential
- If you cannot handle differentials, move to
incremental
- REMEMBER, for all these to work you still need a
full backup periodically.
57Discussion of backups
- Can you mix differential and incremental backups?
(Why or why not?)
- All backups should be stored both onsite and
offsite (why?)
- When storing offsite, would the next building
over be appropriate?
- There should be a clear written process on how to
restore files (why?)
- Someone should periodically test the backups by
performing restores to a test system (why?)
58Discussion of Backups
- In what situations would a full backup be
appropriate?
- In what situations would a differential backup be
appropriate?
- In what situations would an incremental backup be
appropriate?
59Discussion of Backups
- When choosing an offsite storage facility think
of the following
- How fast can I get access to my data?
- What are the hours of the facility?
- What access control protections does the
facility provide? (why do I care?)
- Are there fire suppression systems?
- Are there environmental controls?
60Non Backup Terms that should be mentioned (804)
- Disk mirroring / shadowing: copying data to one
or more hard drives such that a system has
multiple copies of data in case of a drive
failure
- Disk duplexing: same as shadowing, but using
multiple disk controllers (why?)
61Electronic Vaulting (804)
- Electronic Vaulting is the idea of sending all
changes to a file to a remote site (using
non-backup methods). This usually is not done in
real time but in batches.
- (Example: bank transactions might be copied daily
to another office)
62Remote Journaling (805)
- RJ is another method of transmitting data to an
offsite facility. However, it is different from
electronic vaulting.
- It is done in real time (What do I mean by
that?)
- Entire files are not copied, only changes
(deltas) to files (also called transaction logs).
- From the base files and the records of changes
you can recreate the current environment.
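Recreating the current environment from base files plus a journal of changes can be sketched as a simple replay. The record format here, (operation, key, value) tuples over a dictionary of account balances, is made up for illustration and is not how any particular product stores its journal.

```python
def replay(base, journal):
    """Apply journaled changes, in order, to a copy of the base records."""
    state = dict(base)  # leave the base copy untouched
    for operation, key, value in journal:
        if operation == "set":
            state[key] = value
        elif operation == "delete":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown journal operation: {operation}")
    return state


# Hypothetical base file held at the offsite facility.
base = {"acct-1": 100, "acct-2": 250}
# Hypothetical deltas received since the base file was taken.
journal = [
    ("set", "acct-1", 80),
    ("set", "acct-3", 40),
    ("delete", "acct-2", None),
]
current = replay(base, journal)
```

Order matters: replaying the same deltas in a different order could produce a different final state, which is why the journal is kept as an ordered log.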
63Tape Vaulting (806)
- A type of backup, however rather than backing up
to a local device you back up to a remote
device.
64Phase 4 Restoration Strategies (809)
- Now that we covered recovery strategies we need
to look at a couple of recovery concepts that we
will need to understand in the planning stage.
65Phase 4 Restoration (809)
- When planning we must also recognize that there
are 3 different teams in DR.
- Damage Assessment team: assesses the damage.
- Restoration team: responsible for getting the
alternate site into a working, functional
environment
- Salvage team: responsible for starting the
process of recovering the original site and
moving back from the backup site (we cannot stay
in the backup site forever)
- Let's look at these in the next slides
66Phase 4 Recovery (809)
- Damage Assessment
- Determine cause of disaster
- Determine potential for further damage
- Identify affected business functions and assets
- Identify resources that must be replaced
immediately
- Estimate how long it will take to bring critical
functions online
- Determine whether the BCP should be put into
operation
67Phase 4 Recovery (809)
- Restoration Team should be responsible for
getting the alternate site into a working and
functioning environment
68Phase 4 Recovery (809)
- Salvage Team: responsible for starting the
recovery of the original site.
- When moving things back to the original site the
most critical functions should be moved LAST
(why?)
- The least critical functions should be moved
first.
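The move-back ordering can be written down as a one-line sort. The function names and criticality ranks below are made up for illustration, with rank 1 meaning most critical.

```python
def move_back_order(functions):
    """Return function names ordered least critical first.

    functions maps a name to a criticality rank, where 1 is the most
    critical. The ranks are assumed to come from the BIA.
    """
    return sorted(functions, key=lambda name: functions[name], reverse=True)


# Hypothetical business functions and ranks.
ranked = {"payments": 1, "email": 2, "intranet wiki": 3}
order = move_back_order(ranked)
```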
69End of Phase 4 Recovery
70Phase 5 Plan design and development (814)
- Now we need to actually come up with goals and
a plan for attaining these goals. The plan
must contain certain key information.
- Responsibility: who are the individuals
responsible for what? What is expected of them?
How will they be trained?
- Authority: in times of crisis, who is in charge?
- Priorities: what are the critical processes?
What are the priorities?
- Implementation and Testing: how will we
implement our plans? How will we test them?
- (more)
71Phase 5 Plan Design and Development (814)
- Strategies
- Copies of the plan need to be kept in one or more
locations. (why?)
- Plans must be in paper and electronic format
- Call trees should be implemented
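A call tree is commonly understood as a notification chain in which each person phones a short list of others, who in turn phone their own lists. A minimal sketch, assuming a made-up tree of roles and a breadth-first calling order:

```python
from collections import deque


def call_order(tree, root):
    """Return (caller, callee) pairs in the order the calls would be made.

    tree maps each person to the list of people they are responsible for
    calling. Anyone already notified is skipped, so loops in the data
    cannot cause repeated calls.
    """
    calls = []
    notified = {root}
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        for callee in tree.get(caller, []):
            if callee not in notified:
                notified.add(callee)
                calls.append((caller, callee))
                queue.append(callee)
    return calls


# Hypothetical call tree keyed by role.
tree = {
    "coordinator": ["it lead", "hr lead"],
    "it lead": ["sysadmin", "dba"],
    "hr lead": ["recruiter"],
}
calls = call_order(tree, "coordinator")
```

Because each person only makes a few calls, everyone is reached in a small number of rounds even in a large organization.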
72BCP Phase 6 Testing (816)
- OK, so we have this great plan that we've spent
millions of hours and dollars creating. But does
it work, or will it sink and completely fail?
Well, we should try testing it.
- Testing also allows us to see where the plan
can be improved, or if new changes in the
environment will require the plan to be updated
(what company doesn't change and grow?)
- Testing should be carried out at LEAST once a
year.
- Any problems that occur should be documented
and reported to management.
- So what are some testing methods?... Next slide
73Checklist Test (818)
- BCP is distributed to departments and functional
areas for review. The Managers read over and
indicate if anything is missing or should be
modified. (Manager checks off that the plan is
OK for their department)
74Structured Walk-Through (818)
- Representatives from each department come
together AS A GROUP, they walk through the plan
and different scenarios from beginning to end to
make sure nothing is left out.
75Simulation Test (819)
- A specific scenario is proposed. All required
employees come together, simulate that the event
has happened and start taking action to recover.
The idea is to see if any problems come up or if
any concerns were left out.
76Parallel Test (819)
- Some systems are moved to the alternate site and
processing takes place. The results are compared
to the real processing to see if anything needs
to change.
77Full Interruption test (819)
- Most intrusive test. The original site is
actually shut down and processing is moved to the
alternate site (really needs to be a hot site).
The recovery team fulfils its obligation in
preparing the systems and environment for the
alternate site.
- This is a full blown drill
- Requires tons of planning and co-ordination
- These are risky and can cause damage if not
managed properly.
- Senior management approval is required due to the
risk involved.
78Maintaining the Plan (819)
- Now that we have the plan we need to maintain it!
Systems and processes become out of date and need
constant refresh. Why?
- The BCP may not be integrated into the change
management process (it should be though!)
- Infrastructure or environment changes (that never
changes, right?)
- Company re-organization, layoffs etc.
- Changes in hardware or software
- Employee turnover
- (more)
79Maintaining the Plan (819)
- We can help keep the plan updated by taking the
following actions
- Make BCP planning part of every business
decision!
- Insert BCP maintenance responsibilities into job
descriptions
- Include maintenance in personnel evaluations
- Perform internal audits that include DR and BCP
procedures
- Test the plan yearly